mountainMath / cancensus

R wrapper for calling CensusMapper APIs
https://mountainmath.github.io/cancensus/index.html
Other
82 stars 15 forks source link

Some thoughts on function names, and possible changes #51

Closed atheriel closed 7 years ago

atheriel commented 7 years ago

The package currently uses a very unusual naming convention for functions, namely package.function_name. I haven't seen this style in R before, but R is also famous for its inconsistent naming conventions. I didn't give it much thought for a while, but as the package draws nearer to release, I thought it might be a good idea to think more carefully about naming. I think we should be motivated to have clear, unambiguous names, of course, but beyond that things are more a matter of style.

What Do Other Packages Do?

Most other R packages in my experience do not use a prefix or suffix at all (e.g. dplyr). However, I did take a look at the most recently updated 50 or so API-backed R packages, and there the naming conventions are more varied.

Probably the closest package to this one is the censusapi package, which uses a getCensus() function for querying and a listCensusMetadata() function for inspection. The junr package, which is also somewhat similar, uses get_index(), get_data() and list_titles(), which I think are too generic -- I would prefer to stick "census" in there somewhere. (I think the dwapi package also suffers from this problem).

Some packages use nouns; for example the hansard package has commons_divisions() and mp_vote_record() and the HIBPwned package has account_breaches(). I like the specificity of this approach. Others use verbs, e.g. geocode() in the banR package.

My favourite approach at the moment is, as with the censusapi package, to take a specific noun and prefix it with "get", as with the GetSports() function in the pinnacle package or the get_lang() function in the languagelayeR package, or the get_owf() function in the openwindfarm package, or even the get_video_details() function in the tuber package.

Of course, there are some packages that use prefixes, e.g.

There are also some that use suffixes, e.g.

There is also the approach taken by the rosettaApi package, which is to have a single, top-level api() function.

Proposed API Changes

All this taken in stride, my proposal is to make the following changes:

All internal functions should just ditch their cancensus. prefix, since they're not exported anyway.

We can also keep aliases around for all these changes for a release or two so that no-one's code breaks.

I think that the current function parameters all have pretty good names, so I don't have any suggested changes there.

Alternatives

1) We can keep some element of namespacing, e.g. by using cm as a prefix, which is the same prefix currently used by the API key when it is an environment variable. So for example cm_get and cm_list_datasets. 2) We could use a top-level function with the same name as the package, that is, cancensus::cancensus() as the main entry point (replacing existing load*() functions). I don't think this is a particularly intuitive approach for this package.

mountainMath commented 7 years ago

Yeah, the original cancensus. prefix was my attempt of namespacing and encapsulating the functionality without understanding how things are usually done in R. I still like the general prefix idea since it makes it more transparent which package is used for function calls. I haven't done much with R but the overwriting of functionality by packages and subsequent importance of the order in which they are loaded freaks me out a bit.

I am quite amenable to name changes that stick closer to established usage in R. The cm_ prefix would work in that cm references CensusMapper which hosts the API. But really, whatever you guys think is good with me.

dshkol commented 7 years ago

Jens and I talked about this a bit at the very start and I understood how R name conventions and tendency to mask functions could be bewildering to someone experienced in other languages - so I appreciate the class and method approach, but it is out of convention with other R packages. I like the proposed names, and I would even go so far as:

cancensus::cancensus.list_vectors becomes cancensus::list_vectors cancensus::cancensus.search_vectors becomes cancensus::search_vectors cancensus::cancensus.list_regions becomes cancensus::list_regions

A person will use cancensus primarily to explore and pull data for subsequent analysis. While they are pulling that data, they are really only working in cancensus space. The above names don't mask any core or base functions, or anything in major packages that I can see. If they want to do something with a specific package that has a similar function, they will typically do it later in their workflow, so they will mask over cancensus, rather than the other way around.

atheriel commented 7 years ago

@mountainMath Yeah, the whole namespace thing is a mess. It's most similar to C, really, or maybe Common Lisp. Which I guess makes sense, given that those are the two biggest influences on the language.

dshkol commented 7 years ago

@mountainMath @atheriel if anyone wants to read the paper introducing the language it's here: https://www.stat.auckland.ac.nz/~ihaka/downloads/R-paper.pdf

Some interesting things from there are the tradeoffs that were made from the onset in the language, especially with how objects are stored and with scoping.

atheriel commented 7 years ago

@dshkol Well... that certainly took me down the rabbit hole. Thanks for the link. If you ever want to understand namespaces in R, I recommend reading the source code of library(), too.

Getting back to the topic at hand, I'm worried that list_vectors() is a little to ambiguous on its own. "vector" has a pretty specific meaning in R already. This is less the case with list_regions(), I guess.

dshkol commented 7 years ago

Then I'm ok to go with what you proposed for those three, so it's at least convenient. @mountainMath, thoughts?

mountainMath commented 7 years ago

I am getting a little worried that it's getting a little cluttered as we are adding convenience functions. I guess something in me wants some kind of encapsulation that makes it clear where these function calls belong. But I guess that's not how R works. So I will follow your lead.

dshkol commented 7 years ago

Its useful to remember that many R users came to it without a background in other languages, so they may not be familiar with best practices from elsewhere. So it may be not be the best approach, but it's what people are used to.

mountainMath commented 7 years ago

Yeah, and we have the cancensus:: as a way to encapsulate things, and autocomplete does a decent job at listing available function calls. So let's do it.

dshkol commented 7 years ago

Is there any need for backward compatibility? I don't think anyone is really using cansensus yet.

mountainMath commented 7 years ago

Looking at API usage, some have dabbled in it. But not that many. And I don't think we should let ourselves get tied down too much in backward compatibility issues this early on. If it breaks it breaks, it's still in beta as far as I am concerned.

atheriel commented 7 years ago

We can just have

cancensus.load <- get_census

which creates an undocumented alias, so existing code will still work. Or we can warn of depreciation inside of an exported but undocumented function, e.g.

#' @export
cancensus.load <- function(...) {
  warning("cancensus.load is depreciated, use get_census() instead.", call. = FALSE)
  get_census(...)
}
dshkol commented 7 years ago

Worried about bloating this with backwards compatibility where it's not really necessary.

mountainMath commented 7 years ago

Let's not worry about backward compatibility and just make the changes.

atheriel commented 7 years ago

Ok, I've started implementing this.

mountainMath commented 7 years ago

Done in https://github.com/mountainMath/cancensus/pull/60.