Closed atheriel closed 7 years ago
Yeah, the original cancensus.
prefix was my attempt of namespacing and encapsulating the functionality without understanding how things are usually done in R. I still like the general prefix idea since it makes it more transparent which package is used for function calls. I haven't done much with R but the overwriting of functionality by packages and subsequent importance of the order in which they are loaded freaks me out a bit.
I am quite amenable to name changes that stick closer to established usage in R. The cm_
prefix would work in that cm
references CensusMapper which hosts the API. But really, whatever you guys think is good with me.
Jens and I talked about this a bit at the very start and I understood how R name conventions and tendency to mask functions could be bewildering to someone experienced in other languages - so I appreciate the class and method approach, but it is out of convention with other R packages. I like the proposed names, and I would even go so far as:
cancensus::cancensus.list_vectors
becomes cancensus::list_vectors
cancensus::cancensus.search_vectors
becomes cancensus::search_vectors
cancensus::cancensus.list_regions
becomes cancensus::list_regions
A person will use cancensus primarily to explore and pull data for subsequent analysis. While they are pulling that data, they are really only working in cancensus space. The above names don't mask any core or base functions, or anything in major packages that I can see. If they want to do something with a specific package that has a similar function, they will typically do it later in their workflow, so they will mask over cancensus, rather than the other way around.
@mountainMath Yeah, the whole namespace thing is a mess. It's most similar to C, really, or maybe Common Lisp. Which I guess makes sense, given that those are the two biggest influences on the language.
@mountainMath @atheriel if anyone wants to read the paper introducing the language it's here: https://www.stat.auckland.ac.nz/~ihaka/downloads/R-paper.pdf
Some interesting things from there are the tradeoffs that were made from the onset in the language, especially with how objects are stored and with scoping.
@dshkol Well... that certainly took me down the rabbit hole. Thanks for the link. If you ever want to understand namespaces in R, I recommend reading the source code of library()
, too.
Getting back to the topic at hand, I'm worried that list_vectors()
is a little to ambiguous on its own. "vector" has a pretty specific meaning in R already. This is less the case with list_regions()
, I guess.
Then I'm ok to go with what you proposed for those three, so it's at least convenient. @mountainMath, thoughts?
I am getting a little worried that it's getting a little cluttered as we are adding convenience functions. I guess something in me wants some kind of encapsulation that makes it clear where these function calls belong. But I guess that's not how R works. So I will follow your lead.
Its useful to remember that many R users came to it without a background in other languages, so they may not be familiar with best practices from elsewhere. So it may be not be the best approach, but it's what people are used to.
Yeah, and we have the cancensus::
as a way to encapsulate things, and autocomplete does a decent job at listing available function calls. So let's do it.
Is there any need for backward compatibility? I don't think anyone is really using cansensus yet.
Looking at API usage, some have dabbled in it. But not that many. And I don't think we should let ourselves get tied down too much in backward compatibility issues this early on. If it breaks it breaks, it's still in beta as far as I am concerned.
We can just have
cancensus.load <- get_census
which creates an undocumented alias, so existing code will still work. Or we can warn of depreciation inside of an exported but undocumented function, e.g.
#' @export
cancensus.load <- function(...) {
warning("cancensus.load is depreciated, use get_census() instead.", call. = FALSE)
get_census(...)
}
Worried about bloating this with backwards compatibility where it's not really necessary.
Let's not worry about backward compatibility and just make the changes.
Ok, I've started implementing this.
The package currently uses a very unusual naming convention for functions, namely
package.function_name
. I haven't seen this style in R before, but R is also famous for its inconsistent naming conventions. I didn't give it much thought for a while, but as the package draws nearer to release, I thought it might be a good idea to think more carefully about naming. I think we should be motivated to have clear, unambiguous names, of course, but beyond that things are more a matter of style.What Do Other Packages Do?
Most other R packages in my experience do not use a prefix or suffix at all (e.g.
dplyr
). However, I did take a look at the most recently updated 50 or so API-backed R packages, and there the naming conventions are more varied.Probably the closest package to this one is the censusapi package, which uses a
getCensus()
function for querying and alistCensusMetadata()
function for inspection. The junr package, which is also somewhat similar, usesget_index()
,get_data()
andlist_titles()
, which I think are too generic -- I would prefer to stick "census" in there somewhere. (I think the dwapi package also suffers from this problem).Some packages use nouns; for example the hansard package has
commons_divisions()
andmp_vote_record()
and the HIBPwned package hasaccount_breaches()
. I like the specificity of this approach. Others use verbs, e.g.geocode()
in the banR package.My favourite approach at the moment is, as with the censusapi package, to take a specific noun and prefix it with "get", as with the
GetSports()
function in the pinnacle package or theget_lang()
function in the languagelayeR package, or theget_owf()
function in the openwindfarm package, or even theget_video_details()
function in the tuber package.Of course, there are some packages that use prefixes, e.g.
bold_
k9_
ref_
bea
ct_
core_
nneo_
There are also some that use suffixes, e.g.
_sidrar
suffix for its functions, adoptingget_
,search_
, etc function names -- so this could also be considered in the first categorysearch_pv
function (that is, it uses a suffix)querySW
functionThere is also the approach taken by the rosettaApi package, which is to have a single, top-level
api()
function.Proposed API Changes
All this taken in stride, my proposal is to make the following changes:
cancensus::cancensus.load
becomescancensus::get_census
cancensus::cancensus.load_data
becomescancensus::get_census_data
cancensus::cancensus.load_geo
becomescancensus::get_census_geometry
cancensus::cancensus.list_datasets
becomeslist_census_datasets
, although I believe @mountainMath wants to add some non-census datasets in the future.cancensus::cancensus.list_vectors
becomescancensus::list_census_vectors
cancensus::cancensus.search_vectors
becomescancensus::search_census_vectors
cancensus::cancensus.list_regions
becomescancensus::list_census_regions
cancensus::cancensus.census_labels
becomecancensus::census_vectors
, since that is what they are termed in the parameters and in thelist_*
function.All internal functions should just ditch their
cancensus.
prefix, since they're not exported anyway.We can also keep aliases around for all these changes for a release or two so that no-one's code breaks.
I think that the current function parameters all have pretty good names, so I don't have any suggested changes there.
Alternatives
1) We can keep some element of namespacing, e.g. by using
cm
as a prefix, which is the same prefix currently used by the API key when it is an environment variable. So for examplecm_get
andcm_list_datasets
. 2) We could use a top-level function with the same name as the package, that is,cancensus::cancensus()
as the main entry point (replacing existingload*()
functions). I don't think this is a particularly intuitive approach for this package.