ropensci-archive / bomrang

:warning: ARCHIVED :warning: Australian government Bureau of Meteorology (BOM) data client for R
Other
109 stars 26 forks source link

`AAC_codes` and `JSONurl_latlon_by_station_name` don't need to be exposed to the user #23

Closed adamhsparks closed 7 years ago

adamhsparks commented 7 years ago

Consider moving AAC_codes and JSONurl_latlon_by_station_name to inst/data

HughParsonage commented 7 years ago

I disagree. JSONurl_latlon_by_station_name data set is useful in its own right. We spent a fair bit of time getting this data right, we may as well make it available. Maybe it would have been better to call it something different, but I'd prefer it stays as-is than be made internal.

adamhsparks commented 7 years ago

I'm not convinced either. Discussion is welcome.

I'm not sure how we would be able to update the data if it were internal as well. R would store all our location data in a single file, we would lose control of separate data files to be updated.

-- Adam H Sparks

On 25 June 2017 at 13:39:30, HughParsonage (notifications@github.com) wrote:

I disagree. JSONurl_latlon_by_station_name data set is useful in its own right. We spent a fair bit of time getting this data right, we may as well make it available. Maybe it would have been better to call it something different, but I'd prefer it stays than is internal.

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/ToowoombaTrio/bomrang/issues/23#issuecomment-310880203, or mute the thread https://github.com/notifications/unsubscribe-auth/ADDEAv9IEyJEhg9zkbrDEdFJfCrWMQWzks5sHdZygaJpZM4OEdfB .

HughParsonage commented 7 years ago

My rule-of-thumb is that data should not be exposed to the user if it has no meaning outside the package's functions, or if it would be too 'rude' to expose them because it conflict with an existing data set. Otherwise it should be exposed. (A lot of that would be my ill-discipline in documenting data sets though -- need a CRAN CMD check to keep me in line!).

But it's definitely a matter of style -- i.e. there's no right answer. And although I have a view, it's pretty weak. Having said that, I just realized a report I'm writing uses this data set directly to plot a map of weather stations, so a use-case is already established!

Just my 2c.

adamhsparks commented 7 years ago

Allright, I think we're in agreement here that these are useful. I can imagine using them for mapping purposes as well. We'll leave as is.

mpadge commented 7 years ago

Just one further thought on this admittedly closed issue: Making them internal in no way removes the ability of anyone to access them; mostly it just means they won't be lazy-loaded and won't automatically appear in package documentation. You can put whatever files you want in inst/extdata (or inst/whatever) and the package and all others would still have access via

system.file("extdata", "data.doc", package="bomrang")

and then load it however you want (with ".doc" illustrating that anything is possible here). The only principle remains exposing as little as possible for the fabled "envisioned average user" to still retain "full" functionality.

There's a lot of discussion on how to really hide data from users of R packages, which is a much trickier issue, yet not relevant here. Data in inst/ is in no way hidden; just not exported to front-end.

adamhsparks commented 7 years ago

@mpadge, as I recall, when I started looking at ways to update the package data on-the-fly by the end-user, I couldn't suss out how you would put it in inst and still maintain the ability to run a function and update the data in the package after it's installed. So I went this route so that we aren't stuck constantly updating these metadata on the package when no other changes are necessary.

If you can illustrate how to do that for the various metadata we have, I'd reconsider, but when I've looked at my packages that I've written and put data in inst it does not have a specific name and all the data are in one file, I thought?

mpadge commented 7 years ago

The above call just returns the absolute file path, and you're then free to do whatever you want with it, so

fname <- system.file("extdata", "data.doc", package="bomrang")
fname
# [1] "<R dir stuff>/bomrang/inst/extdata/data.doc"

(Or more likely without "inst/" on a CRAN install.) Then, using some made-up functions,

doc <- read_word(fname) %>% make_changes()
write_doc_fmt(doc, file = fname)

Result: User's local version of extdata file updated by internal package functions. It's only R/sysdata.rda that gets lumped in a single file and remains truly hidden; otherwise pretty much anything goes.

adamhsparks commented 7 years ago

I now I recall my issues with this. It's because Rmd files don't play nicely with file paths. Trying to save the data using save to /inst/extdata is more of a challenge than using devtools::use_data in the Rmd file showing how the data were created. You either end up with internal or external data but the Rmd file handles it gracefully. I'll have to see if there's a way around.

adamhsparks commented 7 years ago

Derp. That wasn't hard at all.

Example from my GSODR package.

save(country_list, file = "../inst/extdata/country_list", compress = "bzip2")

Sometimes I make life a bit harder than it needs to be.

mpadge commented 7 years ago

just one tweak: instead of ../inst/extdata you should use the system.file() stuff above, because doing install.packages() from within R will drop ./inst/extdata down one directory to ./extdata, and ./inst will no longer exist. The system.file() stuff is failsafe regardless of where extdata is actually located.

adamhsparks commented 7 years ago

@mpadge, sorry, I don't follow. I want to save the inital raw data in /inst/extdata how does the system.file call (which only returns an empty string) help anything?

In fact, I can't get it to work when knitting.

Error in save(system.file("extdata", "data.doc", package = "bomrang")) : 
  object ‘system.file("extdata", "data.doc", package = "bomrang")’ not found

whereas

save(country_list, file = "../inst/extdata/country_list", compress = "bzip2")

Saves in the inst/extdata folder of my packages project after knitting to then be built?

adamhsparks commented 7 years ago

Example of what I'm referring to here: https://github.com/ropensci/GSODR/blob/devel/data-raw/fetch_country-list.md

mpadge commented 7 years ago

oh yeah, that is one glitch with this approach: system.file() ''Finds the full names of files in packages etc.'', so if files don't actually exist it just returns an empty string. Passed to save(), this then returns not found. Luckily, what you guys need to do is only replace an existing file, so system.file() will always work (when applied to inst/extdata/JSONUrl_latlon_by_station_name).

To be convinced, make a tarball of bomrang, then do a local install.packages() with a file in ./inst/extdata, and then examine the structure of the installed package - inst/ will be gone, and so your version will fail on an actual CRAN install, while getting the relative location with system.file() will always work regardless of install method.

adamhsparks commented 7 years ago

I think we're talking about two different things here.

Right now I'm talking about the creation of the initial file, not the modification, yet. Modifying the file wasn't the issue I was having. I was having an issue creating the initial file using the Rmd files in data-raw.

adamhsparks commented 7 years ago

AAC_codes are no longer externally exposed when using bomerang

edit this, along with all other changes excepting Hugh's, are in the devel branch. Hugh's are in his pull request, not yet merged.

adamhsparks commented 7 years ago

Closed with https://github.com/ToowoombaTrio/bomrang/commit/d9f13cbe73f840f787b941e70e67e10f782a9a68