Closed adamhsparks closed 7 years ago
I disagree. JSONurl_latlon_by_station_name
data set is useful in its own right. We spent a fair bit of time getting this data right, we may as well make it available. Maybe it would have been better to call it something different, but I'd prefer it stays as-is than be made internal.
I'm not convinced either. Discussion is welcome.
I'm not sure how we would be able to update the data if it were internal as well. R would store all our location data in a single file, we would lose control of separate data files to be updated.
-- Adam H Sparks
On 25 June 2017 at 13:39:30, HughParsonage (notifications@github.com) wrote:
I disagree. JSONurl_latlon_by_station_name data set is useful in its own right. We spent a fair bit of time getting this data right, we may as well make it available. Maybe it would have been better to call it something different, but I'd prefer it stays than is internal.
— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/ToowoombaTrio/bomrang/issues/23#issuecomment-310880203, or mute the thread https://github.com/notifications/unsubscribe-auth/ADDEAv9IEyJEhg9zkbrDEdFJfCrWMQWzks5sHdZygaJpZM4OEdfB .
My rule-of-thumb is that data should not be exposed to the user if it has no meaning outside the package's functions, or if it would be too 'rude' to expose them because it conflict with an existing data set. Otherwise it should be exposed. (A lot of that would be my ill-discipline in documenting data sets though -- need a CRAN CMD check to keep me in line!).
But it's definitely a matter of style -- i.e. there's no right answer. And although I have a view, it's pretty weak. Having said that, I just realized a report I'm writing uses this data set directly to plot a map of weather stations, so a use-case is already established!
Just my 2c.
Allright, I think we're in agreement here that these are useful. I can imagine using them for mapping purposes as well. We'll leave as is.
Just one further thought on this admittedly closed issue: Making them internal in no way removes the ability of anyone to access them; mostly it just means they won't be lazy-loaded and won't automatically appear in package documentation. You can put whatever files you want in inst/extdata
(or inst/whatever
) and the package and all others would still have access via
system.file("extdata", "data.doc", package="bomrang")
and then load it however you want (with ".doc" illustrating that anything is possible here). The only principle remains exposing as little as possible for the fabled "envisioned average user" to still retain "full" functionality.
There's a lot of discussion on how to really hide data from users of R
packages, which is a much trickier issue, yet not relevant here. Data in inst/
is in no way hidden; just not exported to front-end.
@mpadge, as I recall, when I started looking at ways to update the package data on-the-fly by the end-user, I couldn't suss out how you would put it in inst
and still maintain the ability to run a function and update the data in the package after it's installed. So I went this route so that we aren't stuck constantly updating these metadata on the package when no other changes are necessary.
If you can illustrate how to do that for the various metadata we have, I'd reconsider, but when I've looked at my packages that I've written and put data in inst
it does not have a specific name and all the data are in one file, I thought?
The above call just returns the absolute file path, and you're then free to do whatever you want with it, so
fname <- system.file("extdata", "data.doc", package="bomrang")
fname
# [1] "<R dir stuff>/bomrang/inst/extdata/data.doc"
(Or more likely without "inst/" on a CRAN install.) Then, using some made-up functions,
doc <- read_word(fname) %>% make_changes()
write_doc_fmt(doc, file = fname)
Result: User's local version of extdata
file updated by internal package functions. It's only R/sysdata.rda
that gets lumped in a single file and remains truly hidden; otherwise pretty much anything goes.
I now I recall my issues with this. It's because Rmd files don't play nicely with file paths. Trying to save the data using save
to /inst/extdata
is more of a challenge than using devtools::use_data
in the Rmd file showing how the data were created. You either end up with internal or external data but the Rmd file handles it gracefully. I'll have to see if there's a way around.
Derp. That wasn't hard at all.
Example from my GSODR package.
save(country_list, file = "../inst/extdata/country_list", compress = "bzip2")
Sometimes I make life a bit harder than it needs to be.
just one tweak: instead of ../inst/extdata
you should use the system.file()
stuff above, because doing install.packages()
from within R
will drop ./inst/extdata
down one directory to ./extdata
, and ./inst
will no longer exist. The system.file()
stuff is failsafe regardless of where extdata
is actually located.
@mpadge, sorry, I don't follow. I want to save the inital raw data in /inst/extdata
how does the system.file
call (which only returns an empty string) help anything?
In fact, I can't get it to work when knitting.
Error in save(system.file("extdata", "data.doc", package = "bomrang")) :
object ‘system.file("extdata", "data.doc", package = "bomrang")’ not found
whereas
save(country_list, file = "../inst/extdata/country_list", compress = "bzip2")
Saves in the inst/extdata
folder of my packages project after knitting to then be built?
Example of what I'm referring to here: https://github.com/ropensci/GSODR/blob/devel/data-raw/fetch_country-list.md
oh yeah, that is one glitch with this approach: system.file()
''Finds the full names of files in packages etc.'', so if files don't actually exist it just returns an empty string. Passed to save()
, this then returns not found
. Luckily, what you guys need to do is only replace an existing file, so system.file()
will always work (when applied to inst/extdata/JSONUrl_latlon_by_station_name
).
To be convinced, make a tarball of bomrang
, then do a local install.packages()
with a file in ./inst/extdata
, and then examine the structure of the installed package - inst/
will be gone, and so your version will fail on an actual CRAN install, while getting the relative location with system.file()
will always work regardless of install method.
I think we're talking about two different things here.
Right now I'm talking about the creation of the initial file, not the modification, yet. Modifying the file wasn't the issue I was having. I was having an issue creating the initial file using the Rmd files in data-raw
.
AAC_codes
are no longer externally exposed when using bomerang
edit this, along with all other changes excepting Hugh's, are in the devel
branch. Hugh's are in his pull request, not yet merged.
Consider moving
AAC_codes
andJSONurl_latlon_by_station_name
toinst/data