Closed by ethanwhite 4 years ago
Thanks for the comments. I agree with the updates to retrieving, storing and importing the raw data.
@ethanwhite I am curious about your thoughts on feather. The only reason I chose it was to reduce file size and import times for users who are working with a large portion of the data. I suppose I could just offer file format options while defaulting to TSV, TXT, etc.
I like feather in concept, but since adoption in the ecology world is somewhat limited I think it's an open question whether it's a good choice here. Given the size of the data, I think the biggest benefit is using the package to educate ecologists about the format. Since BBS is <1GB in total, I think csv/tsv are both reasonable options that allow the generated data to be used cross-language. If you want the storage space and read-time benefits of a binary format, then since the usage is really in R I might just use the R data format. That said, as long as you're abstracting this from the user I don't think it really matters. They just care about `get_bbsData` giving them data in R; what the back-end storage is won't matter to them, which means that any of these choices is perfectly reasonable.
True that
Ethan, I have addressed this through a major functionality revamp, currently in the master branch.
The functionality is much more streamlined, and arguably less confusing.
Feather is out. CSV/txt is in.
All functions that require an internet connection first check whether the file(s) already exist locally, and let the user choose (via a menu) whether to overwrite them with new downloads or simply import the existing files. This behavior lives in the "get_" functions.
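For readers following along, here is a minimal sketch of that check-then-prompt pattern in base R. It is not the package's actual code: `fetch_or_reuse`, its arguments, and the injectable `fetch` function are hypothetical names chosen for illustration, and the sketch assumes one file per URL.

```r
# Hypothetical helper, not the package's implementation: return the path to a
# local copy of `url`, downloading only when the file is missing or when the
# user asks for a fresh copy.
fetch_or_reuse <- function(url, data_dir, overwrite = NULL,
                           fetch = utils::download.file) {
  dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)
  dest <- file.path(data_dir, basename(url))

  if (file.exists(dest)) {
    # Prompt only if the caller did not decide and a prompt is possible;
    # in non-interactive sessions the existing file is reused.
    if (is.null(overwrite)) {
      overwrite <- interactive() &&
        utils::menu(c("Use existing file", "Download again"),
                    title = paste("Found", dest)) == 2
    }
    if (!overwrite) return(dest)
  }

  fetch(url, dest, mode = "wb")  # `fetch` is swappable so it can be stubbed
  dest
}
```

The returned path can then be passed to `read.csv()` or a similar reader. Passing `overwrite = TRUE` or `overwrite = FALSE` skips the menu entirely, which keeps the function usable in scripts.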
Thanks again!
This is a recommendation based on my experience while reviewing this package for JOSS (https://github.com/openjournals/joss-reviews/issues/1768).
Is your feature request related to a problem? Please describe.
As a new user of the software I found the API as introduced in the vignette to be a bit complicated. The order of introduction is currently: 1) Create a data directory for storing data; 2) Download and load data into R (this data doesn't end up in the data directory); 3) Export the data into a local file (in a not widely used format; I love `feather` but it won't be familiar to many users); 4) Import the data (which requires doing some filename matching and manipulation). This felt like a lot of steps if my goal is to quickly and easily work with the BBS data.
Describe the solution you'd like
My core use case for a package like this would be "Get BBS data into R and start working with it. Ideally without repeatedly downloading the data if I already have it". With that use case in mind I would consider making `get_bbsData` handle all of this for the user, with some optional arguments to control behavior. So, the default behavior for `get_bbsData` would be to:

I'd have an argument to allow setting the data directory (where files are checked for and downloaded to) and set a default for this directory to either the working directory (in a clearly named subdirectory) or a `.bbsAssistant` directory in the user's home directory.
Describe alternatives you've considered
Given that most of the intended usage appears to focus on single BBS files, which are relatively small, another option would be to de-emphasize the `feather` storing and loading functionality in the introductory vignette.
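To make the data directory default from the proposed solution concrete, here is a rough base-R sketch. The function name `bbs_data_dir`, its arguments, and the `bbsAssistant-data` subdirectory name are all assumptions for illustration; only `.bbsAssistant` comes from the suggestion above.

```r
# Hypothetical helper: decide where downloaded BBS files are kept.
# An explicit `data_dir` wins; otherwise default to a clearly named
# subdirectory of the working directory, or ~/.bbsAssistant if requested.
bbs_data_dir <- function(data_dir = NULL, use_home = FALSE, create = TRUE) {
  if (is.null(data_dir)) {
    data_dir <- if (use_home) {
      file.path(path.expand("~"), ".bbsAssistant")
    } else {
      file.path(getwd(), "bbsAssistant-data")  # assumed subdirectory name
    }
  }
  if (create) dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)
  data_dir
}
```

A `get_bbsData`-style function could call this once and use the result both to look for existing files and as the download destination, so users never have to create the directory themselves.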