trashbirdecology / bbsAssistant

An R package for downloading and handling data and information from the North American Breeding Bird Survey.
Creative Commons Zero v1.0 Universal

Focus top-level API / vignette on key user behavior #67

Closed ethanwhite closed 4 years ago

ethanwhite commented 4 years ago

This is a recommendation based on my experience while reviewing this package for JOSS (https://github.com/openjournals/joss-reviews/issues/1768).

Is your feature request related to a problem? Please describe.

As a new user of the software, I found the API as introduced in the vignette to be a bit complicated. The order of introduction is currently: 1) Create a data directory for storing data; 2) Download and load data into R (this data doesn't end up in the data directory); 3) Export the data into a local file (in a format that is not widely used; I love feather, but it won't be familiar to many users); 4) Import the data (which requires doing some filename matching and manipulation). This felt like a lot of steps if my goal is to quickly and easily work with the BBS data.

Describe the solution you'd like

My core use case for a package like this would be "Get BBS data into R and start working with it. Ideally without repeatedly downloading the data if I already have it". With that use case in mind I would consider making get_bbsData handle all of this for the user, with some optional arguments to control behavior. So, the default behavior for get_bbsData would be to:

  1. Check if the requested data exists locally
  2. If it doesn't, download the data and store it locally in a permanent (not temporary) location.
  3. Load the data into R.

I'd have an argument to allow setting the data directory (where files are checked for and downloaded to) and set a default for this directory to either the working directory (in a clearly named subdirectory) or a .bbsAssistant directory in the user's home directory.
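To make the proposed default behavior concrete, here is a minimal sketch in base R. The function name `get_data_cached` and its arguments `url`, `data_dir`, and `overwrite` are hypothetical and chosen for illustration only; this is not the existing `get_bbsData` implementation, and it assumes the requested file is a plain CSV.

```r
# Hypothetical sketch of the check / download / load flow; not the bbsAssistant API.
get_data_cached <- function(url,
                            data_dir = file.path(path.expand("~"), ".bbsAssistant"),
                            overwrite = FALSE) {
  # Make sure the permanent data directory exists.
  dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)
  local_file <- file.path(data_dir, basename(url))

  # Steps 1 and 2: download only if the file is missing or an overwrite is requested.
  if (overwrite || !file.exists(local_file)) {
    utils::download.file(url, destfile = local_file, mode = "wb", quiet = TRUE)
  }

  # Step 3: load the local copy into R (assumes a CSV file).
  utils::read.csv(local_file, stringsAsFactors = FALSE)
}
```

With something like this, a second call with the same `url` and `data_dir` would read the local copy instead of downloading again, and `overwrite = TRUE` would force a fresh download.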

Describe alternatives you've considered

Given that most of the intended usage appears to focus on single BBS files, which are relatively small, another option would be to de-emphasize the feather storing and loading functionality in the introductory vignette.

trashbirdecology commented 4 years ago

Thanks for the comments. I agree with the updates to retrieving, storing and importing the raw data.

@ethanwhite I am curious about your thoughts on feather. The only reason I chose it was to reduce file size and import times for users who are working with a large portion of the data. I suppose I could just offer file extension options while defaulting to TSV, TXT, etc.

ethanwhite commented 4 years ago

I like feather in concept, but since adoption in the ecology world is somewhat limited, I think it's an open question whether it's a good choice here. Given the size of the data, I think the biggest benefit is using the package to educate ecologists about the format. Since BBS is <1GB in total, I think csv/tsv are both reasonable options that allow the generated data to be used cross-language. If you want the storage-space and read-time benefits of a binary format then, since the usage is really in R, I might just use the R data format. That said, as long as you're abstracting this from the user, I don't think it really matters. They just care about get_bbsData giving them data in R; what the back-end storage is won't matter to them, which means that any of these choices is perfectly reasonable.
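As a rough illustration of what "abstracting this from the user" could mean, the sketch below hides the storage format behind two helper functions. The names `write_cache` and `read_cache` and the `format` argument are hypothetical, not part of the package; only base R calls (`write.csv`, `read.csv`, `saveRDS`, `readRDS`) are used.

```r
# Hypothetical helpers: the caller gets a data frame back either way.
write_cache <- function(df, path, format = c("csv", "rds")) {
  format <- match.arg(format)
  if (format == "csv") {
    utils::write.csv(df, path, row.names = FALSE)  # plain text, usable cross-language
  } else {
    saveRDS(df, path)                              # R's native binary format
  }
  invisible(path)
}

read_cache <- function(path, format = c("csv", "rds")) {
  format <- match.arg(format)
  if (format == "csv") {
    utils::read.csv(path, stringsAsFactors = FALSE)
  } else {
    readRDS(path)
  }
}
```

Switching the default `format` would then change the back end without changing what the user sees.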

trashbirdecology commented 4 years ago

True that

trashbirdecology commented 4 years ago

Ethan, I have addressed this through a major functionality revamp, currently in the master branch.

The functionality is much more streamlined, and arguably less confusing.

Feather is out. CSV/txt is in.

All functions requiring an internet connection first check whether the file(s) exist locally, and allow the user to specify (via a menu) whether to overwrite them with new downloads or simply import the existing files. This appears in the "get_" functions.
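For readers following along, a check of this kind could be sketched as below. This is a hypothetical illustration, not the code in the package: the function name `resolve_action` and its `ask` argument are assumptions. It uses `utils::menu()`, which only works in interactive sessions, so the sketch falls back to importing the existing file when no prompt is possible.

```r
# Hypothetical sketch: decide whether to download or import a file.
resolve_action <- function(local_file, ask = interactive()) {
  # Nothing stored locally yet, so a download is the only option.
  if (!file.exists(local_file)) return("download")

  # Cannot prompt (e.g. in a script): reuse the existing file.
  if (!ask) return("import")

  choice <- utils::menu(
    c("Import existing file", "Overwrite with new download"),
    title = paste("Found", basename(local_file), "locally.")
  )
  # menu() returns 0 if the user exits; treat that as keeping the local file.
  if (choice == 2) "download" else "import"
}
```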

Thanks again!