trashbirdecology / bbsAssistant

An R package for downloading and handling data and information from the North American Breeding Bird Survey.

Creative Commons Zero v1.0 Universal

read data directly from zip files #106

Closed: richardjtelford closed this issue 2 years ago

richardjtelford commented 3 years ago

The unpacked data from BBS are huge. It would be convenient if bbsAssistant::import_bbs_data read the data directly from the zipped files rather than needing to have them unzipped first.

readr::read_csv should be able to do this directly when the zip archive contains a single file. When it contains more than one file, see https://community.rstudio.com/t/readr-read-all-csv-files-in-zip-archive/11471/7
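A minimal sketch of the multi-file case described above, assuming the archive members are plain CSV files with matching columns. The demo archive, its file names, and the helper `read_member()` are hypothetical and are not part of bbsAssistant. `utils::zip()` is used only to build a throwaway archive and relies on an external `zip` program being available on the system.

```r
library(readr)

# Build a small throwaway archive so the sketch is self-contained.
tmp_dir <- tempfile("bbs_demo_")
dir.create(tmp_dir)
write_csv(data.frame(route = 1:3, count = c(10, 20, 30)),
          file.path(tmp_dir, "a.csv"))
write_csv(data.frame(route = 4:6, count = c(5, 15, 25)),
          file.path(tmp_dir, "b.csv"))
zip_path <- file.path(tmp_dir, "demo.zip")
# "-j" stores the files without their directory paths, "-q" keeps zip quiet.
utils::zip(zip_path, files = file.path(tmp_dir, c("a.csv", "b.csv")),
           flags = "-jq")

# List the archive members without extracting anything to disk.
members <- utils::unzip(zip_path, list = TRUE)$Name
csv_members <- members[grepl("\\.csv$", members)]

# Hypothetical helper: read one member through an unz() connection.
read_member <- function(member) {
  read_csv(unz(zip_path, member), col_types = cols())
}

# Read every CSV member and stack the results row-wise.
all_data <- do.call(rbind, lapply(csv_members, read_member))
```

For an archive that holds exactly one file, `read_csv(zip_path)` alone should suffice, since readr decompresses paths ending in `.zip` automatically.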

I can experiment and send a pull request if you would like me to.

trashbirdecology commented 2 years ago

@richardjtelford my apologies, I only just saw this issue :D

I am working on a minor update right now (see this branch). The biggest speed bottleneck was from the StateFiles, but the BBS USGS office recently changed where and how they archive their data releases. The 50-stop data is now spread across 10 or 11 subdirectories in each data release, and is a bit faster to unzip. I will toy around with speeds on using unz versus readr::read_csv() soonish.
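One rough way to run the speed comparison mentioned above is sketched here. It assumes a single-file archive of synthetic data, so the file names and sizes are hypothetical, and the timings will not reflect the real BBS releases. `utils::zip()` needs an external `zip` program on the system.

```r
library(readr)

# Create a synthetic CSV and zip it (hypothetical stand-in for a BBS file).
tmp_dir <- tempfile("bbs_timing_")
dir.create(tmp_dir)
csv_path <- file.path(tmp_dir, "fifty.csv")
write_csv(data.frame(route = seq_len(1e5), count = rpois(1e5, 3)), csv_path)
zip_path <- file.path(tmp_dir, "fifty.zip")
utils::zip(zip_path, files = csv_path, flags = "-jq")

# Approach 1: unzip to disk first, then read the extracted file.
time_unzip_first <- system.time({
  extracted <- utils::unzip(zip_path, exdir = file.path(tmp_dir, "unzipped"))
  d1 <- read_csv(extracted, col_types = cols())
})

# Approach 2: read one member through an unz() connection.
time_unz <- system.time({
  d2 <- read_csv(unz(zip_path, "fifty.csv"), col_types = cols())
})

# Approach 3: hand the .zip path straight to read_csv() (single-file archive).
time_direct <- system.time({
  d3 <- read_csv(zip_path, col_types = cols())
})

# Elapsed seconds for each approach.
c(unzip_first = time_unzip_first[["elapsed"]],
  unz         = time_unz[["elapsed"]],
  direct      = time_direct[["elapsed"]])
```

A single `system.time()` call is noisy, so repeating each approach several times, or using a benchmarking package, would give a fairer comparison.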

Again, sorry for the long delay.

[x] test speed of unpack_bbs_data() versus readr::read_csv().

trashbirdecology commented 2 years ago

@richardjtelford, thank you again for pointing this out. I've updated the major update branch, which will hopefully be pulled into the main branch this weekend.