Closed mstrimas closed 6 years ago
Hi @mstrimas, Thank you for the submission. Sorry for the delay, but I am doing some initial editorial checks before locating suitable reviewers.
Here are some notes from a `goodpractice::gp()` check (please run it yourself as you fix these issues, or explain why you are unable to fix them).
It is good practice to
✖ write unit tests for all functions, and all package code
in general. 80% of code lines are covered by test cases.
R/auk-clean.r:50:NA
R/auk-clean.r:51:NA
R/auk-clean.r:52:NA
R/auk-clean.r:53:NA
R/auk-clean.r:54:NA
... and 161 more lines
✖ omit "Date" in DESCRIPTION. It is not required and it
gets invalid quite often. A build date will be added to the package
when you perform `R CMD build` on it.
✖ use '<-' for assignment instead of '='. '<-' is the
standard, and R users and developers are used it and it is easier
to read your code for them if you use '<-'.
R/utils.r:20:13
R/utils.r:74:39
R/utils.r:75:36
R/utils.r:77:15
R/utils.r:83:39
✖ fix this R CMD check WARNING: LaTeX errors when creating
PDF version. This typically indicates Rd problems.
✖ fix this R CMD check ERROR: Re-running with no
redirection of stdout/stderr. Hmm ... looks like a package You may
want to clean up by 'rm -rf /tmp/Rtmp6kBa0r/Rd2pdf2a896cf4bd4a'
────────────────────────────────────────────────────────────────────────────────
Warning messages:
1: In readLines(filename) :
incomplete final line found on '/root/foo/auk/R/auk-rollup.r'
2: In readLines(filename) :
incomplete final line found on '/root/foo/auk/R/read.r'
3: In readLines(filename) :
incomplete final line found on '/root/foo/auk/tests/testthat/test_auk-rollup.r'
4: In readLines(filename) :
incomplete final line found on '/root/foo/auk/tests/testthat/test_ebird-species.r'
Thanks, @karthik, I just fixed several of these, but the following remain:
✖ write unit tests for all functions, and all package code in general. 80% of code lines are covered
by test cases.
R/auk-clean.r:50:NA
R/auk-clean.r:51:NA
R/auk-clean.r:52:NA
R/auk-clean.r:53:NA
R/auk-clean.r:54:NA
... and 161 more lines
✖ fix this R CMD check WARNING: LaTeX errors when creating PDF version. This typically indicates Rd
problems.
✖ fix this R CMD check ERROR: Re-running with no redirection of stdout/stderr. Hmm ... looks like a
package You may want to clean up by 'rm -rf
/var/folders/mg/qh40qmqd7376xn8qxd6hm5lwjyy0h2/T//RtmpsPAVpf/Rd2pdf5dcb4abe733d'
`R CMD check` warning and error: these don't arise when I use `devtools::check()` or when I submit to CRAN, and I haven't had any luck figuring out what the issue is or how to resolve it.
@mstrimas No worries. Thank you for fixing the warnings. Regarding the remaining "incomplete final line" warnings, have you tried adding a blank line at the end of those files? That should make the warnings go away.
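The "incomplete final line" warnings mean that a file's last line is not terminated by a newline. A minimal sketch of an automated fix in base R follows; `ensure_trailing_newline()` is a hypothetical helper written for this illustration, not part of `auk` or `goodpractice`.

```r
# Hypothetical helper: append a newline when a file's last byte is not "\n".
# Returns TRUE (invisibly) if the file was modified, FALSE otherwise.
ensure_trailing_newline <- function(path) {
  size <- file.size(path)
  if (is.na(size) || size == 0) {
    return(invisible(FALSE))
  }
  bytes <- readBin(path, what = "raw", n = size)
  if (identical(bytes[length(bytes)], as.raw(10))) {
    return(invisible(FALSE))  # already ends with a newline
  }
  cat("\n", file = path, append = TRUE)
  invisible(TRUE)
}

# Example on a throwaway file that lacks a final newline
demo_file <- tempfile(fileext = ".r")
cat("x <- 1", file = demo_file)
was_fixed <- ensure_trailing_newline(demo_file)
```

Applied to the files named in the warnings (for example via `lapply()` over `list.files("R", full.names = TRUE)`), this should have the same effect as adding the blank line by hand.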
Reviewer 1 is @aurielfournier Review due: August 23 (Auriel noted that she might need an additional week due to travel)
Reviewer 2 is @emhart Review due: August 27
The package includes all the following forms of documentation:
`URL`, `BugReports` and `Maintainer` (which may be autogenerated via `Authors@R`).
Estimated hours spent reviewing: 6 (this is my first package review, so I spent more time than I suspect I might on future reviews)
`auk` does a great job of removing much of the pain and frustration of working with raw eBird data, which has been a limiting factor for many who want to take advantage of the vast data resources available through eBird. While there are other eBird packages, this is the only one I am aware of that allows you to work with the raw data downloaded from Cornell, as opposed to working with the summary data that can be gleaned from the eBird website.
The package is solid, though since I am not well versed in AWK by any means I'm unable to comment on those fine details.
The vignette is quite extensive, which is fantastic!
I'm reading this vignette thinking about 'the average eBird data user', who isn't necessarily someone with an extensive R background. While this is a very detailed vignette, and I think the detail is good and important, it might be better if it were rearranged so that the heavy technical detail is towards the end and the 'how to use this package' material is up front. Heavy users will keep reading, but less experienced users may get overwhelmed by details that are not essential to using the package.
My biggest suggestion would be to remove `unlink()` from all the function help examples and the vignette. If the user just runs the whole chunk at the same time, like I did the first several times, then the output file isn't there, since R just created it and then deleted it. I understand why you have it there, to avoid having lots of files in your own directory, but I think keeping `unlink()` there will create more issues than it solves, especially for less experienced R users.
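The effect described here can be reproduced with base R alone. The sketch below uses a small temporary text file as a stand-in for the package's filtered output; the file contents are made up for illustration.

```r
# Write a small file to a temporary location (stand-in for a filtered output file).
out_file <- tempfile(fileext = ".txt")
writeLines(c("species\tcount", "Gray Jay\t2"), out_file)
exists_before <- file.exists(out_file)

# Cleaning up in the same chunk deletes the file before the user can inspect it.
unlink(out_file)
exists_after <- file.exists(out_file)
```

Running the whole chunk at once leaves nothing behind to open, which is the confusion the review describes. Writing to `tempfile()` without the `unlink()` call keeps the working directory clean while still leaving the file available for the rest of the session.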
I would encourage you to avoid abbreviations, since most people aren't going to read the vignette word for word, and consider not using `EBD`, and just saying 'basic dataset' or something along those lines instead. It will be much more readable/skim-able this way.
Throughout the vignette and function documentation you use pipes, which is great, I like pipes, but lots of people don't. In some cases because they don't like them and in others because they find them confusing. I think it would be valuable to also include examples of how the functions would be used without pipes in the vignette and in the function specific help files.
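As a generic illustration of what a pipe-free alternative looks like, here is the same computation written three ways. This sketch uses only base functions and the native `|>` pipe (which assumes R 4.1 or later) rather than the package's own functions, so it shows the pattern, not the actual `auk` examples.

```r
counts <- c(3, 1, 4, 1, 5, 9, 2, 6)

# With a pipe: steps read left to right.
top_piped <- counts |> unique() |> sort(decreasing = TRUE) |> head(3)

# Without pipes, nested: the same calls read from the inside out.
top_nested <- head(sort(unique(counts), decreasing = TRUE), 3)

# Without pipes, step by step: one intermediate object per step.
distinct_counts <- unique(counts)
sorted_counts <- sort(distinct_counts, decreasing = TRUE)
top_stepwise <- head(sorted_counts, 3)
```

The step-by-step form is usually the easiest for readers who find pipes confusing, at the cost of a few extra object names.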
Since it is not good practice to overwrite an object with the same name (`ebd`), I would suggest editing your example to not do this, as it could cause issues for people running the examples piecemeal and not following every step.
The function help examples in the different filter functions don't include `auk_filter` at the end of the pipeline. I think it would make sense to include `auk_filter` in all the examples, since you mention in the function description that you need to include `auk_filter` to finish the process. That way the example demonstrates the function within its full context.
I don't check build/installation on things very often, so this is not going to be the high point of my review. `devtools::check()` returned the following. If I am understanding this correctly, there aren't any major issues on my machine.
Updating auk documentation
Loading auk
Setting env vars -----------------------------------------
CFLAGS : -Wall -pedantic
CXXFLAGS: -Wall -pedantic
Building auk ---------------------------------------------
"C:/PROGRA~1/R/R-34~1.1/bin/x64/R" --no-site-file \
--no-environ --no-save --no-restore --quiet CMD build \
"C:\Users\amf698\Documents\R\win-library\3.4\auk" \
--no-resave-data --no-manual
* checking for file 'C:\Users\amf698\Documents\R\win-library\3.4\auk/DESCRIPTION' ... OK
* preparing 'auk':
* checking DESCRIPTION meta-information ... OK
* checking whether 'INDEX' is up-to-date ... NO
* use '--force' to remove the existing 'INDEX'
* excluding invalid files
Subdirectory 'R' contains invalid file names:
'auk' 'auk.rdb' 'auk.rdx'
* checking for LF line-endings in source and make files
* checking for empty or unneeded directories
Removed empty directory 'auk/R'
Removed empty directory 'auk/man'
WARNING: Removing directory 'auk/Meta' which should only occur in an
installed package
WARNING: Removing directory 'auk/help' which should only occur in an
installed package
WARNING: Removing directory 'auk/html' which should only occur in an
installed package
* looking to see if a 'data/datalist' file should be added
* building 'auk_0.0.2.tar.gz'
Setting env vars -----------------------------------------
_R_CHECK_CRAN_INCOMING_ : FALSE
_R_CHECK_FORCE_SUGGESTS_: FALSE
Checking auk ---------------------------------------------
"C:/PROGRA~1/R/R-34~1.1/bin/x64/R" --no-site-file \
--no-environ --no-save --no-restore --quiet CMD check \
"C:\Users\amf698\AppData\Local\Temp\RtmpOCSZT8/auk_0.0.2.tar.gz" \
--as-cran --timings --no-manual
* using log directory 'C:/Users/amf698/AppData/Local/Temp/RtmpOCSZT8/auk.Rcheck'
* using R version 3.4.1 (2017-06-30)
* using platform: x86_64-w64-mingw32 (64-bit)
* using session charset: ISO8859-1
* using options '--no-manual --as-cran'
* checking for file 'auk/DESCRIPTION' ... OK
* this is package 'auk' version '0.0.2'
* package encoding: UTF-8
* checking package namespace information ... OK
* checking package dependencies ... NOTE
Package suggested but not available for checking: 'covr'
* checking if this is a source package ... ERROR
Only *source* packages can be checked.
* DONE
Status: 1 ERROR, 1 NOTE
See
'C:/Users/amf698/AppData/Local/Temp/RtmpOCSZT8/auk.Rcheck/00check.log'
for details.
R CMD check results
1 error | 0 warnings | 1 note
checking if this is a source package ... ERROR
Only *source* packages can be checked.
checking package dependencies ... NOTE
Package suggested but not available for checking: 'covr'
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
[X] A statement of need clearly stating problems the software is designed to solve and its target audience in README
[X] Installation instructions: for the development version of package and any non-standard dependencies in README
[X] Vignette(s) demonstrating major functionality that runs successfully locally
[X] Function Documentation: for all exported functions in R help
[X] Examples for all exported functions in R Help that run successfully locally
[ ] Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with `URL`, `BugReports` and `Maintainer` (which may be autogenerated via `Authors@R`).
The authors present a very elegant solution to a difficult problem in R, how to handle a very large data set such as the eBird data (larger than most people's personal computers could load into RAM) when most users only need a subset of the data in the full file. Their solution is to provide a way to automatically build an Awk script, execute it, and write a new output file. While this could be done without this R package, they make the data accessible to a much larger audience.
Overall I found the code to be very comprehensive and elegant, and I think it will be a good addition to the rOpenSci package suite. The authors largely adhere to the rOpenSci package guidelines and are exceedingly diligent in their error handling in each function. I was also impressed with their coverage of use cases in their tests. They went above and beyond in writing an exhaustive suite of tests for each function. I find no major issues with how their code is written.
I do think there are a couple of minor areas for improvement in making it easier for end users. The biggest issue I had was initially grokking that this was a multi-step workflow that involved writing a file to disk. My first impression was that I could simply run a bunch of filters and the `ebd` variable (in the README) would actually be a data frame. If there was a way to make the workflow more explicit, especially in the README, I think that would be helpful. Another thought I have is, would it be possible to obfuscate this multi-step process and have a function that loads up the ebd, runs the filters, writes the file and reads it all into a tibble? That way the end user could side-step ever running `read_ebd()`. Another minor issue I had was that there's somewhat mixed handling of what I think of as "user standards laziness". For instance, you insist on ISO date standards, but countries can be mixed case, and don't require the ISO country code. I see how on the one hand you're making it easier for users, but I found myself a bit confused about where I could skip on my standards when it came to input. I was honestly surprised when I could enter "gray jay" but not "Robin" (but "American Robin" worked fine).
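To make the "ISO date standards" requirement concrete, here is a minimal sketch of a strict ISO 8601 (`YYYY-MM-DD`) check in base R. The helper `is_iso_date()` is hypothetical and only illustrates the kind of validation being discussed; it is not taken from the package.

```r
# Hypothetical helper: TRUE only for strings that look like YYYY-MM-DD
# and also correspond to a real calendar date.
is_iso_date <- function(x) {
  looks_iso <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)
  parsed <- as.Date(x, format = "%Y-%m-%d")
  looks_iso & !is.na(parsed)
}

checked <- is_iso_date(c("2017-08-23", "23/08/2017", "2017-02-30"))
```

A strict check like this is unambiguous, whereas guessing the format of a string such as "03/04/2017" is not, which may be why dates are treated more strictly than country names.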
Minor comments
Throughout your vignette and README you specify units for everything, but not extent. While I assume that it's decimal degrees, I think it would be good to be explicit.
You note that for this to work on Windows, users need Cygwin installed in a specific directory. I could foresee this being difficult. Is there any way to specify the path to make it easier for Windows users?
Consider adding a CITATION file so your package can be cited.
Would it be too much to include a column filter option to your workflow? Maybe that's a 0.0.3 feature, but seems like it would be a nice addition
There were some instances where you used system variable names as local variable names in functions, e.g. line 36 in auk_time.r: `time <- paste0(ifelse(nchar(time) == 4, "0", ""), time)`. It obviously doesn't cause an issue as is, but maybe it could down the line.
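For the last point, `time` is also the name of a function in the `stats` package, which is presumably the kind of clash meant here. A minimal sketch of the quoted line with a more specific variable name follows; the wrapper `pad_time()` and the name `start_time` are made up for this illustration.

```r
# Same padding logic as the quoted line, but without reusing the name `time`.
# A four-character time such as "6:00" gets a leading zero; "18:30" is left alone.
pad_time <- function(start_time) {
  paste0(ifelse(nchar(start_time) == 4, "0", ""), start_time)
}

padded <- pad_time(c("6:00", "18:30"))
```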
Community guidelines
A CONTRIBUTING file or a description of how to contribute in the README is not present. Consider adding contributor guidelines.
Examples
I ran all examples using `devtools::run_examples` and all ran without error.
Tests
I ran all tests with `devtools::test()` and all tests passed.
Checks
I built the package on the following system using `devtools::test(cran = TRUE)`:
R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
All checks passed with no notes, errors, or warnings.
Test coverage
I checked the amount of test coverage using `covr::package_coverage()` and it was 80.9%.
Furthermore, I reviewed all the tests in `tests/testthat`. Not only was there good test coverage, the range of scenarios was exhaustive. I was very impressed with the breadth of cases tested.
sessionInfo()
Just so you can see what versions of packages I used to run my tests:
Session info ----------------------------------------------------------------------------------------
setting value
version R version 3.4.1 (2017-06-30)
system x86_64, darwin15.6.0
ui RStudio (1.0.153)
language (EN)
collate en_US.UTF-8
tz America/Los_Angeles
date 2017-08-23
Packages --------------------------------------------------------------------------------------------
package * version date source
assertthat 0.2.0 2017-04-11 CRAN (R 3.4.1)
auk * 0.0.2.901 <NA> local
backports 1.1.0 2017-05-22 CRAN (R 3.4.1)
base * 3.4.1 2017-07-07 local
bindr 0.1 2016-11-13 CRAN (R 3.4.1)
bindrcpp * 0.2 2017-06-17 CRAN (R 3.4.1)
callr 1.0.0 2016-06-18 CRAN (R 3.4.0)
clisymbols 1.2.0 2017-08-24 Github (gaborcsardi/clisymbols@e49b4f5)
commonmark 1.2 2017-03-01 CRAN (R 3.4.1)
compiler 3.4.1 2017-07-07 local
countrycode 0.19 2017-02-06 CRAN (R 3.4.0)
covr * 3.0.0 2017-06-26 CRAN (R 3.4.1)
crayon 1.3.2 2016-06-28 CRAN (R 3.4.1)
cyclocomp 1.1.0 2017-08-24 Github (MangoTheCat/cyclocomp@6156a12)
data.table 1.10.4 2017-02-01 CRAN (R 3.4.0)
datasets * 3.4.1 2017-07-07 local
desc 1.1.1 2017-08-03 CRAN (R 3.4.1)
devtools * 1.13.3 2017-08-02 CRAN (R 3.4.1)
digest 0.6.12 2017-01-27 CRAN (R 3.4.1)
dplyr 0.7.2 2017-07-20 CRAN (R 3.4.1)
evaluate 0.10.1 2017-06-24 CRAN (R 3.4.1)
glue 1.1.1 2017-06-21 CRAN (R 3.4.1)
goodpractice 1.0.0 2017-08-24 Github (MangoTheCat/goodpractice@9969799)
graphics * 3.4.1 2017-07-07 local
grDevices * 3.4.1 2017-07-07 local
hms 0.3 2016-11-22 CRAN (R 3.4.0)
httr 1.3.1 2017-08-20 CRAN (R 3.4.1)
igraph 1.1.2 2017-07-21 CRAN (R 3.4.1)
jsonlite 1.5 2017-06-01 CRAN (R 3.4.1)
knitr 1.17 2017-08-10 CRAN (R 3.4.1)
lazyeval 0.2.0 2016-06-12 CRAN (R 3.4.0)
lintr 1.0.1 2017-08-10 CRAN (R 3.4.1)
magrittr 1.5 2014-11-22 CRAN (R 3.4.1)
memoise 1.1.0 2017-04-21 CRAN (R 3.4.1)
methods * 3.4.1 2017-07-07 local
pkgconfig 2.0.1 2017-03-21 CRAN (R 3.4.1)
praise 1.0.0 2015-08-11 CRAN (R 3.4.0)
purrr 0.2.3 2017-08-02 CRAN (R 3.4.1)
R6 2.2.2 2017-06-17 CRAN (R 3.4.1)
rcmdcheck 1.2.1 2016-09-28 CRAN (R 3.4.0)
Rcpp 0.12.12 2017-07-15 CRAN (R 3.4.1)
readr 1.1.1 2017-05-16 CRAN (R 3.4.0)
remotes 1.1.0 2017-07-09 CRAN (R 3.4.1)
rex 1.1.1 2016-12-05 CRAN (R 3.4.0)
rlang 0.1.2 2017-08-09 CRAN (R 3.4.1)
roxygen2 6.0.1 2017-02-06 CRAN (R 3.4.1)
rprojroot 1.2 2017-01-16 CRAN (R 3.4.1)
rstudioapi 0.6 2016-06-27 CRAN (R 3.4.1)
stats * 3.4.1 2017-07-07 local
stringi 1.1.5 2017-04-07 CRAN (R 3.4.1)
stringr 1.2.0 2017-02-18 CRAN (R 3.4.1)
testthat * 1.0.2 2016-04-23 CRAN (R 3.4.0)
tibble 1.3.4 2017-08-22 CRAN (R 3.4.1)
tidyr 0.7.0 2017-08-16 CRAN (R 3.4.1)
tools 3.4.1 2017-07-07 local
utils * 3.4.1 2017-07-07 local
whoami 1.1.1 2015-07-13 CRAN (R 3.4.0)
withr 2.0.0 2017-07-28 CRAN (R 3.4.1)
xml2 1.1.1 2017-01-24 CRAN (R 3.4.1)
xmlparsedata 1.0.1 2016-06-18 CRAN (R 3.4.0)
@mstrimas As an aside from my review, I wanted to say that this is a really cool solution to a big problem in R that I actually encounter in my work often. I might have a dataset that's 20-30 GB and I don't want to actually crunch the whole thing in R. So I take a slightly more hacky approach, which is to do some filtering in the data export phase (in SQL), then use some basic shell commands to sample / trim it down more, and then read it into R to do things like model POC. Do you think there's a way to make this package completely generic?
I'm imagining a scenario where I input the file location, column header names, a series of generic filters, and then the same basic workflow happens, awk executes the script and then writes an output file. Then this same workflow could work on any large text file. It seems like that would be a really powerful tool that would extend the functionality of this approach beyond eBird. Do you think that would be feasible?
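A rough sketch of the first half of that generic workflow, turning column names and filters into an AWK program, might look like the following. Everything here is hypothetical: `build_awk_filter()` is invented for this illustration, it only handles exact equality on tab-separated files, and it does not escape quotes in filter values.

```r
# Hypothetical helper: build an AWK program that keeps the header row plus
# every row where each named column equals the requested value.
build_awk_filter <- function(header, filters, sep = "\\t") {
  idx <- match(names(filters), header)
  if (anyNA(idx)) {
    stop("Unknown column(s): ",
         paste(names(filters)[is.na(idx)], collapse = ", "))
  }
  # One condition per filter, e.g. $2 == "Canada"
  conditions <- sprintf('$%d == "%s"', idx, unlist(filters, use.names = FALSE))
  sprintf('BEGIN { FS = OFS = "%s" } NR == 1 || (%s) { print }',
          sep, paste(conditions, collapse = " && "))
}

script <- build_awk_filter(
  header = c("id", "country", "species"),
  filters = list(country = "Canada", species = "Gray Jay")
)
```

The resulting string could then be handed to AWK, for example with `system2("awk", c(shQuote(script), shQuote(input_file)), stdout = output_file)`, where `input_file` and `output_file` are placeholders for paths the user supplies. The filtered output file is then small enough to read into R in the usual way.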
Thanks for all the helpful feedback! I'll start working through your suggestions and incorporating them.
@emhart yes, I think there is potential to make a more general AWK package for working with large files. In fact, I did originally consider doing that first, then making `auk` depend on the more general package, but just didn't have the time. There may also be better options than AWK that I'm not aware of... In any case, I think it would be useful to have a tool for processing text files that are too large to handle directly in R.
@aurielfournier A gentle ping 🙏
@aurielfournier Sorry I totally missed that your review was above Ted's. My apologies.
No problem @karthik! Our reviews came in within a few hours of each other, easy to miss. I appreciate the gentle reminder, those are often necessary to keep me on top of things.
Finally getting to this, here are responses to @aurielfournier comments:
`unlink()`: done!
`auk_filter()` in function help: this was by design. Since `auk_filter()` is the only function in the package that has an external dependency (i.e. AWK) that isn't installed on some systems (e.g. Windows), I decided to use it minimally in the help examples. I can include it, but then I'll need to enclose all examples in `\dontrun{}` blocks. Maybe this isn't a big deal though. @emhart @aurielfournier, what are your thoughts: is having `auk_filter()` important enough to warrant using `\dontrun{}`?
Thanks for the code review!!!
Here are my responses to @emhart:
`as.Date()`. Do you have any thoughts on how to guess which format the text date is in?
Species are tough. "robin" doesn't work because there are so many different species of robin and no way to know you mean "american robin". More tricky is different common or scientific names for different species. Currently, for simplicity, the user must give the English common name or scientific name (case insensitive) as used in the eBird taxonomy. Ideally, we'd like to allow users to look up species using other taxonomies, names in different languages, alternate spellings, etc., but this is a daunting task that we probably won't get to for some time.
Thanks!!!
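The matching rule described here (exact match on the English common name or scientific name, ignoring case) can be sketched in a few lines of base R. The three-row `taxonomy` table and the `match_species()` helper are hypothetical stand-ins for illustration, not the package's actual taxonomy or lookup code.

```r
# Hypothetical three-row stand-in for the eBird taxonomy.
taxonomy <- data.frame(
  common_name = c("Gray Jay", "American Robin", "European Robin"),
  scientific_name = c("Perisoreus canadensis", "Turdus migratorius",
                      "Erithacus rubecula"),
  stringsAsFactors = FALSE
)

# Case-insensitive exact match on common name, falling back to scientific name.
# Returns the scientific name, or NA when there is no exact match.
match_species <- function(x, taxonomy) {
  key <- tolower(trimws(x))
  by_common <- match(key, tolower(taxonomy$common_name))
  by_scientific <- match(key, tolower(taxonomy$scientific_name))
  hit <- ifelse(is.na(by_common), by_scientific, by_common)
  taxonomy$scientific_name[hit]
}

matched <- match_species(c("gray jay", "Robin", "turdus migratorius"), taxonomy)
```

Under this rule "gray jay" resolves because it is a complete name in a different case, while "Robin" returns `NA` because it is only part of two different names.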
@emhart Just added the ability to manually set the AWK path by setting the `AWK_PATH` environment variable in .Renviron. Should work on Mac or Windows, though I don't have a Windows machine to test.
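The lookup order this implies (use `AWK_PATH` when it is set, otherwise fall back to whatever is on the system path) can be sketched as follows. `find_awk()` is a hypothetical helper written for this illustration; how the package itself resolves the path is not shown in this thread.

```r
# Hypothetical helper: prefer the AWK_PATH environment variable,
# then fall back to searching the system PATH for an `awk` executable.
find_awk <- function() {
  custom <- Sys.getenv("AWK_PATH", unset = "")
  if (nzchar(custom)) {
    return(custom)
  }
  found <- unname(Sys.which("awk"))
  if (nzchar(found)) found else NA_character_
}

# Simulate what a line such as AWK_PATH=/opt/local/bin/awk in .Renviron would do.
Sys.setenv(AWK_PATH = "/opt/local/bin/awk")
awk_from_env <- find_awk()
Sys.unsetenv("AWK_PATH")
```

The path used above is only an example value; a real helper would probably also check that the file exists and is executable before using it.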
@mstrimas
The Quick Start is great, exactly what I was looking for.
I think one example is sufficient for the with and without pipes.
I see what you mean about `auk_filter()` now; I guess I could go either way on that one. Do you have thoughts, @emhart?
@aurielfournier I've added pipe-free examples to all functions for pipe haters.
@karthik what's the next step here?
The eBird taxonomy and EBD were just updated, and my intention is to submit a new version of auk to CRAN in the next few days reflecting the taxonomy changes and the suggestions from the reviewers.
Sorry for the delay @mstrimas here are a few quick thoughts:
@emhart thanks! I like the idea of column subsetting and will start looking at the best way to implement that.
`auk_filter()` now has additional arguments `keep` and `drop` that users can use to specify which columns are output.
@karthik just released a new version to CRAN with most of the changes suggested by the review process included, as well as a variety of other new features and bug fixes. Let me know what the next steps are to get this up on rOpenSci. Thanks!
@emhart @aurielfournier Could you two take a look at the recent updates and let me know if you are ready to sign off? 🙏
I will do my best to get to this in the next two weeks. Things are a bit swamped on my end at the moment.
Thank you @aurielfournier! much appreciated. 🙏
@emhart @aurielfournier gentle ping on signing off on this submission (or raising further issues). 🙏
Oh crap. I'm so sorry. I'll do my best to get to this as soon as I can, but the federal shutdown is messing up my week majorly, and even if they do come back tomorrow it's going to take a few days for things to even out again.
I am happy to sign off on this package. Great job @mstrimas
Thanks @aurielfournier! @karthik how would you suggest proceeding given that @emhart seems to be AWOL?
@mstrimas I've dropped Ted an email and will hopefully hear back from him soon. Sorry for the delays.
I took a look @mstrimas happy to sign off as well! Thanks for incorporating the feedback, the package looks great.
Congrats on your package being accepted @mstrimas! 🎉 🎈 And a huge thanks to @aurielfournier and @emhart for their expertise and time on this review! 🙏
Here are your next steps:
[![](https://badges.ropensci.org/136_status.svg)](https://github.com/ropensci/onboarding/issues/136)
Please also add a footer to the bottom of your README
[![](http://www.ropensci.org/public_images/github_footer.png)](http://ropensci.org)
`ropensci` GitHub org. This will allow you to transfer the repo. Once it's transferred, I'll give you write access there so you can update the CI badges.
Once moved, please re-run all checks in preparation for submission to CRAN. I can help with this if you run into any issues.
Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/tech-notes/) or a long-form post with more narrative about its development (https://ropensci.org/blog/). If you are interested, @stefaniebutland will be in touch about content and timing.
Hi @karthik, In my original conversations with @noamross we agreed to host the package on the Cornell Lab of Ornithology's GitHub page and have a read only mirror at rOpenSci. I've added the footer and badge to the readme; however, our intention is to keep links and CI pointing to our organizations page. What's the best way to proceed with setting up a read only mirror on rOpenSci? Thanks!
@mstrimas Hi Matt, understood. You can skip the transfer step and I'll look into the best way for setting up a mirror and get back to you with further details.
Hello @mstrimas. Congratulations on the acceptance of `auk`! We would love to host a post about it, so if you're interested, have a look at the editorial and technical info here https://github.com/ropensci/roweb2#contributing-a-blog-post and let me know if you are considering it.
@stefaniebutland sure, I'm happy to modify the vignette into a blog post. Probably won't be able to get to it for a week or two though.
No problem @mstrimas. I have Tues Feb 27 available for a post and we typically ask for a draft for review at least a week before the post date. What do you think about Tues Feb 20 to submit a draft via pull request?
That works for me, @stefaniebutland, thanks!
@mstrimas Just wanted to give you a quick update. We are still working out a good way to do the mirroring, but I'll let you know soon. Also I am about to travel for a bit, so I'll update the thread upon my return (late Feb). 🙏
Hi @stefaniebutland, looks like there's going to be a large update to the data underlying this package in mid March that will require some changes to the package that break backward compatibility. Would you be open to pushing the blog post back until the new version is released? If this messes things up for you, no worries, I can proceed with the post as is and just avoid the features that will get broken.
@mstrimas It's whatever you think is best. Blog post timing is flexible. You only really get the audience once for this kind of thing though, so if you prefer to publish after updates to avoid frustrating people once they're engaged, then we can postpone. Perhaps you can draft your post ideas for yourself now before they go stale ;-) and then fill in soon after you've made the required changes to the package.
I'll mark my calendar to check in with you in late March.
Ok, that's what I'm thinking, only one chance to catch people's eye so better to have the package in tip top shape. Late March sounds good. Thanks!
@mstrimas
looks like there's going to be a large update to the data underlying this package in mid March
Checking in to see if the timing is right to draft a blog post. No rush if the package is not updated yet.
By the way, it'd be grand if the blog post explained a bit how to choose between using `auk` and @sckott's `rebird` depending on the use case. 😺 The information could also be in the READMEs of both packages. Thinking of this because this week I was at a loss which of the two to recommend. 😀
@stefaniebutland I think the package is ready, I can start putting something together this week. Thanks for the reminder!
@maelle `rebird` is an interface to the eBird API, which gives access to a very limited subset of the data, e.g. the last 30 days of observations from a location. I think of `rebird` as being useful for building tools and visualizations for birders; however, for most ecological applications (e.g. distribution modeling) you'll want access to the full eBird database (~500 million records).
Thanks a lot for the explanations @mstrimas! It'd be a nice footnote as well in my opinion (of the post and READMEs).
When you say 30 days of observation you mean for raw occurrence data right? For frequency derived from it it seems you can get older data e.g. https://github.com/stephhazlitt/ruhu-ebird-observations/blob/master/R/ruhu-ebird-observations.md
@maelle I wasn't aware of the `ebirdfreq()` function, that's cool! Seems all the other functions are "recent" observations, but that one does give access to historical data at state, county, and hotspot level. It's also worth noting that `rebird` is easier to use and much faster, so if your data needs can be met by `rebird`, I'd say it's definitely preferred.
I'll add something to the README explaining the difference, thanks for the suggestion!
Awesome! It'll be super useful to guide users who find either of the two packages first! I wonder if the info should also live in the vignette, because people installing from CRAN won't have the README 🤔
@maelle updated the README and vignette as per your suggestion
Fantastic! Speaking of other rOpenSci packages, I am also wondering whether/how one could use `bowerbird` (not an ornithology package despite its name) and `auk` to keep, update and use a local copy of the eBird dataset. I might ping you if I ever try to write up such a use case.
@stefaniebutland here's a first draft of a blog post.
What topicid and date should I use? Also, is there somewhere in the website repo I can put a couple data files (~ 3 MB). If there isn't a good spot, I'll just leave them in my GitHub repo.
If this looks good I can submit a pull request to the rOpenSci website repo.
Summary
Access to the eBird database, consisting of over 400 million observations, is provided via a huge (>150 GB) text file. The `auk` package extracts records from this file and imports them into R for analysis. Both presence only and presence/absence data can be generated.
URL for the package (the development repository, not a stylized html page): https://github.com/CornellLabofOrnithology/auk
Please indicate which category or categories from our package fit policies this package falls under and why (e.g., data retrieval, reproducibility). If you are unsure, we suggest you make a pre-submission inquiry:
This package falls somewhere at the intersection of data retrieval and extraction. It provides access to the eBird database; however, it does so by processing a text file downloaded from eBird that contains the full database.
Anyone looking to work with eBird data for science or conservation.
`rebird` provides access to eBird data via the eBird API; however, this only gives access to the last 30 days of data. This package is the only one giving access to the full eBird database.
Requirements
Confirm each of the following by checking the box. This package:
Publication options
`paper.md` with a high-level description in the package root or in `inst/`.
Detail
[X] Does `R CMD check` (or `devtools::check()`) succeed? Paste and describe any errors or warnings:
[X] Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names: