any ideas here @DomBennett ?
Hi @sckott,
Yup, sequences!
The package acts as a portal to NCBI GenBank, which, as of August 2019, hosts some 213,865,349 sequences. But the package likely also makes use of the WGS information, as it will pull out any relevant annotated sequence. So maybe your best data metric is number of bases: 366,733,917,629 + 5,585,922,333,160 = ~6e+12
For the most part, the phylotaR package uses the rentrez package. So whatever stats you pull up for that package, with respect to GenBank, apply to this one too.
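For anyone wanting to double-check the base-count estimate, here is a minimal sketch of the sum. The two figures are copied from the comment; the variable names, and the reading of the first number as traditional GenBank bases and the second as WGS bases, are assumptions for illustration.

```python
# Base counts as quoted in the comment (August 2019 figures).
# Assumption: the first figure is traditional GenBank, the second is WGS.
genbank_bases = 366_733_917_629
wgs_bases = 5_585_922_333_160

# Combined number of bases reachable through the package.
total_bases = genbank_bases + wgs_bases

# Print the exact total alongside a one-significant-figure estimate.
print(f"{total_bases:,} bases (~{total_bases:.0e})")
```

The exact total rounds to the ~6e+12 quoted in the comment.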
thanks for this! David did give me some numbers for rentrez, but that bases estimate is a nice one he didn't have.
👋 as part of preparing an rOpenSci annual report, we're trying to estimate the amount of data the various pkgs in our suite provide access to.
Do you have a sense for how much data (e.g., in GB) one can access through this pkg? And/or whatever metric is most relevant for this data (sequences maybe?)?