Open d-callan opened 8 months ago
I've added most, but not quite all. There are two remaining: preterm infant resistome 2 and Uganda maternal. I've run into some technical difficulties, though. We've reached a GitHub file size limit (which I've never done before, so in a way I feel quite accomplished now :tada: ). Basically, I did a thing I didn't love: I added all the datasets to a single data file. I did it largely because it was the only way I could find to import data from one R package and re-export it from another, and I was trying to keep things simple for users by having them only need to know about a single package and namespace.
I could maybe get around the GitHub file size limit. GitHub essentially recommends hosting large files elsewhere and including in the repo a file pointing to that location. If done right, it says, the large file can be pulled down automatically when anyone clones the repo. OK, maybe. Except I've realized that if we ever want a CRAN release, CRAN will most probably not like our massive package, and users who aren't interested in the curated datasets will probably be frustrated by it as well. So..
New proposal: have the MicrobiomeDB package focus on features, computes, getting users' data in and out of our custom objects, etc., and then release a second user-facing package with just the data, branded as an extension to the MicrobiomeDB package. I suspect most users will like that better, particularly ones with poor internet. Also, if we ever want a CRAN release, MicrobiomeDB and its dependencies would have a fairly clear route, and we'd only have to figure out what to do about the data package (which we could keep hosting on GitHub if we wanted, with one file per dataset, and forget about re-exporting via MicrobiomeDB).
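To make the "one file per dataset" idea concrete, here's a rough sketch in base R of how a data-only package could list datasets and load them one at a time. Everything in it is made up for illustration: the directory, the dataset names, and the `list_curated_datasets()` / `get_curated_dataset()` helpers are hypothetical and are not existing MicrobiomeDB functions.

```r
# Hypothetical layout: one .rds file per curated dataset in a single directory.
# A temp directory stands in for wherever the data package would keep its files.
dataset_dir <- file.path(tempdir(), "curated_datasets")
dir.create(dataset_dir, showWarnings = FALSE)

# Stand-in datasets, invented for this sketch only.
saveRDS(
  data.frame(sample = c("s1", "s2"), abundance = c(0.4, 0.6)),
  file.path(dataset_dir, "dataset_a.rds")
)
saveRDS(
  data.frame(sample = "s3", abundance = 1),
  file.path(dataset_dir, "dataset_b.rds")
)

# List the available dataset names without reading any file contents.
list_curated_datasets <- function(dir = dataset_dir) {
  files <- list.files(dir, pattern = "\\.rds$")
  sort(sub("\\.rds$", "", files))
}

# Read a single dataset on request, so users only pay for what they load.
get_curated_dataset <- function(name, dir = dataset_dir) {
  path <- file.path(dir, paste0(name, ".rds"))
  if (!file.exists(path)) {
    stop("Unknown dataset: ", name)
  }
  readRDS(path)
}
```

With this kind of layout no single file has to hold every dataset, and `get_curated_dataset("dataset_a")` only reads that one file.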
Recording a suggestion from Dan for posterity: "If you're running into issues with GitHub file sizes, you could consider something like what was implemented here using ExperimentHub."
We have 6 here so far.