Hi @RayPlante, not sure if this is the best place to put this, but I was hoping the team might consider a small effort to work with the pooch project to support programmatic downloading of data from data.nist.gov. If you're not familiar, pooch has become a pretty widely used tool in the scientific Python community for downloading datasets and other web resources, with built-in caching and some other nifty tricks.
The coolest part (to me) is the DOIDownloader class that allows you to say pooch.retrieve("doi:10.6084/m9.figshare.14763051.v1/tiny-data.txt"), and it will parse the DOI and download the underlying data all at once. Currently, there is support in this class for figshare, Zenodo, and Dataverse instances. I think adding support for the NIST PDR could do a lot for interoperability.
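For illustration, here is a minimal sketch of that workflow. The helper names `split_doi_url` and `fetch_by_doi` are made up for this example, and the splitting logic is only an approximation of what happens to a `doi:` URL, not pooch's actual implementation. The download itself needs network access and the third-party pooch package, so it is wrapped in a function and not called here.

```python
def split_doi_url(url):
    """Split 'doi:<DOI>/<file name>' into (doi, file_name).

    Hypothetical helper for illustration only; pooch has its own parser.
    """
    prefix = "doi:"
    if not url.startswith(prefix):
        raise ValueError(f"Not a DOI URL: {url!r}")
    # The file name is everything after the last slash; the rest is the DOI.
    doi, sep, file_name = url[len(prefix):].rpartition("/")
    if not sep or not doi or not file_name:
        raise ValueError(f"Expected 'doi:<DOI>/<file name>', got {url!r}")
    return doi, file_name


def fetch_by_doi(url, known_hash=None):
    """Download one file through pooch and return its local path.

    known_hash=None skips the checksum verification; passing a real hash
    is the safer choice for anything beyond a quick test.
    """
    import pooch  # third-party; imported lazily so the parser works without it

    return pooch.retrieve(url=url, known_hash=known_hash)


if __name__ == "__main__":
    example = "doi:10.6084/m9.figshare.14763051.v1/tiny-data.txt"
    print(split_doi_url(example))
    # Uncomment to actually download (requires network access and pooch):
    # print(fetch_by_doi(example))
```

A NIST PDR DOI would presumably be passed in the same way once a matching repository backend exists in DOIDownloader.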
One use case internally: our package ETSpy has a few datasets included for testing and demonstration that are currently distributed with the package (not ideal, as it bloats the size of the package). The common way of dealing with this is to host the files in a repo somewhere and then use pooch to fetch them on demand and cache them for later use. Most commonly, Zenodo is used for this, but since it's a NIST project, it would be preferred (required?) to host those in the PDR. Being able to easily use pooch for that with a DOI would be great.
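A rough sketch of what that pattern could look like on the package side, using `pooch.create` with a file registry. Everything specific here is an assumption for illustration: the DOI is a placeholder rather than a real record, the file name is hypothetical, and `make_fetcher` and `fetch_example` are made-up helper names. It would only work against the PDR once pooch can resolve PDR DOIs; today the same code works with a Zenodo or figshare DOI.

```python
# Placeholder DOI (NOT a real record); note the trailing slash,
# since file names from the registry are appended to it.
BASE_URL = "doi:10.0000/placeholder-record/"

# Hypothetical file name. None skips checksum verification;
# in practice each entry should carry the file's real SHA256 hash.
REGISTRY = {
    "example_dataset.dat": None,
}


def make_fetcher(version="0.1.0"):
    """Build a pooch manager that caches files in the per-user cache folder."""
    import pooch  # third-party; imported lazily

    return pooch.create(
        path=pooch.os_cache("etspy"),  # OS-appropriate cache directory
        base_url=BASE_URL,
        version=version,  # cached data is kept in a per-version subfolder
        registry=REGISTRY,
    )


def fetch_example(name="example_dataset.dat"):
    """Download the file on first use, then reuse the cached copy."""
    return make_fetcher().fetch(name)
```

With something like this in place, the datasets would no longer need to ship inside the package; a test or demo would call `fetch_example()` and get back a local file path.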
Assuming the pooch team is open to it, I may have some cycles to work on this interoperability bit if it's of interest to the team.