hackalog / easydata

A flexible template for doing reproducible data science in Python.
MIT License

Give datasets multiple paths #241

Open acwooding opened 1 year ago

acwooding commented 1 year ago

Let a Dataset specify a list of paths to search for its data. Accept both local and remote paths.

Example user story:

I have a dataset that is stored in a bucket and that I sometimes also have locally. It's big and slow to fetch from the remote, so if I have a local copy, I want to use that instead. But I don't always have a local copy, and I want to be able to share the code or blow away the local copy to make space at any time. I don't want to write the "if local" logic every time I use the dataset.
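
To make the user story concrete, here is a rough sketch of the lookup this could boil down to. It is not easydata's API: `is_remote` and `pick_path` are hypothetical names, the bucket URL is made up, and the sketch assumes a remote path is usable without checking it (a real version would probably ask a storage client).

```python
from pathlib import Path


def is_remote(path):
    """Hypothetical helper: treat anything with a URL scheme (s3://, https://) as remote."""
    return "://" in path


def pick_path(candidates):
    """Return the first usable candidate, trying them in the order given.

    A local candidate is usable only if it exists on disk. A remote candidate
    is assumed to be usable without checking (an assumption of this sketch).
    """
    candidates = list(candidates)
    for candidate in candidates:
        if is_remote(candidate):
            return candidate
        if Path(candidate).expanduser().exists():
            return candidate
    raise FileNotFoundError(f"No usable path among: {candidates}")


if __name__ == "__main__":
    # Local copy first, bucket as fallback (both paths are made-up examples).
    print(pick_path(["data/raw/big_dataset.csv", "s3://example-bucket/big_dataset.csv"]))
```

Listing the local path first means it wins whenever it exists, and deleting the local copy silently falls back to the bucket, with no "if local" branch at the call site.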