spencerahill / aospy

Python package for automated analysis and management of gridded climate data
Apache License 2.0

Potentially use intake for describing/finding data on disk (i.e. what DataLoaders do) #318

Open spencerahill opened 5 years ago

spencerahill commented 5 years ago

Background on intake:

"Intake is a lightweight package for finding, investigating, loading and disseminating data. It will appeal to different groups for some of the reasons below, but is useful for all and acts as a common platform that everyone can use to smooth the progression of data from developers and providers to users."

And then there is the intake-cmip5 project.

Copying and pasting relevant discussion from #316 that spurred this issue:

Could we (eventually) separate the logic of what type of data store is used (Zarr vs. netCDF) from the description of how the files are organized? Then we could use composition to specify any combination, e.g. a NestedDictDataLoader backed by Zarr stores vs. the same loader backed by netCDF files.
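For illustration, here is a minimal sketch of what that composition could look like. All class names here (`Store`, `NetCDFStore`, `ZarrStore`, `NestedDictLayout`, `ComposedDataLoader`) are hypothetical and do not exist in aospy; the only real library calls are `xarray.open_mfdataset` and `xarray.open_zarr`. It also assumes, purely for the example, that each variable lives in a single Zarr store.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Protocol


class Store(Protocol):
    """Knows how to open a list of paths; knows nothing about their layout."""

    def open(self, paths: List[str]) -> Any:
        ...


class NetCDFStore:
    """Hypothetical backend for (possibly multi-file) netCDF data."""

    def open(self, paths: List[str]) -> Any:
        import xarray as xr  # imported lazily so the sketch loads without xarray

        return xr.open_mfdataset(paths, combine="by_coords")


class ZarrStore:
    """Hypothetical backend for Zarr data (assumes one store per variable)."""

    def open(self, paths: List[str]) -> Any:
        import xarray as xr

        return xr.open_zarr(paths[0])


@dataclass
class NestedDictLayout:
    """Describes how files are organized: intvl_in -> variable name -> paths."""

    file_map: Dict[str, Dict[str, List[str]]]

    def paths_for(self, intvl_in: str, name: str) -> List[str]:
        return self.file_map[intvl_in][name]


@dataclass
class ComposedDataLoader:
    """Combines any layout with any store backend."""

    layout: NestedDictLayout
    store: Store

    def load(self, intvl_in: str, name: str) -> Any:
        # The layout finds the paths; the store decides how to open them.
        return self.store.open(self.layout.paths_for(intvl_in, name))
```

With something like this, switching from netCDF to Zarr would mean passing `ZarrStore()` instead of `NetCDFStore()` to the same `ComposedDataLoader`, with the layout description left untouched.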

This sounds a lot like what intake does. You might get more mileage out of first refactoring around intake. Then you would be able to outsource all of the file loading stuff. The pangeo intake catalog, for example, contains both multi-file netCDF datasets and Zarr datasets. The user never has to care what the underlying driver is.

Intake has been on our radar for a while, but so far we haven't had a compelling reason to switch. Now that it's seeing more and more adoption, including through pangeo (and intake-cmip5...very cool!), perhaps that's no longer the case.

Ultimately, I still haven't played around with intake much, so I don't have anything more coherent to say for now. But if we can indeed offload a lot of what our DataLoaders do to something that is becoming an industry standard, so to speak, then that seems like a good thing to me. @spencerkclark and I talked briefly about intake offline last month, and at that time we didn't feel compelled to start using it, but I must confess that I've forgotten the details.