opendatacube / datacube-core

Open Data Cube analyses continental scale Earth Observation data through time
http://www.opendatacube.org
Apache License 2.0

Adding a `dc.save()` feature #467

Closed. omad closed this issue 3 years ago.

omad commented 6 years ago

Use Case

When experimenting with data loading from a Data Cube, users need to be able to save `xarray.Dataset` objects back into an Index for use in future analyses.

Starting Point

@petewa has an initial implementation available in the csiro/execution-engine branch with an API that looks like:


    ds = DatacubeSave(dc)
    ds.save(nbar, 'my_bucket', 's3aio', 'dcsave_mydata', 'eo',
            chunking={'time': 1, 'x': 3, 'y': 3})
    ds.save(nbar, '/home/ubuntu/data/output', 's3aio_test', 'dcsave_mydata', 'eo',
            chunking={'time': 1, 'x': 3, 'y': 3})
    ds.save(nbar, '/home/ubuntu/data/output', 'NetCDF CF', 'dcsave_mydata', 'eo',
            chunking={'time': 1, 'x': 4, 'y': 4})

This is a good starting point for implementing a simple save function.
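To make the shape of such a save function concrete, here is a minimal, self-contained sketch of the write-then-index flow it implies: serialise the data to a storage unit, then record its location and metadata in an index so a future load can find it. Everything here (`InMemoryIndex`, the JSON "writer", the `save` signature) is an illustrative stand-in, not the actual datacube API.

```python
import json
import tempfile
from pathlib import Path

class InMemoryIndex:
    """Stand-in for a Data Cube index: maps product name -> dataset records."""
    def __init__(self):
        self.records = {}

    def add(self, product, record):
        self.records.setdefault(product, []).append(record)

def save(data, output_dir, driver, product, metadata_type, index, chunking=None):
    """Write `data` (here just a dict of band -> values) and index the result."""
    out = Path(output_dir) / f"{product}.{driver}.json"
    out.write_text(json.dumps(data))  # placeholder for a NetCDF CF / s3aio writer
    index.add(product, {
        "path": str(out),
        "metadata_type": metadata_type,
        "chunking": chunking or {},
    })
    return out

index = InMemoryIndex()
with tempfile.TemporaryDirectory() as tmp:
    save({"red": [1, 2, 3]}, tmp, "netcdf", "dcsave_mydata", "eo",
         index, chunking={"time": 1, "x": 3, "y": 3})
print(index.records["dcsave_mydata"][0]["metadata_type"])  # the record is queryable
```

The point of the separation is that swapping the writer (local NetCDF vs. s3aio) should not change what gets recorded in the index.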

In discussion with @Kirill888, there are a few additions and changes we would propose:

Potential Problems

Kirill888 commented 6 years ago

Another potential area of concern is dealing with "lazy datasets", i.e. dask arrays. Some kind of tiling might be required, essentially the same thing ingest does. Once this feature is available, people will want to apply it to the entire DB, so we will need to make that easy, and put the actual "workflow" into the GridWorkflow class. That doesn't mean we have to address it right away, but I can guarantee it will be requested next.
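The tiling being described can be sketched in a few lines: split each dimension into chunks of the requested size and yield one set of slices per tile, so each tile of a lazy dataset can be computed and written independently. This is an illustrative stand-in (pure stdlib, no dask), not datacube's ingest or GridWorkflow implementation; `tile_slices` is a hypothetical helper.

```python
from itertools import product

def tile_slices(shape, chunking):
    """Yield dicts of dim -> slice covering `shape` in `chunking`-sized tiles.

    shape:    dict of dim name -> total length, e.g. {'time': 2, 'x': 7}
    chunking: dict of dim name -> tile length,  e.g. {'time': 1, 'x': 3}
    Dims missing from `chunking` become a single full-length tile.
    """
    dims = list(shape)
    per_dim = []
    for dim in dims:
        step = chunking.get(dim, shape[dim])
        per_dim.append([slice(i, min(i + step, shape[dim]))
                        for i in range(0, shape[dim], step)])
    for combo in product(*per_dim):
        yield dict(zip(dims, combo))

tiles = list(tile_slices({"time": 2, "x": 7, "y": 4},
                         {"time": 1, "x": 3, "y": 3}))
print(len(tiles))  # 2 * ceil(7/3) * ceil(4/3) = 2 * 3 * 2 = 12 tiles
```

A driver could then loop over `tiles`, compute each slice of the dask-backed array, and write it as one storage unit, which keeps peak memory bounded by the tile size.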

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.