opendatacube / datacube-core

Open Data Cube analyses continental scale Earth Observation data through time
http://www.opendatacube.org
Apache License 2.0

Deleting Products from ODC Index #177

Open hkristen opened 7 years ago

hkristen commented 7 years ago

I inserted a couple of S2A-L1C tiles with different ingestion parameters to see what works best. Now I have quite a few products in my database that I don't need anymore. Is there a way to delete products from the AGDC?

I couldn't find anything about this in the manual. Maybe I overlooked it?

v0lat1le commented 7 years ago

The datacube dataset archive command is probably the closest thing. It marks datasets as archived so they don't show up in searches. Beyond that, it's a matter of opening a postgres console and deleting rows from the database by hand.

v0lat1le commented 7 years ago

As a side note, you can also load the data without ingesting. That might be good enough and will save you some time and disk space.

v0lat1le commented 7 years ago

@jeremyh should we add a purge/delete command?

jeremyh commented 7 years ago

Yes, I think so. We have only implemented archive so far because we want to keep a record of every historical dataset, primarily for provenance. Once archived, a dataset is essentially invisible to most typical commands.

But deletion sounds useful in development situations like this, so it's probably worth adding with administrator restrictions.

(patches are welcome, as it's probably not a priority for us in the extreme short term.)

hkristen commented 7 years ago

Thanks for your answers!

For a start, I will use datacube dataset archive so that I get a better overview of my DB. The data is still on disk and uses up space, so a purge / delete command would be much appreciated, but it is not urgent.

@v0lat1le: I am aware that I can access the data without ingesting it. The point is that I am trying to find ingestion parameters that provide a good balance between ingestion time and processing performance for my project.

JackLidge commented 5 years ago

Looking through the proposed changes for an eventual 2.0 release, I didn't notice anything about specifically adding in a "delete tiles" function into the datacube infrastructure, and was wondering if this is something that is being considered / added in for ODC2.0?

clausmichele commented 4 years ago

I can't find how to use datacube dataset archive to archive/hide a product. What is the correct syntax? I've tried with:

datacube dataset archive NAME_OF_PRODUCT_TO_ARCHIVE

Traceback (most recent call last):
  File "/home/miniconda3/envs/odc/bin/datacube", line 11, in <module>
    load_entry_point('datacube', 'console_scripts', 'datacube')()
  File "/home/miniconda3/envs/odc/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/miniconda3/envs/odc/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/miniconda3/envs/odc/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/miniconda3/envs/odc/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/miniconda3/envs/odc/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/miniconda3/envs/odc/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/datacube-core/datacube/ui/click.py", line 197, in new_func
    return f(parsed_config, *args, **kwargs)
  File "/home/datacube-core/datacube/ui/click.py", line 225, in with_index
    return f(index, *args, **kwargs)
  File "/home/datacube-core/datacube/scripts/dataset.py", line 437, in archive_cmd
    to_process = _get_derived_set(index, id_) if archive_derived else [index.datasets.get(id_)]
  File "/home/datacube-core/datacube/index/_datasets.py", line 76, in get
    id_ = UUID(id_)
  File "/home/miniconda3/envs/odc/lib/python3.6/uuid.py", line 140, in __init__
    raise ValueError('badly formed hexadecimal UUID string')
ValueError: badly formed hexadecimal UUID string

I've also tried the following with no luck:

datacube dataset archive "NAME_OF_PRODUCT_TO_ARCHIVE"
datacube product archive NAME_OF_PRODUCT_TO_ARCHIVE
datacube NAME_OF_PRODUCT_TO_ARCHIVE archive

JackLidge commented 4 years ago

The archive function is used more for archiving individual datasets from the datacube, rather than an entire product specification (at least this is my understanding of how the current tool works).

To archive a dataset from the datacube, you will need the UUID specified in its YAML file. It can then be archived using:

datacube dataset archive UUID
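
The traceback above ends in Python's uuid module, which suggests the command tries to parse its argument as a dataset UUID and fails when given a product name. A minimal sketch of that check, using only the standard library; the helper name is_dataset_id and the sample UUID are made up for illustration and are not part of datacube:

```python
import uuid


def is_dataset_id(value: str) -> bool:
    """Return True if value parses as a UUID (hypothetical helper, not a datacube API)."""
    try:
        uuid.UUID(value)
    except ValueError:
        # Same error class as in the traceback: the string is not a well-formed UUID
        return False
    return True


# A product name is rejected; a well-formed UUID string (made up here) is accepted.
print(is_dataset_id("NAME_OF_PRODUCT_TO_ARCHIVE"))
print(is_dataset_id("f2f12372-8366-4ebd-ac26-1a4a1b12a2c3"))
```

Running a check like this before calling the CLI makes it clear whether the value in hand is a dataset id or a product name.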

clausmichele commented 4 years ago

This issue was opened to clarify how to delete products. Is there a way to do that now, or not yet?

Anyway, thanks for the support.

clausmichele commented 4 years ago

Any plan to add this functionality? Deleting products from CLI would be very useful.

Kirill888 commented 4 years ago

Better support for modifying existing collections in place is planned for ODC 2.0. With 1.8.x, the datacube db is almost "append only" as far as tooling in the datacube library and command line utilities go. There is archive functionality, but that just marks datasets as "no longer available for dc.load". It does not remove any metadata from the database.

The best we can offer so far is this SQL script:

https://gist.github.com/omad/1ae3463a123f37a9acf37213bebfde86

It should continue to work for 1.7/1.8 databases, as we do not intend to make any changes to the internal layout of the database.
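
Where hiding a product's datasets is enough and nothing has to be removed from the database, archiving can be applied to every dataset of one product through the Python API rather than one UUID at a time. This is a rough sketch, not an official recipe: it assumes index.datasets.search(product=...) yields the product's non-archived datasets and index.datasets.archive(ids) accepts a list of their ids, as in the 1.8 series. The function name archive_product_datasets is made up for illustration.

```python
from typing import Any, List


def archive_product_datasets(index: Any, product_name: str) -> List[Any]:
    """Archive every dataset found for one product and return the archived ids.

    Hypothetical helper: `index` is expected to behave like `datacube.Datacube().index`.
    """
    # Collect the ids of all datasets the index returns for this product
    ids = [ds.id for ds in index.datasets.search(product=product_name)]
    if ids:
        # Archiving only flags the datasets; their metadata stays in the database
        index.datasets.archive(ids)
    return ids


# Usage, assuming a configured datacube environment:
#   import datacube
#   dc = datacube.Datacube()
#   archive_product_datasets(dc.index, "NAME_OF_PRODUCT_TO_ARCHIVE")
```

The product definition itself remains in the index afterwards. Removing it still requires direct database work such as the SQL script linked above.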