huggingface / dataset-viewer

Backend that powers the dataset viewer on Hugging Face dataset pages through a public API.
https://huggingface.co/docs/dataset-viewer
Apache License 2.0
683 stars 76 forks

Delete obsolete Parquet and DuckDB files #2384

Open severo opened 7 months ago

severo commented 7 months ago

Replaces https://github.com/huggingface/datasets-server/issues/1613 and https://github.com/huggingface/datasets-server/issues/980.

When we call delete_dataset(), we should remove all the Parquet and DuckDB files.

And maybe even the refs/convert/parquet branch altogether?

AndreaFrancis commented 5 months ago

We should also delete refs/convert/duckdb.

severo commented 5 months ago

Note that, if the dataset has been deleted from the Hub, there is no branch to delete :)
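
For illustration, the branch cleanup discussed in the comments above could be sketched roughly as follows. This is only a sketch under assumptions, not the project's actual code: delete_converted_branches and BranchClient are hypothetical names, and the client is any object exposing a delete_branch(repo_id, branch=..., repo_type=...) method. huggingface_hub's HfApi has a delete_branch method with compatible keywords, but whether it accepts refs/convert/* references is an assumption here. A failed deletion (for example because the dataset no longer exists on the Hub) is treated as "nothing to delete".

```python
from typing import Iterable, Protocol

# Branch names taken from the discussion above.
CONVERTED_BRANCHES = ("refs/convert/parquet", "refs/convert/duckdb")


class BranchClient(Protocol):
    """Hypothetical minimal interface: anything with a delete_branch method."""

    def delete_branch(self, repo_id: str, *, branch: str, repo_type: str) -> None: ...


def delete_converted_branches(
    client: BranchClient,
    dataset: str,
    branches: Iterable[str] = CONVERTED_BRANCHES,
) -> dict[str, bool]:
    """Try to delete each converted branch; return {branch: was it deleted?}."""
    results: dict[str, bool] = {}
    for branch in branches:
        try:
            client.delete_branch(dataset, branch=branch, repo_type="dataset")
            results[branch] = True
        except Exception:
            # Assumption: a missing repository or branch raises an error.
            # In that case there is no branch to delete, so we just record it.
            results[branch] = False
    return results
```

A caller would pass its own client and the dataset name, for example delete_converted_branches(client, "user/dataset"), and could log the returned mapping to see which branches were actually removed.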

severo commented 3 months ago

The only cases:

I think we can close.

severo commented 2 months ago

https://huggingface.co/datasets/Cnam-LMSSC/vibravox/discussions/4#66854a97118934e841dcf35c

Yes, they are only updated when the viewer is updated; disabling the viewer didn't remove those files as it should have. We'll work on a mechanism to clean the branch when the viewer is disabled.