openml / openml-python

Python module to interface with OpenML
https://openml.github.io/openml-python/main/

Update downloading from MinIO to accommodate for new structure #1304

Closed PGijsbers closed 8 months ago

PGijsbers commented 9 months ago

Previously, each dataset had its own bucket: https://openml1.win.tue.nl/datasets61/dataset_61.pq

But we were advised to reduce the number of buckets and instead host many objects in a hierarchical structure. We now use prefixes to divide the dataset objects into separate subdirectories: https://openml1.win.tue.nl/datasets/0000/0061/dataset_61.pq
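To make the new layout concrete, here is a minimal sketch of how a client could build the new-style object URL from a dataset ID. The function name `new_style_dataset_url` is hypothetical and not part of openml-python. The padding rule (zero-pad the ID to 8 digits and split it into two 4-digit prefixes) is an assumption inferred from the single example URL above, so it may not match how the server handles other IDs.

```python
def new_style_dataset_url(
    dataset_id: int,
    host: str = "https://openml1.win.tue.nl",
) -> str:
    """Build the new-style parquet URL for a dataset (hypothetical helper).

    Assumption: the ID is zero-padded to 8 digits and split into two
    4-digit path segments, as suggested by the example for dataset 61.
    """
    if not 0 <= dataset_id < 10**8:
        # The assumed scheme only has room for 8 digits.
        raise ValueError(f"dataset_id out of range for this scheme: {dataset_id}")

    padded = f"{dataset_id:08d}"              # 61 -> "00000061"
    prefix = f"{padded[:4]}/{padded[4:]}"     # "00000061" -> "0000/0061"
    return f"{host}/datasets/{prefix}/dataset_{dataset_id}.pq"


print(new_style_dataset_url(61))
# prints https://openml1.win.tue.nl/datasets/0000/0061/dataset_61.pq
```

Under this assumption all datasets live in one `datasets` bucket, and only the object prefix changes per dataset.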

I started work on it on the fix/new_minio branch (https://github.com/openml/openml-python/tree/fix/new_minio). It works, but it is ugly and I didn't run any tests. For now I'm just trying to get it working so Taniya and Prabhant can continue with their deep learning integration, but we need to integrate this in the next release.

PGijsbers commented 9 months ago

Jos is currently working on bringing MinIO and Parquet to the test server. I assume that will also have the "new style" buckets and object prefixes. @josvandervelde is this correct, and could you ping here when that is done?

@eddiebergman @LennartPurucker it would be great if one of you could pick this up once Jos has confirmed that MinIO and Parquet are available on the test server.