openml / openml-croissant

Converting dataset metadata from OpenML to Croissant format

Access to croissants denied 🥺 #18

Closed: joaquinvanschoren closed this issue 3 months ago

joaquinvanschoren commented 4 months ago

Some of our croissants cannot be accessed by the general public. A list of inaccessible croissants can be found here: https://github.com/mlcommons/croissant/blob/main/health/visualizer/report_openml.ipynb

Here's one example: https://openml1.win.tue.nl/dataset42980/dataset42980/croissant.json

This might be a permission error, or the croissant file might not exist.
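One way to tell the two cases apart is to request only the HTTP status code of the file. The sketch below is illustrative: `classify_status` and `check_url` are hypothetical helper names, and the mapping is only a heuristic. In particular, S3-compatible stores such as MinIO may answer 403 for an object that does not exist when the caller is not allowed to list the bucket, so a 403 alone does not prove the file is there.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an HTTP status code to a likely cause.
# Heuristic only: S3-compatible servers may return 403 for missing objects too.
classify_status() {
  case "$1" in
    200) echo "accessible" ;;
    403) echo "forbidden (permission error, or possibly a missing object)" ;;
    404) echo "not found" ;;
    000) echo "no response (network error)" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Hypothetical helper: print a URL with the likely cause (needs network access).
check_url() {
  local code
  # -s: silent, -L: follow redirects, -o /dev/null: discard the body,
  # -w '%{http_code}': print only the final status code (000 if no response).
  code=$(curl -s -L -o /dev/null -w '%{http_code}' "$1")
  printf '%s -> %s\n' "$1" "$(classify_status "$code")"
}

# Example (requires network access):
# check_url https://openml1.win.tue.nl/dataset42980/dataset42980/croissant.json
```

Keeping the classification separate from the `curl` call makes the heuristic easy to adjust without touching the network code.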

PGijsbers commented 4 months ago

That bucket URL is outdated. MinIO URLs should be formatted as https://openml1.win.tue.nl/datasets/X/ID/dataset_ID_croissant.json, where ID is the dataset ID and X is floor(ID/10_000). In the directory part of the path, both X and ID are left-padded with 0s to at least 4 digits; in the filename, ID is not padded. For example: https://openml1.win.tue.nl/datasets/0000/0061/dataset_61_croissant.json
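The path rule above can be sketched as a small shell function. The name `croissant_url` is hypothetical, and the sketch assumes the dataset ID is passed as a plain decimal number without leading zeros (bash arithmetic would otherwise read an input like `061` as octal).

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the MinIO URL of a dataset's croissant file.
# Assumes $1 is a plain decimal dataset ID without leading zeros.
croissant_url() {
  local id=$1
  # X = floor(ID / 10000); bash integer division truncates, which equals
  # floor for non-negative IDs.
  local bucket=$(( id / 10000 ))
  # %04d pads X and the directory ID to at least 4 digits;
  # the ID inside the filename stays unpadded.
  printf 'https://openml1.win.tue.nl/datasets/%04d/%04d/dataset_%d_croissant.json\n' \
    "$bucket" "$id" "$id"
}

croissant_url 61     # https://openml1.win.tue.nl/datasets/0000/0061/dataset_61_croissant.json
croissant_url 42980  # https://openml1.win.tue.nl/datasets/0004/42980/dataset_42980_croissant.json
```

Because `%04d` sets a minimum width, IDs with five or more digits such as 42980 are printed unchanged, matching the wget example in this thread.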

However, it looks like this is also broken. Once I have a stable connection, I can check the MinIO server to see whether there are any croissant files at all, or whether they have not been moved over yet.

Edit: it looks like I had the filename correct; my answer above has been updated to reflect that.

wget https://openml1.win.tue.nl/datasets/0004/42980/dataset_42980_croissant.json

PGijsbers commented 4 months ago

I edited my comment, does that address the concerns?

joaquinvanschoren commented 3 months ago

I think so. I opened a PR in mlcommons/croissant.