joaquinvanschoren closed this issue 3 months ago
That bucket URL is outdated. MinIO URLs should be formatted as https://openml1.win.tue.nl/datasets/X/ID/dataset_ID_croissant.json, where ID is the dataset ID (left-padded with 0s to at least 4 digits in the directory name, unpadded in the filename) and X is floor(ID / 10_000), also left-padded to 4 digits.
For example: https://openml1.win.tue.nl/datasets/0000/0061/dataset_61_croissant.json
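The layout above can be sketched as a small bash helper. This is only an illustration: the function name `croissant_url` is hypothetical, and the padding rules are inferred from the two example URLs in this thread (0000/0061/dataset_61 and 0004/42980/dataset_42980), not from any official OpenML specification.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the MinIO Croissant URL for an OpenML dataset ID.
# Padding rules are inferred from the example URLs in this thread.
croissant_url() {
  local id=$((10#$1))            # force base 10 so an input like "0061" is not read as octal
  local group=$((id / 10000))    # X = floor(ID / 10_000); integer division floors for non-negative IDs
  # %04d left-pads to at least 4 digits; the filename uses the unpadded ID (%d)
  printf 'https://openml1.win.tue.nl/datasets/%04d/%04d/dataset_%d_croissant.json\n' \
    "$group" "$id" "$id"
}
```

For example, `wget "$(croissant_url 42980)"` would fetch the same file as the `wget` command posted later in this thread.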
However, it looks like this is also broken. Once I have a stable connection, I can check the MinIO server to see whether there are any croissant files at all, or whether they have not been moved over yet.
Edit: it looks like I had the filename correct; my answer above has been updated to reflect that.
wget https://openml1.win.tue.nl/datasets/0004/42980/dataset_42980_croissant.json
I edited my comment; does that address the concerns?
I think so. I opened a PR in mlcommons/croissant.
Some of our croissants cannot be accessed by the general public. A list of inaccessible croissants can be found here: https://github.com/mlcommons/croissant/blob/main/health/visualizer/report_openml.ipynb
Here's one example: https://openml1.win.tue.nl/dataset42980/dataset42980/croissant.json
This might be a permission error, or the croissant file may not exist.
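One way to tell these two cases apart is to look at the HTTP status code the server returns. The sketch below is a hypothetical bash helper (the names `describe_status` and `check_croissant` are illustrative) that assumes the server answers 403 for access denied and 404 for a missing file. Note that S3-compatible servers may return 403 even for a missing object when listing is not permitted, so a 403 alone is not conclusive.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an HTTP status code to a likely cause.
# Assumption: 403 = access denied, 404 = missing; some S3-compatible
# servers may answer 403 for a missing object as well.
describe_status() {
  case "$1" in
    200) echo "accessible" ;;
    403) echo "forbidden (permission error, or possibly a missing object)" ;;
    404) echo "not found (file does not exist)" ;;
    000) echo "no response (connection failed)" ;;
    *)   echo "unexpected HTTP status $1" ;;
  esac
}

# Fetch only the status code for a URL (requires curl).
check_croissant() {
  local url=$1 code
  # -s: silent, -o /dev/null: discard the body, -w: print the status code,
  # -L: follow redirects, --max-time: give up after 10 seconds
  code=$(curl -s -o /dev/null -w '%{http_code}' -L --max-time 10 "$url")
  printf '%s -> %s\n' "$url" "$(describe_status "$code")"
}
```

Running `check_croissant https://openml1.win.tue.nl/dataset42980/dataset42980/croissant.json` would print the URL together with the likely cause, which could help triage the entries in the report linked above.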