openml / OpenML

Open Machine Learning
https://openml.org
BSD 3-Clause "New" or "Revised" License

Point .pq files to datasets bucket #1203

Closed. josvandervelde closed this 10 months ago

josvandervelde commented 10 months ago

Goal

See https://github.com/orgs/openml/projects/5?pane=issue&itemId=46514884 for the goal & reasoning.

In this PR

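The .pq URL is changed from https://openml1.win.tue.nl/dataset45714/dataset_45714.pq to https://openml1.win.tue.nl/datasets/0004/45714/dataset_45714.pq. The new path groups datasets into buckets of 10,000 IDs: the first segment is floor(data_id / 10000), zero-padded to four digits (0004 here), and the second segment is the dataset ID, zero-padded to at least four digits.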
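The mapping from the old layout to the new one can be sketched in Python as below. This is only an illustration: the regular expression for the old layout is inferred from the single example URL above, and the names OLD_URL and to_bucket_url are hypothetical, not part of the OpenML codebase.

```python
import re

# Assumed shape of the old layout, inferred from the one example URL in this PR:
#   <base>/dataset<ID>/dataset_<ID>.pq
OLD_URL = re.compile(
    r"^(?P<base>https?://[^/]+/)dataset(?P<id>\d+)/dataset_(?P=id)\.pq$"
)


def to_bucket_url(old_url: str) -> str:
    """Rewrite an old-style .pq URL into the bucketed datasets/ layout."""
    match = OLD_URL.match(old_url)
    if match is None:
        raise ValueError(f"not an old-style .pq URL: {old_url}")
    data_id = int(match.group("id"))
    bracket = data_id // 10000  # one bucket per 10,000 dataset IDs
    return (
        f"{match.group('base')}datasets/"
        f"{bracket:04d}/{data_id:04d}/dataset_{data_id}.pq"
    )


print(to_bucket_url("https://openml1.win.tue.nl/dataset45714/dataset_45714.pq"))
```

For the example in this PR, the sketch prints the new URL quoted above.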

How to test

I haven't tested this thoroughly. I just ran

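$MINIO_URL = 'http://openml1.win.tue.nl/';
$data_id = 45714;
// Bucket index: one bucket per 10,000 dataset IDs, zero-padded to four digits.
$bracket = sprintf('%04d', intdiv($data_id, 10000));
// Dataset ID, zero-padded to at least four digits.
$padded_id = sprintf('%04d', $data_id);
// The file name itself keeps the unpadded ID.
$url = $MINIO_URL . 'datasets/' . $bracket . '/' . $padded_id . '/dataset_' . $data_id . '.pq';
echo $url;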

in a PHP sandbox and checked the resulting URL.
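Since the check above covers a single ID, a slightly broader check could look like the following Python port of the same arithmetic. It is a sketch only: pq_url is a hypothetical helper, and the URLs it yields for IDs other than 45714 follow from the formula in the snippet, not from anything verified against the actual bucket.

```python
MINIO_URL = "http://openml1.win.tue.nl/"  # same base URL as the PHP snippet


def pq_url(data_id: int, base_url: str = MINIO_URL) -> str:
    """Build the bucketed .pq URL with the same arithmetic as the PHP check."""
    bracket = f"{data_id // 10000:04d}"  # bucket of 10,000 IDs, four digits
    padded_id = f"{data_id:04d}"         # ID padded to at least four digits
    return f"{base_url}datasets/{bracket}/{padded_id}/dataset_{data_id}.pq"


# IDs around the padding and bucket boundaries, plus the ID used in this PR.
for data_id in (61, 9999, 10000, 45714):
    print(pq_url(data_id))
```

The boundary IDs are the interesting cases: IDs below 10000 fall in bucket 0000 and get a padded directory name, while the file name keeps the unpadded ID.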

josvandervelde commented 10 months ago

Hi @joaquinvanschoren, could you review this PR and, if it's OK, deploy it or tell me how to deploy it?

joaquinvanschoren commented 10 months ago

Done, also deployed.