openml / OpenML

Open Machine Learning
https://openml.org
BSD 3-Clause "New" or "Revised" License

OpenML API parquet migration - Phase 2 #1154

Open prabhant opened 2 years ago

prabhant commented 2 years ago

We already have dataset download support in parquet and MinIO; the next phase is uploading these datasets.

We need to allow parquet upload directly to MinIO. This requires changes to three components:

@PGijsbers @joaquinvanschoren @janvanrijn

PGijsbers commented 2 years ago

I created https://github.com/openml/openml-python/issues/1141. Can you elaborate on the new sequence of communication for uploading the dataset from a client API? Are the new endpoints already available?

"Assign the uploaded dataset ID and then transfer it to the MinIO."

This seems like the server will put the dataset in the MinIO bucket, while

"To convert dataset directly from dataframe to parquet and send an upload request."

makes it sound as though the client is expected to upload directly to the MinIO server.
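
To make the two readings concrete, here is a minimal sketch of both sequences. Everything in it is hypothetical: `FakeObjectStore` and `FakeServer` are in-memory stand-ins, not the OpenML server or a MinIO client, the object key layout is invented, and the payload is a placeholder for the parquet bytes the client would produce.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeObjectStore:
    """Hypothetical in-memory stand-in for a MinIO bucket."""

    objects: dict = field(default_factory=dict)

    def put(self, key: str, payload: bytes) -> None:
        self.objects[key] = payload


@dataclass
class FakeServer:
    """Hypothetical stand-in for the server side; not the real API."""

    store: FakeObjectStore
    next_id: int = 1

    def _assign_id(self) -> int:
        dataset_id = self.next_id
        self.next_id += 1
        return dataset_id

    def upload(self, payload: bytes) -> int:
        # Reading A: the server receives the file, assigns the dataset ID,
        # and transfers the file to the object store itself.
        dataset_id = self._assign_id()
        self.store.put(f"dataset{dataset_id}/data.pq", payload)
        return dataset_id

    def reserve(self) -> tuple[int, str]:
        # Reading B: the server only assigns the ID and tells the client
        # which object key to write to (assumed key layout).
        dataset_id = self._assign_id()
        return dataset_id, f"dataset{dataset_id}/data.pq"


def upload_via_server(server: FakeServer, payload: bytes) -> int:
    """Reading A: one request; the server writes to the store."""
    return server.upload(payload)


def upload_direct(server: FakeServer, store: FakeObjectStore, payload: bytes) -> int:
    """Reading B: the client gets an ID, then writes to the store itself."""
    dataset_id, key = server.reserve()
    store.put(key, payload)
    return dataset_id
```

In reading A the client never needs credentials for the object store, whereas in reading B it does (or needs something like a pre-authorised upload target), which is why the intended sequence matters for the client API.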