openml / openml-python

Python module to interface with OpenML
https://openml.github.io/openml-python/main/

Dataset download progress bar #1333

Open joaquinvanschoren opened 4 months ago

joaquinvanschoren commented 4 months ago

Description

Downloading large datasets can take an unpredictable amount of time. It would be nice if a progress bar could be shown.

Steps/Code to Reproduce

import openml

openml_dataset = openml.datasets.get_dataset(ID)  # ID is the OpenML dataset id
X, y, _, _ = openml_dataset.get_data()

Expected Results

A progress bar is shown (either in standard out or in a notebook)

Actual Results

Nothing is shown

Versions

All

PGijsbers commented 4 months ago

It seems that the fget_object call used when downloading from minio accepts a progress parameter: https://github.com/minio/minio-py/blob/74e8e5200d4dbf48ae9bbd6a8a2f54614e1958b3/minio/api.py#L1066. It's not in the main doc pages for whatever reason. I think it would be reasonable to implement this only for downloads from minio, since other files should generally be small (especially once we drop arff).
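As a rough illustration of what such a progress object could look like: the sketch below assumes, based on the linked minio-py source, that the object passed as progress needs a set_meta(object_name, total_length) method and an update(size) method. The class name TextProgress and the file name are hypothetical, and this is not necessarily how #1335 or openml-python implements it. The download is simulated so the example runs without a MinIO server.

```python
import sys


class TextProgress:
    """Hypothetical progress reporter; not part of openml-python or minio."""

    def __init__(self, stream=sys.stderr, width=30):
        self.stream = stream
        self.width = width
        self.object_name = None
        self.total_length = 0
        self.current = 0

    def set_meta(self, object_name, total_length):
        # Assumed to be called once, before the first chunk arrives.
        self.object_name = object_name
        self.total_length = total_length
        self.current = 0

    def update(self, size):
        # Assumed to be called with the byte count of each chunk written.
        self.current += size
        if self.total_length > 0:
            fraction = min(self.current / self.total_length, 1.0)
        else:
            fraction = 0.0  # unknown total size: show an empty bar
        filled = int(fraction * self.width)
        bar = "#" * filled + "-" * (self.width - filled)
        self.stream.write(f"\r{self.object_name} [{bar}] {fraction:6.1%}")
        self.stream.flush()


# Simulated download of four 1 KiB chunks.
progress = TextProgress()
progress.set_meta(object_name="dataset.pq", total_length=4096)
for _ in range(4):
    progress.update(1024)
sys.stderr.write("\n")

# With a real client the call would look roughly like:
# client.fget_object(bucket_name, object_name, file_path, progress=TextProgress())
```

In a notebook, a tqdm-based object exposing the same two methods would probably be the nicer choice; the plain-text version above just keeps the sketch free of dependencies.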

PGijsbers commented 3 months ago

This is implemented in #1335, which can hopefully be included in 0.15, whenever that may be.