openml / openml-python

OpenML's Python API for a World of Data and More 💫
http://openml.github.io/openml-python/

Allow forcing a download even if cached files are already present #1132

Closed (PGijsbers closed this 1 year ago)

PGijsbers commented 2 years ago

Sometimes the cache has to be updated, and currently the only way to force a download is to remove the cache directory manually. It would be nice to be able to force a cache refresh from code.

PGijsbers commented 2 years ago

In the meantime, you can use this function (it accepts the same arguments as get_dataset).

import openml

def force_get_dataset(dataset_id, *args, **kwargs):
    """Remove any existing local files for `dataset_id`, then download new copies."""
    cache_key = openml.datasets.functions.DATASETS_CACHE_DIR_NAME
    # Resolve the cache directory for this dataset (created if missing), then delete it.
    did_cache_dir = openml.utils._create_cache_directory_for_id(cache_key, dataset_id)
    openml.utils._remove_cache_dir_for_id(cache_key, did_cache_dir)
    # With the cache gone, get_dataset has to fetch everything again.
    return openml.datasets.get_dataset(dataset_id, *args, **kwargs)

if __name__ == "__main__":
    force_get_dataset(61)

LennartPurucker commented 1 year ago

This was resolved by PR #1260, which added an option to get_dataset for refreshing the cache.
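
For readers who want to see the general idea, here is a minimal, library-independent sketch of a "force refresh" option on a cached download. It is not the implementation from the PR: `get_cached`, `fake_download`, the `force_refresh` parameter, and the `payload.bin` file name are all hypothetical names chosen for illustration, and the download is faked so the example runs offline.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def get_cached(
    cache_root: Path,
    item_id: int,
    download: Callable[[int], bytes],
    force_refresh: bool = False,
) -> bytes:
    """Return the payload for `item_id`, downloading only when needed.

    Hypothetical helper for illustration; not part of openml-python.
    """
    item_dir = cache_root / str(item_id)
    payload_file = item_dir / "payload.bin"

    # A forced refresh discards everything cached for this id first.
    if force_refresh and item_dir.exists():
        shutil.rmtree(item_dir)

    # Download only if nothing usable is cached (or it was just removed).
    if not payload_file.exists():
        item_dir.mkdir(parents=True, exist_ok=True)
        payload_file.write_bytes(download(item_id))

    return payload_file.read_bytes()


# Stand-in for a network call; records every download that happens.
calls = []


def fake_download(item_id: int) -> bytes:
    calls.append(item_id)
    return f"dataset {item_id}, download #{len(calls)}".encode()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = get_cached(root, 61, fake_download)
    second = get_cached(root, 61, fake_download)  # served from the cache
    third = get_cached(root, 61, fake_download, force_refresh=True)  # re-downloaded
```

Here the second call never reaches `fake_download`, while the third deletes the cached directory and fetches a fresh copy. Removing the whole per-id directory, rather than a single file, mirrors the workaround above and avoids leaving stale side files behind. In openml-python itself the option is, to my knowledge, the `force_refresh_cache` argument of get_dataset; check the documentation of your installed version before relying on that name.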