Suggestion: Add caching to package or saving/loading code to examples.

uci-ml-repo / ucimlrepo

Python package for dataset imports from UCI ML Repository

MIT License


from ucimlrepo import fetch_ucirepo
import pickle
import os

dataset_id = 2
fname = f"id_{dataset_id}.pkl"

if os.path.isfile(fname):
    with open(fname, "rb") as f:
        data = pickle.load(f)
else:
    data = fetch_ucirepo(id=dataset_id)
    with open(fname, "wb") as f:
        pickle.dump(data, f)
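The snippet above can also be wrapped in a reusable helper. This is a minimal sketch, not part of ucimlrepo: `cached_fetch` and its `cache_dir` parameter are hypothetical names, `fetch` is any callable invoked as `fetch(id=dataset_id)` (the way `fetch_ucirepo` is called above), and it assumes the returned object can be pickled and unpickled.

```python
import os
import pickle
from typing import Any, Callable


def cached_fetch(
    dataset_id: int,
    fetch: Callable[..., Any],
    cache_dir: str = ".ucimlrepo_cache",  # hypothetical default location
) -> Any:
    """Return the dataset for dataset_id, calling fetch() only on a cache miss.

    Hypothetical helper: assumes whatever fetch() returns survives a pickle
    round trip.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, f"id_{dataset_id}.pkl")

    # Cache hit: load the previously saved object.
    if os.path.isfile(path):
        with open(path, "rb") as f:
            return pickle.load(f)

    # Cache miss: download, then save.
    data = fetch(id=dataset_id)
    # Write to a temporary file first and rename it into place, so an
    # interrupted write never leaves a truncated pickle behind.
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(data, f)
    os.replace(tmp_path, path)
    return data
```

With ucimlrepo installed, a call would look like `data = cached_fetch(2, fetch_ucirepo)`. Passing the fetcher in as an argument keeps the helper testable without network access. Only load pickle files you created yourself, since unpickling can execute arbitrary code.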

I was just looking into exactly this, and it turns out that the dotdicts ucimlrepo uses cannot be round-tripped through pickle. At least for me, loading fails with a strange error:

python3 test.py 
Traceback (most recent call last):
  File "/home/rpaul/proj/bnn-benchmark/src/test.py", line 10, in <module>
    data = pickle.load(f)
TypeError: 'NoneType' object is not callable

Searching for the error, however, leads to this Stack Overflow answer: https://stackoverflow.com/a/2050357. Adding the methods described there to the dotdict resolves the issue for me, and I opened a pull request with the change. It doesn't cache the downloaded data yet, but it at least makes caching manually possible.
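A minimal sketch of that kind of fix, assuming the methods in question are the pickle hooks `__getstate__` and `__setstate__`. `PicklableDotDict` is a hypothetical class modeled on the common dotdict pattern (`__getattr__ = dict.get`), not the code from the pull request. Because that pattern returns `None` for any unknown attribute instead of raising `AttributeError`, pickle can end up calling `None` when it looks up a hook the class does not define, which matches the `'NoneType' object is not callable` error; which hook triggers it may depend on the Python version.

```python
import pickle


class PicklableDotDict(dict):
    """Dict with attribute-style access (hypothetical stand-in, not ucimlrepo's class)."""

    # Common dotdict pattern: unknown attributes fall through to dict.get,
    # which returns None instead of raising AttributeError.
    __getattr__ = dict.get
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__

    # Define the pickle hooks explicitly so pickle finds real methods instead
    # of falling through to __getattr__ and receiving None.
    def __getstate__(self):
        # The dict's keys and values are pickled by dict's own machinery;
        # only the (normally empty) instance __dict__ is passed as extra state.
        return self.__dict__

    def __setstate__(self, state):
        self.__dict__.update(state)


data = PicklableDotDict(name="iris", features=PicklableDotDict(n=4))
restored = pickle.loads(pickle.dumps(data))
print(restored.features.n)  # 4
```

Defining the hooks explicitly keeps attribute-style access unchanged while giving pickle real methods to call.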


Suggestion: Add caching to package or saving/loading code to examples. #14