tslearn-team / tslearn

The machine learning toolkit for time series analysis in Python
https://tslearn.readthedocs.io
BSD 2-Clause "Simplified" License

Fix UCR_UEA_datasets #517

Closed: YannCabanes closed this 5 months ago

YannCabanes commented 5 months ago

In Google Colab, after installing tslearn with !pip install tslearn, I can run:

from tslearn.datasets import UCR_UEA_datasets

X_train, y_train, X_test, y_test = UCR_UEA_datasets().load_dataset("TwoPatterns")

print(X_train.shape)  # (1000, 128, 1)
print(y_train.shape)  # (1000,)

ucr_uea_list_univariate_datasets = UCR_UEA_datasets().list_univariate_datasets()
print(len(ucr_uea_list_univariate_datasets))  # 0

ucr_uea_list_multivariate_datasets = UCR_UEA_datasets().list_multivariate_datasets()
print(len(ucr_uea_list_multivariate_datasets))  # 30

The list of univariate datasets comes back empty, yet the datasets themselves can still be loaded by name.

This PR is related to issue https://github.com/tslearn-team/tslearn/issues/516.

YannCabanes commented 5 months ago

The continuous integration tests of the main branch are failing:

=========================== short test summary info ============================
FAILED tslearn/datasets/ucr_uea.py::tslearn.datasets.ucr_uea.UCR_UEA_datasets.baseline_accuracy
FAILED tslearn/datasets/ucr_uea.py::tslearn.datasets.ucr_uea.UCR_UEA_datasets.list_datasets
FAILED tslearn/datasets/ucr_uea.py::tslearn.datasets.ucr_uea.UCR_UEA_datasets.list_univariate_datasets
===== 3 failed, 179 passed, 1 skipped, 129 warnings in 2545.27s (0:42:25) ======
=================================== FAILURES ===================================
____ [doctest] tslearn.datasets.ucr_uea.UCR_UEA_datasets.baseline_accuracy _____
103             are themselves dictionaries that provide accuracy scores for the
104             requested methods.
105 
106         Examples
107         --------
108         >>> uea_ucr = UCR_UEA_datasets()
109         >>> dict_acc = uea_ucr.baseline_accuracy(
110         ...         list_datasets=["Adiac", "ChlorineConcentration"],
111         ...         list_methods=["C45"])
112         >>> len(dict_acc)
Expected:
    2
Got:
    0
/home/vsts/work/1/s/tslearn/datasets/ucr_uea.py:112: DocTestFailure
______ [doctest] tslearn.datasets.ucr_uea.UCR_UEA_datasets.list_datasets _______
175 List datasets (both univariate and multivariate) available in the 
176         UCR/UEA archive.
177 
178         Examples
179         --------
180         >>> l = UCR_UEA_datasets().list_datasets()
181         >>> "PenDigits" in l
182         True
183         >>> "BeetleFly" in l
Expected:
    True
Got:
    False
/home/vsts/work/1/s/tslearn/datasets/ucr_uea.py:183: DocTestFailure
_ [doctest] tslearn.datasets.ucr_uea.UCR_UEA_datasets.list_univariate_datasets _
135 List univariate datasets in the UCR/UEA archive.
136 
137         Examples
138         --------
139         >>> l = UCR_UEA_datasets().list_univariate_datasets()
140         >>> len(l)
Expected:
    85
Got:
    0
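
The failures above all have the same shape: a docstring example hard-codes a value that depends on the remote archive listing, and the listing came back empty. As a minimal sketch of how such a failure can be reproduced in isolation with the standard doctest module, the snippet below runs the examples of a single object and counts the failures. The function list_univariate_datasets_stub is a hypothetical stand-in that always returns an empty list; it is not the tslearn implementation, and count_doctest_failures is a helper name introduced only for this illustration.

```python
import doctest


def list_univariate_datasets_stub():
    """Hypothetical stand-in that mimics an empty archive listing.

    >>> len(list_univariate_datasets_stub())
    85
    """
    return []


def count_doctest_failures(obj, name):
    """Run the docstring examples of one object; return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = attempted = 0
    # The object is passed in through globs so its own examples can call it.
    for test in finder.find(obj, name, globs={name: obj}):
        # Discard the failure report; only the counts are of interest here.
        result = runner.run(test, out=lambda text: None)
        failed += result.failed
        attempted += result.attempted
    return failed, attempted


summary = count_doctest_failures(
    list_univariate_datasets_stub, "list_univariate_datasets_stub"
)
print(summary)
```

With the stub returning an empty list, the single example is attempted and fails, in the same way as the "Expected: 85, Got: 0" report in the log.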

YannCabanes commented 5 months ago

One test is still failing:

=================================== FAILURES ===================================
____ [doctest] tslearn.datasets.ucr_uea.UCR_UEA_datasets.baseline_accuracy _____
106 
107         Examples
108         --------
109         >>> uea_ucr = UCR_UEA_datasets()
110         >>> dict_acc = uea_ucr.baseline_accuracy(
111         ...         list_datasets=["Adiac", "ChlorineConcentration"],
112         ...         list_methods=["C45"])
113         >>> len(dict_acc)
114         2
115         >>> dict_acc["Adiac"]  # doctest: +ELLIPSIS
Expected:
    {'C45': 0.542199...}
Got:
    {}
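
The remaining failure involves an example marked with +ELLIPSIS. As a small sketch of what that flag does and does not tolerate, the snippet below uses doctest.OutputChecker directly. The full accuracy value is a hypothetical number chosen only because it starts with the digits shown in the expected output; it is not taken from the archive.

```python
import doctest

checker = doctest.OutputChecker()

# Expected output exactly as written in the docstring example.
want = "{'C45': 0.542199...}\n"

# Hypothetical full value: any digits after 0.542199 would do.
full_value = "{'C45': 0.5421994884910486}\n"
# What the failing run actually produced.
empty_result = "{}\n"

# With ELLIPSIS, "..." in the expected text matches any substring.
matches_full = checker.check_output(want, full_value, doctest.ELLIPSIS)
# An empty dictionary lacks the literal parts around "...", so it cannot match.
matches_empty = checker.check_output(want, empty_result, doctest.ELLIPSIS)
# Without the flag, "..." is compared literally.
matches_without_flag = checker.check_output(want, full_value, 0)

print(matches_full, matches_empty, matches_without_flag)
```

The flag only absorbs the trailing digits of the score, so an empty dictionary still fails: the mismatch in the log is about missing data, not about floating-point formatting.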

YannCabanes commented 5 months ago

The readthedocs test failed:

    Traceback (most recent call last):
      File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/checkouts/517/docs/examples/neighbors/plot_sax_mindist_knn.py", line 96, in <module>
        X_train, y_train, X_test, y_test = data_loader.load_dataset(dataset)
      File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/envs/517/lib/python3.8/site-packages/tslearn-0.6.3-py3.8.egg/tslearn/datasets/ucr_uea.py", line 281, in load_dataset
        success = extract_from_zip_url(url, target_dir=full_path)
      File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/envs/517/lib/python3.8/site-packages/tslearn-0.6.3-py3.8.egg/tslearn/datasets/datasets.py", line 39, in extract_from_zip_url
        urlretrieve(url, local_zip_fname)
      File "/home/docs/.asdf/installs/python/3.8.18/lib/python3.8/urllib/request.py", line 286, in urlretrieve
        raise ContentTooShortError(
    urllib.error.ContentTooShortError: <urlopen error retrieval incomplete: got only 920840 out of 1025045 bytes>
pickling environment... failed

Traceback (most recent call last):
  File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/envs/517/lib/python3.8/site-packages/sphinx/cmd/build.py", line 290, in build_main
    app.build(args.force_all, args.filenames)
  File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/envs/517/lib/python3.8/site-packages/sphinx/application.py", line 351, in build
    self.builder.build_update()
  File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/envs/517/lib/python3.8/site-packages/sphinx/builders/__init__.py", line 290, in build_update
    self.build(to_build,
  File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/envs/517/lib/python3.8/site-packages/sphinx/builders/__init__.py", line 327, in build
    pickle.dump(self.env, f, pickle.HIGHEST_PROTOCOL)
_pickle.PicklingError: Can't pickle <class 'matplotlib_svg_scraper'>: attribute lookup matplotlib_svg_scraper on builtins failed
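
The ContentTooShortError in the first traceback is raised by urlretrieve when fewer bytes arrive than the server announced, which usually points to a transient network problem rather than a bug in the calling code. One possible mitigation, sketched below under that assumption, is to retry the download a few times. The helper retrieve_with_retries and its parameters attempts, delay and retrieve are hypothetical names introduced for this illustration; they are not part of tslearn.

```python
import time
from urllib.error import ContentTooShortError
from urllib.request import urlretrieve


def retrieve_with_retries(url, filename, attempts=3, delay=1.0,
                          retrieve=urlretrieve):
    """Download url to filename, retrying when the transfer is truncated.

    `retrieve` is injectable so the retry logic can be exercised offline.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return retrieve(url, filename)
        except ContentTooShortError:
            # Give up after the last attempt and surface the original error.
            if attempt == attempts:
                raise
            time.sleep(delay)
```

A call such as retrieve_with_retries(url, local_zip_fname) could stand in for the bare urlretrieve call; only truncated downloads are retried, so other errors (for example a 404) still fail immediately.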

YannCabanes commented 5 months ago

There is still an error in the readthedocs test:

pickling environment... failed

Traceback (most recent call last):
  File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/envs/517/lib/python3.8/site-packages/sphinx/cmd/build.py", line 290, in build_main
    app.build(args.force_all, args.filenames)
  File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/envs/517/lib/python3.8/site-packages/sphinx/application.py", line 351, in build
    self.builder.build_update()
  File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/envs/517/lib/python3.8/site-packages/sphinx/builders/__init__.py", line 290, in build_update
    self.build(to_build,
  File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/envs/517/lib/python3.8/site-packages/sphinx/builders/__init__.py", line 327, in build
    pickle.dump(self.env, f, pickle.HIGHEST_PROTOCOL)
_pickle.PicklingError: Can't pickle <class 'matplotlib_svg_scraper'>: attribute lookup matplotlib_svg_scraper on builtins failed
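
The message "attribute lookup matplotlib_svg_scraper on builtins failed" is what pickle reports for a class whose __module__ is builtins but which cannot be found there. One way this can arise, sketched below, is when the class is defined in source that is run through exec() in a namespace with no __name__. The assumption here is that the scraper class is defined directly in a configuration file executed that way; the thread does not say where it is defined. The helper can_pickle is a hypothetical name used only for this illustration.

```python
import collections
import pickle

# Source executed in a bare namespace, roughly as an exec()'d config file
# would be. The class body is a placeholder, not the real scraper.
source = "class matplotlib_svg_scraper:\n    pass\n"
namespace = {}
exec(source, namespace)
scraper_cls = namespace["matplotlib_svg_scraper"]

# With no __name__ in the namespace, the class reports builtins as its module.
module_name = scraper_cls.__module__


def can_pickle(obj):
    """Return True if obj can be pickled, False if pickle rejects it."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError):
        return False
    return True


# Classes are pickled by reference (module name plus class name), so the
# exec-defined class fails while an importable class succeeds.
exec_defined_ok = can_pickle(scraper_cls)
importable_ok = can_pickle(collections.OrderedDict)

print(module_name, exec_defined_ok, importable_ok)
```

Moving such a class into an importable module is one common way to make it picklable; whether that matches the fix adopted in this PR is not stated in the thread.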