tslearn-team / tslearn

The machine learning toolkit for time series analysis in Python
https://tslearn.readthedocs.io
BSD 2-Clause "Simplified" License
2.92k stars 342 forks

[WIP] Add PyTorch backend for soft-DTW #431

Closed YannCabanes closed 1 year ago

YannCabanes commented 2 years ago

This PR aims to make the files soft_dtw_fast.py and softdtw_variants.py compatible with the PyTorch backend.

We take inspiration from the following GitHub repository: https://github.com/Sleepwalking/pytorch-softdtw/blob/master/soft_dtw.py

GitHub repository of Mehran Maghoumi on Soft-DTW using PyTorch with CUDA: https://github.com/Maghoumi/pytorch-softdtw-cuda/blob/master/soft_dtw_cuda.py

An introduction to Dynamic Time Warping can be found at: https://rtavenar.github.io/blog/dtw.html

An introduction about the differentiability of DTW and the case of soft-DTW can be found at: https://rtavenar.github.io/blog/softdtw.html

We also take inspiration from the Python package geomstats [JMLR:v21:19-027] (https://github.com/geomstats/geomstats/), dedicated to machine learning on Riemannian manifolds, to implement the multiple-backend functions.

References

[JMLR:v21:19-027] Nina Miolane, Nicolas Guigui, Alice Le Brigant, Johan Mathe, Benjamin Hou, Yann Thanwerdas, Stefan Heyder, Olivier Peltre, Niklas Koep, Hadi Zaatiti, Hatem Hajri, Yann Cabanes, Thomas Gerald, Paul Chauchat, Christian Shewmake, Daniel Brooks, Bernhard Kainz, Claire Donnat, Susan Holmes and Xavier Pennec. Geomstats: A Python Package for Riemannian Geometry in Machine Learning, Journal of Machine Learning Research, 2020, volume 21, number 223, pages 1-9, http://jmlr.org/papers/v21/19-027.html

YannCabanes commented 2 years ago

All checks of tslearn main branch have passed.

codecov-commenter commented 2 years ago

Codecov Report

Patch coverage: 90.41% and project coverage change: -1.52 :warning:

Comparison is base (a091483) 94.53% compared to head (fef2486) 93.02%.

:exclamation: Current head fef2486 differs from pull request most recent head 5f90b87. Consider uploading reports for the commit 5f90b87 to get more accurate results

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #431      +/-   ##
==========================================
- Coverage   94.53%   93.02%   -1.52%
==========================================
  Files          62       67       +5
  Lines        4743     5663     +920
==========================================
+ Hits         4484     5268     +784
- Misses        259      395     +136
```

| [Impacted Files](https://app.codecov.io/gh/tslearn-team/tslearn/pull/431?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=tslearn-team) | Coverage Δ | |
|---|---|---|
| [tslearn/metrics/sax.py](https://app.codecov.io/gh/tslearn-team/tslearn/pull/431?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=tslearn-team#diff-dHNsZWFybi9tZXRyaWNzL3NheC5weQ==) | `100.00% <ø> (ø)` | |
| [tslearn/metrics/soft\_dtw\_fast.py](https://app.codecov.io/gh/tslearn-team/tslearn/pull/431?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=tslearn-team#diff-dHNsZWFybi9tZXRyaWNzL3NvZnRfZHR3X2Zhc3QucHk=) | `57.39% <30.98%> (-42.61%)` | :arrow_down: |
| [tslearn/backend/pytorch\_backend.py](https://app.codecov.io/gh/tslearn-team/tslearn/pull/431?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=tslearn-team#diff-dHNsZWFybi9iYWNrZW5kL3B5dG9yY2hfYmFja2VuZC5weQ==) | `77.20% <77.20%> (ø)` | |
| [tslearn/metrics/softdtw\_variants.py](https://app.codecov.io/gh/tslearn-team/tslearn/pull/431?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=tslearn-team#diff-dHNsZWFybi9tZXRyaWNzL3NvZnRkdHdfdmFyaWFudHMucHk=) | `93.33% <90.90%> (-4.51%)` | :arrow_down: |
| [tslearn/clustering/kmeans.py](https://app.codecov.io/gh/tslearn-team/tslearn/pull/431?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=tslearn-team#diff-dHNsZWFybi9jbHVzdGVyaW5nL2ttZWFucy5weQ==) | `91.69% <91.66%> (ø)` | |
| [tslearn/metrics/soft\_dtw\_loss\_pytorch.py](https://app.codecov.io/gh/tslearn-team/tslearn/pull/431?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=tslearn-team#diff-dHNsZWFybi9tZXRyaWNzL3NvZnRfZHR3X2xvc3NfcHl0b3JjaC5weQ==) | `92.64% <92.64%> (ø)` | |
| [tslearn/backend/backend.py](https://app.codecov.io/gh/tslearn-team/tslearn/pull/431?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=tslearn-team#diff-dHNsZWFybi9iYWNrZW5kL2JhY2tlbmQucHk=) | `93.54% <93.54%> (ø)` | |
| [tslearn/backend/numpy\_backend.py](https://app.codecov.io/gh/tslearn-team/tslearn/pull/431?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=tslearn-team#diff-dHNsZWFybi9iYWNrZW5kL251bXB5X2JhY2tlbmQucHk=) | `93.93% <93.93%> (ø)` | |
| [tslearn/utils/utils.py](https://app.codecov.io/gh/tslearn-team/tslearn/pull/431?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=tslearn-team#diff-dHNsZWFybi91dGlscy91dGlscy5weQ==) | `93.85% <95.23%> (+0.17%)` | :arrow_up: |
| [tslearn/metrics/dtw\_variants.py](https://app.codecov.io/gh/tslearn-team/tslearn/pull/431?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=tslearn-team#diff-dHNsZWFybi9tZXRyaWNzL2R0d192YXJpYW50cy5weQ==) | `95.48% <96.76%> (-1.63%)` | :arrow_down: |
| ... and [5 more](https://app.codecov.io/gh/tslearn-team/tslearn/pull/431?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=tslearn-team) | | |

kushalkolar commented 2 years ago

Do you think that in general tslearn would benefit from having torch versions for metrics or models? DTW based metrics would certainly be a useful loss function to utilize in pytorch for time-series models. The modules you've written so far seem useful for implementing metrics beyond just sDTW.

YannCabanes commented 1 year ago

Hello @kushalkolar, thank you for your interest. You are right, having the PyTorch backend for soft-DTW is only a beginning. In the end, @rtavenar and I plan to have all of tslearn's tools supported by NumPy, PyTorch, TensorFlow, and then maybe JAX and additional backends. We planned to start with only the PyTorch backend for the soft-DTW metrics; however, it is already complicated because of the Numba @njit decorator, which only supports the NumPy backend.

YannCabanes commented 1 year ago

I am looking for ideas to convert the PyTorch input arrays to NumPy before using Numba: https://stackoverflow.com/questions/63169760/how-can-i-use-numba-for-pytorch-tensors

YannCabanes commented 1 year ago

My problem is the following: Numba only supports NumPy arrays. I have created a decorator which converts the inputs of a function to NumPy and then converts the output back to the original backend, so I can write:

@convert_backend_to_numpy()
@njit(fastmath=True)
def function(args):
    ...

Then I can call this function using either a NumPy or a PyTorch input. The output has the same type as the input.
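
A minimal sketch of how such a decorator could look is given here. It is not the code from this PR: the helper `_is_torch_tensor`, the toy kernel `squared_euclidean` and the optional PyTorch import are assumptions made for illustration, and the converted output is detached from the autograd graph.

```python
import functools

import numpy as np

try:  # PyTorch is optional in this sketch
    import torch
except ImportError:
    torch = None


def _is_torch_tensor(obj):
    """Return True if PyTorch is installed and `obj` is a torch.Tensor."""
    return torch is not None and isinstance(obj, torch.Tensor)


def convert_backend_to_numpy():
    """Hypothetical decorator factory: call the function on NumPy inputs."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            used_torch = any(_is_torch_tensor(a) for a in args)
            # Tensors are detached and moved to the CPU before conversion.
            np_args = [
                a.detach().cpu().numpy() if _is_torch_tensor(a) else a
                for a in args
            ]
            out = func(*np_args, **kwargs)
            if used_torch:
                # The result is a new tensor with no gradient history.
                return torch.as_tensor(out)
            return out

        return wrapper

    return decorator


@convert_backend_to_numpy()
def squared_euclidean(x, y):
    # Stand-in for an @njit(fastmath=True) kernel that only accepts NumPy.
    return np.sum((x - y) ** 2)
```

Calling `squared_euclidean` with NumPy arrays returns a NumPy scalar; with PyTorch tensors the same call would return a tensor, at the cost of a conversion in each direction.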

kushalkolar commented 1 year ago

Some quick thoughts: wouldn't it be better to make a pure torch implementation, at least for the most compute-intensive steps, than to have lots of back and forth with NumPy? sDTW could be a great loss function for some use cases, and I wonder if constant back and forth between NumPy arrays and torch Tensors (or JAX etc. tensors) could slow things down.

YannCabanes commented 1 year ago

However, the @convert_backend_to_numpy() decorator does not work when an njit-decorated function calls another auxiliary function (for example, _soft_dtw calling _softmin3). If a main function is decorated with jit, then its auxiliary functions also have to be decorated with jit, and the convert_backend_to_numpy decorator cannot precede the jit decorator of the auxiliary functions.

I could remove all convert_backend_to_numpy decorators from the auxiliary functions but then they will only support the NumPy backend.

@rtavenar, do you know how we could make the auxiliary jit-decorated functions support the PyTorch backend? Or should the auxiliary functions only support the NumPy backend?

YannCabanes commented 1 year ago

Hello @kushalkolar, Yes, I am also comparing the execution times between:

YannCabanes commented 1 year ago

Even if I choose to use the PyTorch backend directly, I still have a problem related to Numba. I have created a decorator called @njit_if_numpy_backend which jit-decorates a function only if the inputs use the NumPy backend. However, I have a problem when a main jit-decorated function calls an auxiliary function. If the main function uses the NumPy backend, then the auxiliary function also has to be decorated with jit. But if the PyTorch backend is used in the main function, then the auxiliary function cannot be jit-decorated.

YannCabanes commented 1 year ago

Numba does not accept a backend as an input argument of an njit-decorated function, because a backend does not correspond to a Numba type. We obtain the following error: Cannot determine Numba type of <class 'tslearn.backend.backend.Backend'>

Likewise, we cannot write the following in an njit-decorated function:

if backend is None:
    backend = Backend(data)

as Backend does not have a Numba type. Even if backend is not None at execution time, this raises an error during compilation.

rtavenar commented 1 year ago

Maybe you can just (at least for a start) have constant integer values that identify the backends. Would that help, or is the problem due to calling the Backend builder?
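
As a rough illustration of this suggestion, the sketch below identifies backends by plain integers that are resolved outside any compiled function. The constant names, `backend_id` and `run_kernel` are hypothetical and are not part of tslearn; the only point is that an integer, unlike a Backend instance, is a type Numba can handle.

```python
NUMPY_BACKEND = 0
PYTORCH_BACKEND = 1


def backend_id(data):
    """Return an integer identifying the array library `data` comes from."""
    root_module = type(data).__module__.split(".")[0]
    if root_module == "torch":
        return PYTORCH_BACKEND
    # Anything else is treated as NumPy-compatible in this sketch.
    return NUMPY_BACKEND


def run_kernel(values, backend):
    """Toy kernel that receives the backend as a plain integer."""
    if backend == NUMPY_BACKEND:
        return sum(v * v for v in values)
    raise NotImplementedError("only the NumPy backend is handled here")
```

The identifier would be computed once in ordinary Python, for example `run_kernel(values, backend_id(values))`, so no Backend object has to be built inside compiled code.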

YannCabanes commented 1 year ago

Hello @rtavenar, the problem is due to calling the Backend builder as it does not have a Numba type. Elements which can be used in a Numba njit decorated function are mainly:

YannCabanes commented 1 year ago

Adding torch in the file docs/requirements_rtd.txt fixed the docs/readthedocs.org:tslearn failing test.

YannCabanes commented 1 year ago

All tests pass except codecov which was Cancelled after 60m

YannCabanes commented 1 year ago

After discussion with @rtavenar, here is a list of modifications to do:

YannCabanes commented 1 year ago

All checks have passed (15 successful checks).

YannCabanes commented 1 year ago

The test macOS Python39 has failed. tslearn/tests/test_metrics.py::test_gamma_soft_dtw Fatal Python error: Segmentation fault Bash exited with code '139'.

YannCabanes commented 1 year ago

The test macOS Python39 has failed. tslearn/tests/test_metrics.py::test_gamma_soft_dtw Fatal Python error: Segmentation fault Bash exited with code '139'.

YannCabanes commented 1 year ago

I save here the execution time of the gradient descent using the Soft-DTW metric proposed in the notebook.

The objective is therefore to accelerate the computation of the gradients using PyTorch. For that:

YannCabanes commented 1 year ago

Note about the GitHub repository of Mehran Maghoumi on Soft-DTW using PyTorch with CUDA: https://github.com/Maghoumi/pytorch-softdtw-cuda/blob/master/soft_dtw_cuda.py

In this GitHub repository, CUDA is used to accelerate the computation of the Soft-DTW, and the backend used is PyTorch. However, the gradients are lost when a tensor is passed to Numba's CUDA interface. Indeed:

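import torch
from numba import cuda

device = torch.device("cuda")  # assumed: `device` was defined earlier as a CUDA device
a = torch.tensor([1.0], dtype=float, device=device, requires_grad=True)
b = cuda.as_cuda_array(a)  # raises RuntimeError because `a` requires grad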
RuntimeError                              Traceback (most recent call last)

      1 a = torch.tensor([1.0], dtype=float, device=device, requires_grad=True)
----> 2 b = cuda.as_cuda_array(a)

[/usr/local/lib/python3.10/dist-packages/torch/_tensor.py] in __cuda_array_interface__(self)
   1031         # RuntimeError, matching tensor.__array__() behavior.
   1032         if self.requires_grad:
-> 1033             raise RuntimeError(
   1034                 "Can't get __cuda_array_interface__ on Variable that requires grad. "
   1035                 "If gradients aren't required, use var.detach() to get Variable that doesn't require grad."

RuntimeError: Can't get __cuda_array_interface__ on Variable that requires grad. If gradients aren't required, use var.detach() to get Variable that doesn't require grad.

YannCabanes commented 1 year ago

All tests are successful except macOS Python39. The error message is the following:

tslearn/tests/test_metrics.py::test_symmetric_cdist Fatal Python error: Segmentation fault
/Users/runner/work/_temp/e68bb9d8-6b6a-4b02-bb87-ab085285360f.sh: line 5: 6817 Segmentation fault: 11 python -m pytest -v tslearn/ --doctest-modules -k 'not test_all_estimators'
##[error]Bash exited with code '139'.

The only warning obtained on my local Linux computer when testing tslearn/tests/test_metrics.py is:

:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 232 from PyObject

YannCabanes commented 1 year ago

Now all 15 checks have failed. For MacOS Python39, the error message is the following:

tslearn/tests/test_metrics.py::test_dtw_path_from_metric Fatal Python error: Segmentation fault
/Users/runner/work/_temp/ed131a23-2ff0-45ca-b23d-25fdd3e775e4.sh: line 5: 9657 Segmentation fault: 11 python -m pytest -v tslearn/ --doctest-modules -k 'not test_all_estimators'
##[error]Bash exited with code '139'.

YannCabanes commented 1 year ago

All 15 checks have failed. For MacOS Python39, the error message is the following:

tslearn/tests/test_metrics.py::test_symmetric_cdist Fatal Python error: Segmentation fault
/Users/runner/work/_temp/c8bfff3d-1ce8-46f0-9752-037d99a1c63b.sh: line 5: 31115 Segmentation fault: 11 python -m pytest -v tslearn/ --doctest-modules -k 'not test_all_estimators'
##[error]Bash exited with code '139'.

YannCabanes commented 1 year ago

All 15 checks have failed. For MacOS Python39, the error message is the following:

tslearn/tests/test_metrics.py::test_softdtw Fatal Python error: Segmentation fault
/Users/runner/work/_temp/c8bfff3d-1ce8-46f0-9752-037d99a1c63b.sh: line 5: 31115 Segmentation fault: 11 python -m pytest -v tslearn/ --doctest-modules -k 'not test_all_estimators'
##[error]Bash exited with code '139'.

YannCabanes commented 1 year ago

Error in the name of commit 1858797: the function test_softdtw of the file tslearn/tests/test_metrics.py has been commented out.

YannCabanes commented 1 year ago

Here is the error message of the test docs/readthedocs.org:tslearn:

Extension error:
Here is a summary of the problems encountered when running the examples

Unexpected failing examples:
/home/docs/checkouts/readthedocs.org/user_builds/tslearn/checkouts/456/docs/examples/neighbors/plot_sax_mindist_knn.py failed leaving traceback:
Traceback (most recent call last):
  File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/checkouts/456/docs/examples/neighbors/plot_sax_mindist_knn.py", line 99, in <module>
    X_train = ts_scaler.fit_transform(X_train)
  File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/envs/456/lib/python3.8/site-packages/sklearn/utils/_set_output.py", line 140, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/envs/456/lib/python3.8/site-packages/tslearn-0.5.3.2-py3.8.egg/tslearn/preprocessing/preprocessing.py", line 270, in fit_transform
    return self.fit(X).transform(X)
  File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/envs/456/lib/python3.8/site-packages/tslearn-0.5.3.2-py3.8.egg/tslearn/preprocessing/preprocessing.py", line 252, in fit
    X = check_array(X, allow_nd=True, force_all_finite=False)
  File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/envs/456/lib/python3.8/site-packages/sklearn/utils/validation.py", line 894, in check_array
    raise ValueError(
ValueError: Expected 2D array, got scalar array instead:
array=None.
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

/home/docs/checkouts/readthedocs.org/user_builds/tslearn/checkouts/456/docs/examples/classification/plot_early_classification.py failed leaving traceback:
Traceback (most recent call last):
  File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/checkouts/456/docs/examples/classification/plot_early_classification.py", line 48, in <module>
    X_train = TimeSeriesScalerMeanVariance().fit_transform(X_train)
  File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/envs/456/lib/python3.8/site-packages/sklearn/utils/_set_output.py", line 140, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/envs/456/lib/python3.8/site-packages/tslearn-0.5.3.2-py3.8.egg/tslearn/preprocessing/preprocessing.py", line 270, in fit_transform
    return self.fit(X).transform(X)
  File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/envs/456/lib/python3.8/site-packages/tslearn-0.5.3.2-py3.8.egg/tslearn/preprocessing/preprocessing.py", line 252, in fit
    X = check_array(X, allow_nd=True, force_all_finite=False)
  File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/envs/456/lib/python3.8/site-packages/sklearn/utils/validation.py", line 894, in check_array
    raise ValueError(
ValueError: Expected 2D array, got scalar array instead:
array=None.
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

YannCabanes commented 1 year ago

All 15 checks have failed. For MacOS Python39, the error message is the following:

tslearn/tests/test_metrics.py::test_gamma_soft_dtw Fatal Python error: Segmentation fault
/Users/runner/work/_temp/c8bfff3d-1ce8-46f0-9752-037d99a1c63b.sh: line 5: 31115 Segmentation fault: 11 python -m pytest -v tslearn/ --doctest-modules -k 'not test_all_estimators'
##[error]Bash exited with code '139'.

YannCabanes commented 1 year ago

All 15 checks have failed. The tests of the file tslearn/tests/test_metrics.py do not fail for MacOS Python39 once the following tests are removed:

These 4 tests are failing only for MacOS Python39.

YannCabanes commented 1 year ago

The commit 826cf9f fixes the error in url_uea.py when downloading the dataset zip files by updating the URL. This problem was solved in PR #456.

YannCabanes commented 1 year ago

All 15 checks have passed.

The following tests have been temporarily removed:

These 4 tests are failing only for MacOS Python39.

YannCabanes commented 1 year ago

All 15 checks have passed.

YannCabanes commented 1 year ago

For MacOS Python39, the test test_gamma_soft_dtw has passed and the test test_symmetric_cdist has failed with the following error: tslearn/tests/test_metrics.py::test_symmetric_cdist Fatal Python error: Segmentation fault ##[error]Bash exited with code '139'.

YannCabanes commented 1 year ago

For MacOS Python37, 6 test errors are related to import pandas as pd in the file tslearn.utils.cast.

=================================== FAILURES ===================================
_______________ [doctest] tslearn.utils.cast.from_pyflux_dataset _______________
454 
455     Returns
456     -------
  File "<doctest tslearn.utils.cast.from_pyflux_dataset[0]>", line 1, in <module>
  File "/Users/runner/hostedtoolcache/Python/3.7.17/x64/lib/python3.7/site-packages/pandas/__init__.py", line 50, in <module>
    from pandas.core.api import (
  File "/Users/runner/hostedtoolcache/Python/3.7.17/x64/lib/python3.7/site-packages/pandas/core/api.py", line 48, in <module>
    from pandas.core.groupby import (
  File "/Users/runner/hostedtoolcache/Python/3.7.17/x64/lib/python3.7/site-packages/pandas/core/groupby/__init__.py", line 1, in <module>
    from pandas.core.groupby.generic import (
  File "/Users/runner/hostedtoolcache/Python/3.7.17/x64/lib/python3.7/site-packages/pandas/core/groupby/generic.py", line 73, in <module>
    from pandas.core.frame import DataFrame
  File "/Users/runner/hostedtoolcache/Python/3.7.17/x64/lib/python3.7/site-packages/pandas/core/frame.py", line 129, in <module>
    from pandas.core import (
  File "/Users/runner/hostedtoolcache/Python/3.7.17/x64/lib/python3.7/site-packages/pandas/core/generic.py", line 122, in <module>
    from pandas.core.describe import describe_ndframe
  File "/Users/runner/hostedtoolcache/Python/3.7.17/x64/lib/python3.7/site-packages/pandas/core/describe.py", line 39, in <module>
    from pandas.io.formats.format import format_percentiles
  File "/Users/runner/hostedtoolcache/Python/3.7.17/x64/lib/python3.7/site-packages/pandas/io/formats/format.py", line 99, in <module>
    from pandas.io.common import stringify_path
  File "/Users/runner/hostedtoolcache/Python/3.7.17/x64/lib/python3.7/site-packages/pandas/io/common.py", line 4, in <module>
    import bz2
  File "/Users/runner/hostedtoolcache/Python/3.7.17/x64/lib/python3.7/bz2.py", line 19, in <module>
    from _bz2 import BZ2Compressor, BZ2Decompressor
ModuleNotFoundError: No module named '_bz2'
/Users/runner/work/1/s/tslearn/utils/cast.py:463: UnexpectedException
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
------- generated Nunit xml file: /Users/runner/work/1/s/test-output.xml -------
=========================== short test summary info ============================
FAILED tslearn/utils/cast.py::tslearn.utils.cast.from_pyflux_dataset
FAILED tslearn/utils/cast.py::tslearn.utils.cast.from_sktime_dataset
FAILED tslearn/utils/cast.py::tslearn.utils.cast.from_tsfresh_dataset
FAILED tslearn/utils/cast.py::tslearn.utils.cast.to_pyflux_dataset
FAILED tslearn/utils/cast.py::tslearn.utils.cast.to_sktime_dataset
FAILED tslearn/utils/cast.py::tslearn.utils.cast.to_tsfresh_dataset
= 6 failed, 156 passed, 2 skipped, 21 deselected, 128 warnings in 883.87s (0:14:43) =
##[error]Bash exited with code '1'.
Async Command Start: Publish test results
Publishing test results to test run '18091'.
TestResults To Publish 164, Test run id:18091
Test results publishing 164, remaining: 0. Test run id: 18091
Published Test Run : https://dev.azure.com/romaintavenard/tslearn/_TestManagement/Runs?runId=18091&_a=runCharts
Flaky failed test results are opted out of pass percentage
Async Command End: Publish test results
Finishing: Test
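
The traceback above ends inside the standard library: bz2.py cannot import the compiled _bz2 extension, so the failure comes from the Python build on the CI machine rather than from pandas or tslearn. A quick way to check an interpreter for this, sketched with a hypothetical helper name `has_bz2_support`:

```python
import importlib.util


def has_bz2_support():
    """Return True if this interpreter provides the compiled _bz2 module."""
    return importlib.util.find_spec("_bz2") is not None


if __name__ == "__main__":
    print("bz2 support available:", has_bz2_support())
```

If this prints False, `import pandas` is expected to fail in the same way as in the log.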

YannCabanes commented 1 year ago

Linux Python310 also failed because of:

=================================== FAILURES ===================================
__ [doctest] tslearn.datasets.ucr_uea.UCR_UEA_datasets.baseline_accuracy ___
101 -------
102 dict
103 A dictionary in which keys are dataset names and associated values
104 are themselves dictionaries that provide accuracy scores for the
105 requested methods.
106
107 Examples
108 --------
109 >>> uea_ucr = UCR_UEA_datasets()
110 >>> dict_acc = uea_ucr.baseline_accuracy(
UNEXPECTED EXCEPTION: TypeError('expected str, bytes or os.PathLike object, not NoneType')
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.10.12/x64/lib/python3.10/doctest.py", line 1350, in __run
    exec(compile(example.source, filename, "single",
  File "<doctest tslearn.datasets.ucr_uea.UCR_UEA_datasets.baseline_accuracy[1]>", line 1, in <module>
  File "/home/vsts/work/1/s/tslearn/datasets/ucr_uea.py", line 121, in baseline_accuracy
    with open(self._baseline_scores_filename, "r") as f:
TypeError: expected str, bytes or os.PathLike object, not NoneType
/home/vsts/work/1/s/tslearn/datasets/ucr_uea.py:110: UnexpectedException
=============================== warnings summary ===============================
tslearn/datasets/ucr_uea.py::tslearn.datasets.ucr_uea.UCR_UEA_datasets.load_dataset
  /home/vsts/work/1/s/tslearn/datasets/datasets.py:49: RuntimeWarning: Corrupted or missing zip file encountered, aborting
    warnings.warn("Corrupted or missing zip file encountered, aborting",

tslearn/datasets/ucr_uea.py::tslearn.datasets.ucr_uea.UCR_UEA_datasets.load_dataset
  <doctest tslearn.datasets.ucr_uea.UCR_UEA_datasets.load_dataset[8]>:1: RuntimeWarning: dataset "DatasetThatDoesNotExist" could not be downloaded or extracted

YannCabanes commented 1 year ago

I am now trying to add the possibility to use the Soft-DTW metric as a loss function in PyTorch neural networks. Here are useful links:

johannfaouzi commented 1 year ago

Regarding the failing build (macOS Python 3.9), it seems to be a segmentation fault 11 error (which is not extremely helpful, especially if the error only occurs for this build). The issue seems to come from the tslearn/tests/test_metrics.py::test_symmetric_cdist test, which runs 8 tests (2 backends and 4 functions), so it's hard to know which one is causing the error. Could you rewrite the test with parametrization?
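
A parametrized version of such a test could look like the sketch below. The two distance functions are toy stand-ins for the tslearn cdist functions, and `cast` is a hypothetical helper, so this only illustrates the structure rather than the actual test in tslearn/tests/test_metrics.py.

```python
import numpy as np
import pytest


def cdist_euclidean(dataset):
    """Toy pairwise Euclidean distances (stand-in for a tslearn function)."""
    data = np.asarray(dataset, dtype=float)
    diff = data[:, None, :] - data[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def cdist_manhattan(dataset):
    """Toy pairwise Manhattan distances (second stand-in)."""
    data = np.asarray(dataset, dtype=float)
    return np.abs(data[:, None, :] - data[None, :, :]).sum(axis=-1)


def cast(data, backend_name):
    """Convert a NumPy array to the requested backend, skipping if missing."""
    if backend_name == "pytorch":
        torch = pytest.importorskip("torch")
        return torch.from_numpy(data)
    return data


@pytest.mark.parametrize("backend_name", ["numpy", "pytorch"])
@pytest.mark.parametrize("cdist_fun", [cdist_euclidean, cdist_manhattan])
def test_symmetric_cdist(cdist_fun, backend_name):
    rng = np.random.RandomState(0)
    dataset = cast(rng.randn(5, 10), backend_name)
    dists = cdist_fun(dataset)
    # A cross-distance matrix of a dataset with itself must be symmetric.
    np.testing.assert_allclose(dists, np.transpose(dists))
```

Run with `python -m pytest -v`, each (function, backend) combination is reported as a separate test, so a crash can be attributed to a single combination.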

There is also the choice of the backend for joblib. This is discussed in the documentation of the Parallel function, but I'm not familiar enough with it to know if it may impact the segmentation fault.

I was not able to reproduce this error locally.

YannCabanes commented 1 year ago

The only failing test is still MacOS with Python 3.9.

YannCabanes commented 1 year ago

Now the 3 MacOS continuous integration tests are failing. For MacOS 3.8 and 3.10, the error message is the following:

tslearn/tests/test_metrics.py::test_soft_dtw_loss_pytorch Fatal Python error: Aborted

Thread 0x0000700001fe7000 (most recent call first):
/Users/runner/work/_temp/2f308b01-9c93-4e88-adc9-5da2febcff76.sh: line 5:  2447 Abort trap: 6           python -m pytest -v tslearn/ --doctest-modules -k 'not test_all_estimators'
##[error]Bash exited with code '134'.

johannfaouzi commented 1 year ago

The issue is the use of parallel=True in some njit-functions: you cannot parallelize DTW because it's a recurrent function (you need the previous iteration to compute the next one). See Automatic parallelization with @jit:

Care should be taken, however, when reducing into slices or elements of an array if the elements specified by the slice or index are written to simultaneously by multiple parallel threads. The compiler may not detect such cases and then a race condition would occur.

However, with automatic parallelization, I'm not sure what kind of parallelization is done (I guess that it is automatically detected, for better or worse). Moreover, for some reason, the crashes do not seem consistent across operating systems...

I was able to reproduce the issue locally and to fix it by removing parallel=True in the decorator of _njit_soft_dtw.
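
To make the dependency concrete, here is a minimal pure-Python sketch of the soft-DTW recurrence. It is not the tslearn implementation: the names `softmin3` and `soft_dtw` and the squared-difference cost are chosen for illustration. Each cell of the accumulated cost matrix reads three cells written by earlier iterations, so the iterations of the two loops cannot be executed in an arbitrary order.

```python
import math


def softmin3(a, b, c, gamma):
    """Smooth minimum of three values, shifted by the true minimum."""
    m = min(a, b, c)
    total = sum(math.exp(-(v - m) / gamma) for v in (a, b, c))
    return m - gamma * math.log(total)


def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW between two 1-D sequences with a squared-difference cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            # acc[i][j] depends on three cells computed in earlier iterations.
            acc[i][j] = cost + softmin3(
                acc[i - 1][j], acc[i - 1][j - 1], acc[i][j - 1], gamma
            )
    return acc[n][m]
```

With a small `gamma`, the result approaches the classical DTW cost, which gives a simple sanity check for such a kernel.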

YannCabanes commented 1 year ago

Now only the test under MacOS for Python 3.9 is failing. The failing tests under MacOS for Python 3.8 and 3.10 have been fixed by removing the optional parameter parallel=True from the @njit decorator of the functions _njit_soft_dtw and _njit_soft_dtw_grad. This option was useless since there was no prange inside these functions.

YannCabanes commented 1 year ago

In the commit 237d356, the tests fail for MacOS Python 3.9 and for the codecov Python 3.10. For codecov, the error message is the following:

>       np.testing.assert_allclose(batch_ts_1.grad, expected_grad_ts1)
E       AssertionError: 
E       Not equal to tolerance rtol=1e-07, atol=0
E       
E       Mismatched elements: 280 / 400 (70%)
E       Max absolute difference: 6.95857845e-06
E       Max relative difference: 2.84667799e-05
E        x: array([[[-0.244439, -0.244439, -0.244439, -0.244439, -0.244439,
E                -0.244439, -0.244439, -0.244439],
E               [-0.244439, -0.244439, -0.244439, -0.244439, -0.244439,...
E        y: array([[[-0.244446, -0.244446, -0.244446, -0.244446, -0.244446,
E                -0.244446, -0.244446, -0.244446],
E               [-0.244446, -0.244446, -0.244446, -0.244446, -0.244446,...

tslearn/tests/test_metrics.py:679: AssertionError
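
The mismatch reported above is about 3e-05 in relative terms, while `assert_allclose` defaults to rtol=1e-07. The sketch below reproduces the situation with the rounded values printed in the log; the looser tolerance of 1e-4 is an assumption made for illustration, not the value chosen in this PR.

```python
import numpy as np

# Rounded values copied from the assertion message above.
computed = np.array([-0.244439, -0.244439])
expected = np.array([-0.244446, -0.244446])

# With the default rtol=1e-07 the comparison fails.
try:
    np.testing.assert_allclose(computed, expected)
    default_tolerance_ok = True
except AssertionError:
    default_tolerance_ok = False

# With an explicit, looser relative tolerance the comparison passes.
np.testing.assert_allclose(computed, expected, rtol=1e-4)
```

Whether loosening the tolerance is appropriate depends on the precision expected from the gradient computation on each platform.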

YannCabanes commented 1 year ago

There are two failing tests now: MacOS Python 3.9 and Linux Python 3.10. The error message for Linux Python 3.10 is the following:

Starting: Test
==============================================================================
Task         : Command line
Description  : Run a command line script using Bash on Linux and macOS and cmd.exe on Windows
Version      : 2.212.0
Author       : Microsoft Corporation
Help         : https://docs.microsoft.com/azure/devops/pipelines/tasks/utility/command-line
==============================================================================
Generating script.
========================== Starting Command Output ===========================
/usr/bin/bash --noprofile --norc /home/vsts/work/_temp/4b8f597b-7a8c-4c90-be0e-216a4ae65b71.sh
+ python -m pip install pytest pytest-azurepipelines
Collecting pytest
  WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f1d945cee60>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /packages/33/b2/741130cbcf2bbfa852ed95a60dc311c9e232c7ed25bac3d9b8880a8df4ae/pytest-7.4.0-py3-none-any.whl
  WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f1d945cf160>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /packages/33/b2/741130cbcf2bbfa852ed95a60dc311c9e232c7ed25bac3d9b8880a8df4ae/pytest-7.4.0-py3-none-any.whl
  WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f1d945cf340>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /packages/33/b2/741130cbcf2bbfa852ed95a60dc311c9e232c7ed25bac3d9b8880a8df4ae/pytest-7.4.0-py3-none-any.whl
  WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f1d945cf520>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /packages/33/b2/741130cbcf2bbfa852ed95a60dc311c9e232c7ed25bac3d9b8880a8df4ae/pytest-7.4.0-py3-none-any.whl
  WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f1d945cf640>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /packages/33/b2/741130cbcf2bbfa852ed95a60dc311c9e232c7ed25bac3d9b8880a8df4ae/pytest-7.4.0-py3-none-any.whl
ERROR: Could not install packages due to an OSError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Max retries exceeded with url: /packages/33/b2/741130cbcf2bbfa852ed95a60dc311c9e232c7ed25bac3d9b8880a8df4ae/pytest-7.4.0-py3-none-any.whl (Caused by NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x7f1d945cf940>: Failed to establish a new connection: [Errno 101] Network is unreachable'))

##[error]Bash exited with code '1'.
Finishing: Test

YannCabanes commented 1 year ago

The only failing test is under MacOS with Python 3.9.

YannCabanes commented 1 year ago

The only failing test is under MacOS with Python 3.9. The documentation about docs/examples/autodiff/plot_soft_dtw_loss_for_pytorch_nn.py is displayed correctly.

YannCabanes commented 1 year ago

The only failing test is under MacOS with Python 3.9 in the commit cb38df3

YannCabanes commented 1 year ago

The only failing test is under MacOS with Python 3.9. The error message is:

tslearn/tests/test_metrics.py::test_gamma_soft_dtw PASSED                [ 61%]
tslearn/tests/test_metrics.py::test_symmetric_cdist Fatal Python error: Segmentation fault

Thread 0x0000000114283600 (most recent call first):
  File "/Users/runner/work/1/s/tslearn/metrics/utils.py", line 92 in _cdist_generic
  File "/Users/runner/work/1/s/tslearn/metrics/dtw_variants.py", line 1624 in cdist_dtw
  File "/Users/runner/work/1/s/tslearn/tests/test_metrics.py", line 476 in test_symmetric_cdist
  File "/Users/runner/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/_pytest/python.py", line 194 in pytest_pyfunc_call
  File "/Users/runner/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/pluggy/_callers.py", line 80 in _multicall
  File "/Users/runner/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/pluggy/_manager.py", line 112 in _hookexec

...

  File "/Users/runner/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/runpy.py", line 87 in _run_code
  File "/Users/runner/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/runpy.py", line 197 in _run_module_as_main
/Users/runner/work/_temp/c80d2a5c-8f34-4ed7-912d-142117c63a95.sh: line 5:  2242 Segmentation fault: 11  python -m pytest -v tslearn/ --doctest-modules -k 'not test_all_estimators'
##[error]Bash exited with code '139'.
Finishing: Test

YannCabanes commented 1 year ago

I will now investigate the failing test under MacOS with Python 3.9.

YannCabanes commented 1 year ago

Now even the docs/readthedocs.org:tslearn test is failing since the last commit (6b3ea4b):

Extension error:
Here is a summary of the problems encountered when running the examples

Unexpected failing examples:
/home/docs/checkouts/readthedocs.org/user_builds/tslearn/checkouts/431/docs/examples/classification/plot_shapelets.py failed leaving traceback:
Traceback (most recent call last):
  File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/checkouts/431/docs/examples/classification/plot_shapelets.py", line 58, in <module>
    shp_clf.fit(X_train, y_train)
  File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/envs/431/lib/python3.8/site-packages/tslearn-0.5.3.2-py3.8.egg/tslearn/shapelets/shapelets.py", line 444, in fit
    self._set_model_layers(X=X, ts_sz=sz, d=d, n_classes=n_labels)
  File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/envs/431/lib/python3.8/site-packages/tslearn-0.5.3.2-py3.8.egg/tslearn/shapelets/shapelets.py", line 710, in _set_model_layers
    shapelet_layers = [
  File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/envs/431/lib/python3.8/site-packages/tslearn-0.5.3.2-py3.8.egg/tslearn/shapelets/shapelets.py", line 711, in <listcomp>
    LocalSquaredDistanceLayer(
  File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/envs/431/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/envs/431/lib/python3.8/site-packages/tslearn-0.5.3.2-py3.8.egg/tslearn/shapelets/shapelets.py", line 135, in build
    self.kernel = self.add_weight(name='kernel',
  File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/envs/431/lib/python3.8/site-packages/tslearn-0.5.3.2-py3.8.egg/tslearn/shapelets/shapelets.py", line 106, in __call__
    shapelets = _kmeans_init_shapelets(self.X_,
  File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/envs/431/lib/python3.8/site-packages/tslearn-0.5.3.2-py3.8.egg/tslearn/shapelets/shapelets.py", line 89, in _kmeans_init_shapelets
    return TimeSeriesKMeans(n_clusters=n_shapelets,
  File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/envs/431/lib/python3.8/site-packages/tslearn-0.5.3.2-py3.8.egg/tslearn/clustering/kmeans.py", line 780, in fit
    self._fit_one_init(X_, x_squared_norms, rs)
  File "/home/docs/checkouts/readthedocs.org/user_builds/tslearn/envs/431/lib/python3.8/site-packages/tslearn-0.5.3.2-py3.8.egg/tslearn/clustering/kmeans.py", line 629, in _fit_one_init
    self.cluster_centers_ = _kmeans_plusplus(
TypeError: _kmeans_plusplus() missing 1 required positional argument: 'sample_weight'
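
The TypeError above looks like a signature mismatch: the call in tslearn/clustering/kmeans.py does not pass the `sample_weight` argument that the installed `_kmeans_plusplus` requires. Assuming this helper is a private function imported from scikit-learn, and that its signature differs between releases, one possible workaround is to inspect the signature before calling it. This is only a sketch: `call_kmeans_init`, `old_style_init` and `new_style_init` are hypothetical names, and the two stand-in functions only mimic the two argument lists, not the real initialisation.

```python
import inspect


def call_kmeans_init(init_func, X, n_clusters, x_squared_norms,
                     random_state, sample_weight=None):
    """Call init_func, passing sample_weight only if it accepts one."""
    params = inspect.signature(init_func).parameters
    kwargs = {"x_squared_norms": x_squared_norms,
              "random_state": random_state}
    if "sample_weight" in params:
        # Assumption: uniform weights reproduce the unweighted behaviour.
        if sample_weight is None:
            sample_weight = [1.0] * len(X)
        kwargs["sample_weight"] = sample_weight
    return init_func(X, n_clusters, **kwargs)


# Hypothetical stand-in with the older argument list (no sample_weight).
def old_style_init(X, n_clusters, x_squared_norms, random_state):
    return X[:n_clusters], "old"


# Hypothetical stand-in with the newer argument list (sample_weight required).
def new_style_init(X, n_clusters, x_squared_norms, sample_weight,
                   random_state):
    assert len(sample_weight) == len(X)
    return X[:n_clusters], "new"


X = [[0.0], [1.0], [2.0]]
norms = [0.0, 1.0, 4.0]
old_result = call_kmeans_init(old_style_init, X, 2, norms, random_state=0)
new_result = call_kmeans_init(new_style_init, X, 2, norms, random_state=0)
```

Both calls go through the same wrapper, so the calling code would not need to know which argument list is installed. Pinning the dependency version in the docs build would be an alternative.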

YannCabanes commented 1 year ago

For the failing test under MacOS for Python 3.9, I have the following error message:

tslearn/tests/test_metrics.py ............Fatal Python error: Segmentation fault

Thread 0x0000000112858600 (most recent call first):
  File "/Users/runner/work/1/s/tslearn/metrics/dtw_variants.py", line 29 in _local_squared_dist
  File "/Users/runner/work/1/s/tslearn/metrics/dtw_variants.py", line 99 in accumulated_matrix
  File "/Users/runner/work/1/s/tslearn/metrics/dtw_variants.py", line 152 in _dtw
  File "/Users/runner/work/1/s/tslearn/metrics/dtw_variants.py", line 668 in dtw
  File "/Users/runner/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/joblib/parallel.py", line 1784 in _get_sequential_output
  File "/Users/runner/hostedtoolcache/Python/3.9.17/x64/lib/python3.9/site-packages/joblib/parallel.py", line 1855 in __call__
  File "/Users/runner/work/1/s/tslearn/metrics/utils.py", line 92 in _cdist_generic
  File "/Users/runner/work/1/s/tslearn/metrics/dtw_variants.py", line 1624 in cdist_dtw
  File "/Users/runner/work/1/s/tslearn/tests/test_metrics.py", line 476 in test_symmetric_cdist