bfocassio opened this issue 3 years ago
Hi @bfocassio
This should definitely be investigated. I am already aware that our implementation induces a rise in memory usage when time series of different lengths are at stake (since we cast them to a single numpy array whose size is that of the longest time series), but there might be other issues.
If anyone has time to work on that, I think it would be highly valuable for tslearn.
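For anyone digging into this, here is a minimal sketch of the padding behaviour described above (the series lengths are made up for illustration, not taken from this issue):

```python
# Variable-length series are stored in one array sized to the longest series,
# so every short series still pays for the full length.
import numpy as np
from tslearn.utils import to_time_series_dataset

rng = np.random.default_rng(0)
# One long series plus many short ones (illustrative lengths only).
series = [rng.normal(size=9000)] + [rng.normal(size=100) for _ in range(299)]

raw_bytes = sum(s.nbytes for s in series)
X = to_time_series_dataset(series)           # shape (300, 9000, 1), NaN-padded
print(f"raw:    {raw_bytes / 1e6:.1f} MB")   # ~0.3 MB
print(f"padded: {X.nbytes / 1e6:.1f} MB")    # ~21.6 MB
```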
Hello @rtavenar and @bfocassio
I have the same problem.
It looks like the memory explodes when I am using dtw.
I have 300 univariate time series with close to 9000 observations each, and from my experiments I would need more than 256 GB of RAM, possibly even more.
Best regards,
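A back-of-the-envelope sketch of why series of this length are so costly, assuming (as in most DTW implementations, though not confirmed in this thread) that each distance computation allocates a full accumulated-cost matrix of float64 values:

```python
# Rough estimate only: one float64 accumulated-cost matrix per DTW call
# between two series of length ~9000 (an assumption about the implementation,
# not a measurement).
length = 9000
bytes_per_dtw = length * length * 8
print(f"~{bytes_per_dtw / 1e9:.2f} GB per DTW computation")  # ~0.65 GB

# During k-means, many such computations happen (assigning 300 series to k
# centroids, plus barycenter updates), and parallel workers each hold their
# own matrix, so multi-GB peaks add up quickly.
```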
Same issue here. Large dataset, TimeSeriesKMeans and dtw.
Describe the bug
I'm trying to perform the clustering of a large dataset of time series using `TimeSeriesKMeans` and `dtw`. However, the clustering is killed due to memory issues.

Inspired by this post, I decided to track the memory consumption of the clustering. In the MWE, I'm using the `track_memory` decorator (here). The data itself uses more or less 0.6 MB. The trained model uses ~20 MB. However, the fitting process reaches a peak in memory larger than 500 MB. For my real dataset with ~400 MB, the training is impracticable.
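For reference, a rough stand-in for such a decorator (not the one behind the link above) can be written with the standard-library `tracemalloc` module:

```python
import functools
import tracemalloc

def track_memory(func):
    """Report the peak traced memory allocated while `func` runs.

    Rough stand-in for the linked decorator, not its original implementation.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        tracemalloc.start()
        try:
            return func(*args, **kwargs)
        finally:
            _, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            print(f"{func.__name__}: peak memory ~{peak / 1e6:.1f} MB")
    return wrapper
```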
Is there any workaround? Can someone help me to reduce this peak in memory usage?
The number of peaks is proportional to the number of k-means iterations. Using the `euclidean` metric instead of `dtw` fixes the memory problem, but it would not be appropriate for my original dataset.

To Reproduce
MWE:
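A minimal sketch along the lines of the description above (the random data, dataset shape and hyper-parameters are assumptions, not the exact original MWE):

```python
# Sketch of an MWE consistent with the description above; shapes and
# hyper-parameters are illustrative assumptions.
import numpy as np
from tslearn.clustering import TimeSeriesKMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 750, 1))   # 100 series of length 750: ~0.6 MB

@track_memory                        # decorator as sketched earlier in the thread
def fit_kmeans(data):
    return TimeSeriesKMeans(n_clusters=3, metric="dtw",
                            max_iter=5, random_state=0).fit(data)

model = fit_kmeans(X)
```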
Environment: