Open falknerdominik opened 4 years ago
Thanks for the bug report.
Could you please update to the latest tslearn version and let us know
if you still experience the bug?
Dominik Falkner notifications@github.com wrote:
I am using the TimeSeriesKMeans class to cluster simple time series data. The data length is variable and I wanted to cluster it first:

    from tslearn.clustering import TimeSeriesKMeans
    from tslearn.utils import to_time_series_dataset

    # load data as pd.DataFrame
    X = get_ts(...)
    # convert to a tslearn dataset
    data = to_time_series_dataset(X.values)
    km = TimeSeriesKMeans(n_clusters=4, n_init=10, init='k-means++', metric='dtw')
    km.fit(data)

After running this I get the following error (same with other metrics, e.g. softdtw):

    ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

- OS: Windows 10
- tslearn version: 0.3.1
When I resample the data using TimeSeriesResampler(sz=80), it works.
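To illustrate why resampling sidesteps the problem, here is a minimal pure-Python sketch of bringing variable-length series to one common length by linear interpolation. The helper name `resample_linear` is hypothetical and this is not tslearn's implementation; it only shows the idea behind resampling every series to a fixed size such as `sz=80`.

```python
def resample_linear(series, sz):
    """Linearly interpolate `series` onto `sz` evenly spaced points.

    Hypothetical helper for illustration only, not tslearn code.
    """
    n = len(series)
    if n == 1 or sz == 1:
        # Nothing to interpolate between: repeat the first value.
        return [float(series[0])] * sz
    out = []
    for k in range(sz):
        pos = k * (n - 1) / (sz - 1)   # fractional index into `series`
        lo = int(pos)                  # left neighbour
        hi = min(lo + 1, n - 1)        # right neighbour, clamped at the end
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[hi] * frac)
    return out


# Two series of different lengths end up with the same length.
variable_length = [[1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0, 3.0]]
resampled = [resample_linear(s, 4) for s in variable_length]
print([len(s) for s in resampled])
```

Once every series has the same length, no padding is needed, so the data contains no NaN values for downstream checks to reject.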
Thanks for the quick reply. Updated to version '0.4.1' and the problem still persists.
Could you please provide the full error message so that we can spot in which step the problem is happening?
Sure. I used **** to mask parts where the stacktrace contains project specific code I am not allowed to share.
Traceback (most recent call last):
  File "****", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "****", line 85, in _run_code
    exec(code, run_globals)
  File "****", line 213, in <module>
    run()
  File "****\__main__.py", line 209, in run
    cluster_with_sequences(****)
  File "****\__main__.py", line 199, in cluster_with_sequences
    ****
  File "****\__main__.py", line 97, in compute_and_evaluate_model
    value = calc(data, estimator.labels_, **cvi.kwargs)
  File "****\lib\site-packages\tslearn\clustering.py", line 237, in silhouette_score
    **kwds)
  File "****\lib\site-packages\sklearn\metrics\cluster\_unsupervised.py", line 117, in silhouette_score
    return np.mean(silhouette_samples(X, labels, metric=metric, **kwds))
  File "****\lib\site-packages\sklearn\metrics\cluster\_unsupervised.py", line 213, in silhouette_samples
    X, labels = check_X_y(X, labels, accept_sparse=['csc', 'csr'])
  File "****\lib\site-packages\sklearn\utils\validation.py", line 755, in check_X_y
    estimator=estimator)
  File "****\lib\site-packages\sklearn\utils\validation.py", line 578, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "****\lib\site-packages\sklearn\utils\validation.py", line 60, in _assert_all_finite
    msg_dtype if msg_dtype is not None else X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
It seems the error occurs during a call to silhouette_score.
If you could build a minimum working example, I could probably help further.
Found a bug in my code.
Issue can be closed.
Hi @falknerdominik, how did you fix the issue? Thanks.
Hi @Huanle, I was calculating the silhouette score with the Euclidean distance, which raises the error above because the time series did not all have the same length. A generic pipeline started the process, so the stack trace did not really help.
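For readers hitting the same error, the cause can be sketched in plain Python. The snippet assumes, as tslearn's `to_time_series_dataset` does, that shorter series are padded with NaN so the dataset fits one rectangular array. The helpers `pad_with_nan`, `euclidean` and `dtw` are hypothetical illustrations, not tslearn's implementations: a point-by-point Euclidean distance turns NaN as soon as it touches padding, while a textbook DTW that ignores the padding still returns a finite value.

```python
import math

NAN = float("nan")


def pad_with_nan(series_list):
    """Pad every series with NaN up to the length of the longest one."""
    longest = max(len(s) for s in series_list)
    return [list(s) + [NAN] * (longest - len(s)) for s in series_list]


def euclidean(a, b):
    """Point-by-point Euclidean distance; any NaN makes the result NaN."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Textbook DTW computed on the series with NaN padding stripped."""
    a = [x for x in a if not math.isnan(x)]
    b = [x for x in b if not math.isnan(x)]
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


short, long_ = pad_with_nan([[1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0, 3.0]])
print(euclidean(short, long_))  # nan: the padding poisons the sum
print(dtw(short, long_))        # 0.0: the series align perfectly under warping
```

In tslearn itself, the corresponding fix is to score with an elastic metric, for example `silhouette_score(data, labels, metric="dtw")` from `tslearn.clustering`, or to resample the series to a common length before using the Euclidean distance.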
@rtavenar, maybe a clearer warning could be raised when the silhouette score is used with the Euclidean distance?
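A minimal sketch of what such a check could look like, assuming the dataset is a list of univariate series in which NaN marks padding. The function name `check_lengths_for_metric` and the message text are hypothetical; nothing here is taken from tslearn's code base.

```python
import math


def check_lengths_for_metric(dataset, metric):
    """Hypothetical guard: reject metrics that need equal-length series.

    `dataset` is a list of 1-D series in which NaN marks padding.
    """
    # Count the real (non-NaN) observations in each series.
    lengths = {sum(not math.isnan(x) for x in series) for series in dataset}
    if metric == "euclidean" and len(lengths) > 1:
        raise ValueError(
            "metric='euclidean' requires equal-length time series, but "
            f"lengths {sorted(lengths)} were found; resample the data to a "
            "common length or use an elastic metric such as 'dtw'."
        )


nan = float("nan")
padded = [[1.0, 2.0, 3.0, nan, nan], [1.0, 2.0, 2.0, 3.0, 3.0]]

check_lengths_for_metric(padded, "dtw")  # passes silently
try:
    check_lengths_for_metric(padded, "euclidean")
except ValueError as err:
    print(err)
```

Running a check like this before the distance computation would point at the metric choice directly, instead of surfacing as a generic NaN error deep inside the validation code.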
Thanks @falknerdominik, this makes sense to me.