johannfaouzi / pyts

A Python package for time series classification
https://pyts.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Issues getting WEASEL transform of just one single time series #27

Open robotdude17 opened 5 years ago

robotdude17 commented 5 years ago

I'm trying to get the WEASEL transform of a single time series and am running into the error shown below.

Please advise.

Thanks.

import numpy as np
import matplotlib.pyplot as plt
from pyts.transformation import WEASEL

# Parameters
n_samples, n_timestamps = 1, 100
n_classes = 1

# Toy dataset
rng = np.random.RandomState(41)
X = rng.randn(n_samples, n_timestamps)
y = rng.randint(n_classes, size=n_samples)

# WEASEL transformation
weasel = WEASEL(word_size = 2, n_bins = 2, window_sizes=[12, 36])
# X_weasel = weasel.fit_transform(X, y).toarray()
X_weasel = weasel.fit_transform(X, y)
# X_weasel = weasel.fit_transform(np.array(X), np.array(y)).toarray()

# Visualize the transformation for the first time series
plt.figure(figsize=(12, 8))
vocabulary_length = len(weasel.vocabulary_)
width = 0.3
plt.bar(np.arange(vocabulary_length) - width / 2, X_weasel[0],
        width=width, label='First time series')
plt.xticks(np.arange(vocabulary_length),
           np.vectorize(weasel.vocabulary_.get)(np.arange(X_weasel[0].size)),
           fontsize=12, rotation=60)
plt.yticks(np.arange(np.max(X_weasel[:2] + 1)), fontsize=12)
plt.xlabel("Words", fontsize=18)
plt.ylabel("Frequencies", fontsize=18)
plt.title("WEASEL transformation", fontsize=20)
plt.legend(loc='best')
plt.show()


ValueError                                Traceback (most recent call last)
in
     13 weasel = WEASEL(word_size = 2, n_bins = 2, window_sizes=[12, 36])
     14 # X_weasel = weasel.fit_transform(X, y).toarray()
---> 15 X_weasel = weasel.fit_transform(X, y)
     16 # X_weasel = weasel.fit_transform(np.array(X), np.array(y)).toarray()
     17

~/anaconda3/envs/tf36/lib/python3.6/site-packages/pyts/transformation/weasel.py in fit_transform(self, X, y)
    258             )
    259         y_repeated = np.repeat(y, n_windows)
--> 260         X_sfa = sfa.fit_transform(X_windowed, y_repeated)
    261
    262         X_word = np.asarray([''.join(X_sfa[i])

~/anaconda3/envs/tf36/lib/python3.6/site-packages/pyts/approximation/sfa.py in fit_transform(self, X, y)
    157         )
    158         self._pipeline = Pipeline([('dft', dft), ('mcb', mcb)])
--> 159         X_sfa = self._pipeline.fit_transform(X, y)
    160         self.support_ = self._pipeline.named_steps['dft'].support_
    161         self.bin_edges_ = self._pipeline.named_steps['mcb'].bin_edges_

~/anaconda3/envs/tf36/lib/python3.6/site-packages/sklearn/pipeline.py in fit_transform(self, X, y, **fit_params)
    391                 return Xt
    392             if hasattr(last_step, 'fit_transform'):
--> 393                 return last_step.fit_transform(Xt, y, **fit_params)
    394             else:
    395                 return last_step.fit(Xt, y, **fit_params).transform(Xt)

~/anaconda3/envs/tf36/lib/python3.6/site-packages/sklearn/base.py in fit_transform(self, X, y, **fit_params)
    554         else:
    555             # fit method of arity 2 (supervised transformation)
--> 556             return self.fit(X, y, **fit_params).transform(X)
    557
    558

~/anaconda3/envs/tf36/lib/python3.6/site-packages/pyts/approximation/mcb.py in fit(self, X, y)
    113         self._check_constant(X)
    114         self.bin_edges_ = self._compute_bins(
--> 115             X, y, n_timestamps, self.n_bins, self.strategy)
    116         return self
    117

~/anaconda3/envs/tf36/lib/python3.6/site-packages/pyts/approximation/mcb.py in _compute_bins(self, X, y, n_timestamps, n_bins, strategy)
    207             )
    208         else:
--> 209             bins_edges = self._entropy_bins(X, y, n_timestamps, n_bins)
    210         return bins_edges
    211

~/anaconda3/envs/tf36/lib/python3.6/site-packages/pyts/approximation/mcb.py in _entropy_bins(self, X, y, n_timestamps, n_bins)
    221                     "The number of bins is too high for feature {0}. "
    222                     "Try with a smaller number of bins or remove "
--> 223                     "this feature.".format(i)
    224                 )
    225             bins[i] = threshold

ValueError: The number of bins is too high for feature 0. Try with a smaller number of bins or remove this feature.

johannfaouzi commented 5 years ago

Hi,

The WEASEL transformation is not suited for one single time series: it uses a binning procedure, and binning is pointless when there is one single data point. You need more samples to make it work.
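To illustrate why binning degenerates with a single data point, here is a minimal sketch in plain Python. It is not the pyts implementation: `quantile_edges` is a hypothetical helper that computes equal-frequency bin edges by linear interpolation, which is only one possible binning strategy. With a single value, every edge collapses onto that value, so the bins carry no information.

```python
def quantile_edges(values, n_bins):
    """Return the n_bins - 1 interior edges of equal-frequency bins.

    Hypothetical helper for illustration only; not part of pyts.
    """
    ordered = sorted(values)
    last = len(ordered) - 1
    edges = []
    for k in range(1, n_bins):
        # Fractional position of the k-th quantile in the sorted values
        pos = k * last / n_bins
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        # Linear interpolation between the two neighbouring values
        edges.append(ordered[lo] + frac * (ordered[hi] - ordered[lo]))
    return edges


# Several data points: the edges are distinct and define usable bins
print(quantile_edges([0.0, 1.0, 2.0, 3.0, 4.0], n_bins=4))  # [1.0, 2.0, 3.0]

# One data point: all edges are identical, so the bins are meaningless
print(quantile_edges([0.7], n_bins=4))  # [0.7, 0.7, 0.7]
```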

Here is the paper describing the Symbolic Fourier Approximation, which is used in WEASEL. Figure 2 shows the binning process. It cannot work with a single time series.

I hope this helps.
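A note on the specific error: the traceback ends in `_entropy_bins`, which suggests a supervised binning step that uses the class labels. In the reproduction, `n_classes = 1`, so every label is 0. The sketch below assumes that entropy binning chooses thresholds by information gain; it is an illustration in plain Python, not the pyts code, and `entropy` and `best_split` are hypothetical helpers. When all labels are identical, no threshold has positive gain, so no bin edge can be found.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, gain) for the best split, or None if no split helps.

    Hypothetical helper for illustration only; not part of pyts.
    """
    pairs = sorted(zip(values, labels))
    parent = entropy(labels)
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a threshold between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        # Weighted entropy of the two sides after splitting
        child = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = parent - child
        if gain > 1e-12 and (best is None or gain > best[1]):
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, gain)
    return best


values = [0.1, 0.4, 0.6, 0.9]
print(best_split(values, [0, 0, 1, 1]))  # two classes: a threshold is found
print(best_split(values, [0, 0, 0, 0]))  # one class: None, nothing to separate
```

Under this assumption, a dataset with several time series and at least two classes gives the binning step something to separate.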