johannfaouzi / pyts

A Python package for time series classification
https://pyts.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Question about the input to compute SymbolicAggregateApproximation #147

Closed ivan-marroquin closed 1 year ago

ivan-marroquin commented 1 year ago

Hi,

Thanks for making this great package available!

The input data is expected to have shape (n_samples, n_timestamps), with each row being a univariate time series. I have only one time series, and I used SymbolicAggregateApproximation as follows:

a) First scenario

import numpy as np
from pyts.approximation import SymbolicAggregateApproximation

X = np.array([0, 4, 2, 1, 7, 6, 3, 5]).reshape(-1, 1)  # shape (8, 1)
transformer = SymbolicAggregateApproximation()
print(transformer.transform(X))

I get this result:
home/ivan_phd/python_3.9.0/lib/python3.9/site-packages/pyts/preprocessing/discretizer.py:168: UserWarning: Some quantiles are equal. The number of bins will be smaller for sample [0 1]. Consider decreasing the number of bins or removing these samples.
  warn("Some quantiles are equal. The number of bins will "
[['a']
 ['a']
 ['a']
 ['a']
 ['a']
 ['a']
 ['a']
 ['a']]

b) Second scenario

import numpy as np
from pyts.approximation import SymbolicAggregateApproximation

X = np.array([[0, 4, 2, 1, 7, 6, 3, 5], [0, 4, 2, 1, 7, 6, 3, 5]])  # shape (2, 8)
transformer = SymbolicAggregateApproximation()
print(transformer.transform(X))

I get this result:
[['a' 'c' 'b' 'a' 'd' 'd' 'b' 'c']
 ['a' 'c' 'b' 'a' 'd' 'd' 'b' 'c']]

My questions are:

Thanks,

Ivan

johannfaouzi commented 1 year ago

Hi,

This convention is used because one needs a set of samples (and not just one sample) to perform machine learning, which is why the input is assumed to be a set of univariate time series, that is, a 2D array of shape (n_samples, n_timestamps).
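
To make the shape convention concrete, here is a minimal NumPy-only sketch of per-sample quantile binning. The helper `quantile_symbols` is hypothetical and only a simplified stand-in for what the transformer does. It is not the pyts implementation, and the choice of `right=True` for ties is an assumption. It shows why a series reshaped with `reshape(-1, 1)` is read as eight one-point samples, while `reshape(1, -1)` is read as one sample with eight timestamps.

```python
import numpy as np


def quantile_symbols(X, n_bins=4):
    """Hypothetical helper: bin each row of a 2D array by its own quantiles.

    Simplified stand-in for per-sample quantile discretization; it is not
    the pyts implementation.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim != 2:
        raise ValueError("expected shape (n_samples, n_timestamps)")
    alphabet = np.array([chr(ord("a") + i) for i in range(n_bins)])
    # Interior quantile levels, e.g. 25, 50, 75 for n_bins=4.
    levels = np.linspace(0, 100, n_bins + 1)[1:-1]
    rows = []
    for row in X:
        edges = np.percentile(row, levels)
        # Assumption: a value equal to an edge falls in the lower bin.
        rows.append(alphabet[np.digitize(row, edges, right=True)])
    return np.array(rows)


series = np.array([0, 4, 2, 1, 7, 6, 3, 5])

as_column = series.reshape(-1, 1)  # shape (8, 1): 8 samples, 1 timestamp each
as_row = series.reshape(1, -1)     # shape (1, 8): 1 sample, 8 timestamps

# Each one-point sample has identical quantiles, so every value maps to 'a'.
print(quantile_symbols(as_column).ravel())
# A single row is binned against its own quantiles and uses all four symbols.
print(quantile_symbols(as_row))
```

Under these assumptions, a single time series should be passed as one row, for example `X.reshape(1, -1)`, so that the quantiles are computed across its timestamps rather than within each isolated value.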

Hope this helps you a bit and do not hesitate to ask more questions if needed.

Best, Johann

ivan-marroquin commented 1 year ago

Hi @johannfaouzi

Thanks for your quick response. Ivan