Question about the input to compute SymbolicAggregateApproximation

ivan-marroquin commented 1 year ago

Hi,

Thanks for making this great package available!

The input data is expected to have shape "n samples" x "n time stamps" and to contain univariate time series. I have only one time series, and I used SymbolicAggregateApproximation as follows:

a) First scenario

import numpy as np
from pyts.approximation import SymbolicAggregateApproximation

X = np.array([0, 4, 2, 1, 7, 6, 3, 5]).reshape(-1, 1)
transformer = SymbolicAggregateApproximation()
print(transformer.transform(X))

I get this result:
home/ivan_phd/python_3.9.0/lib/python3.9/site-packages/pyts/preprocessing/discretizer.py:168: UserWarning: Some quantiles are equal. The number of bins will be smaller for sample [0 1]. Consider decreasing the number of bins or removing these samples.
  warn("Some quantiles are equal. The number of bins will "
[['a']
 ['a']
 ['a']
 ['a']
 ['a']
 ['a']
 ['a']
 ['a']]

b) Second scenario

X = np.array([[0, 4, 2, 1, 7, 6, 3, 5], [0, 4, 2, 1, 7, 6, 3, 5]])
transformer = SymbolicAggregateApproximation()
print(transformer.transform(X))

I get this result:
[['a' 'c' 'b' 'a' 'd' 'd' 'b' 'c']
 ['a' 'c' 'b' 'a' 'd' 'd' 'b' 'c']]

My questions are:

Do I need to duplicate the time series to get the expected result?
What is the meaning of 'n time stamps' for input data?

Thanks,

Ivan

johannfaouzi commented 1 year ago

Hi,

No, you do not need to duplicate the time series. What you want is the first scenario, but the reshaping is wrong: if you have only one time series (i.e., one sample), you need to reshape your 1D array into a 2D array with one row: X = np.array([0, 4, 2, 1, 7, 6, 3, 5]).reshape(1, -1)
n_timestamps is the number of time points (values) in each time series. In your example, the time series has 8 values (n_timestamps=8).
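
To make the difference between the two reshapes concrete, here is a minimal NumPy-only sketch. The variable names (as_column, as_row) are only for illustration and do not come from pyts.

```python
import numpy as np

# The single time series from the question: 8 values.
series = np.array([0, 4, 2, 1, 7, 6, 3, 5])

# reshape(-1, 1): one column, so 8 rows.
# This is read as 8 samples with 1 timestamp each.
as_column = series.reshape(-1, 1)
print(as_column.shape)  # (8, 1)

# reshape(1, -1): one row.
# This is read as 1 sample with 8 timestamps.
as_row = series.reshape(1, -1)
print(as_row.shape)  # (1, 8)

n_samples, n_timestamps = as_row.shape
```

With the (8, 1) layout every "time series" holds a single value, which is consistent with the "Some quantiles are equal" warning and the all-'a' output in the first scenario.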

This convention is used because one needs a set of samples (and not just one sample) to perform machine learning, which is why the input is assumed to be a set of univariate time series (2D array).
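
As a rough illustration of this convention, the sketch below builds the 2D input for the single series from the question and passes it to SymbolicAggregateApproximation with default parameters. It assumes pyts is installed; the helper name sax_or_none and the import guard are additions made here for illustration, not part of pyts.

```python
import numpy as np


def sax_or_none(X):
    """Return the SAX symbols for X, or None if pyts is not installed.

    Hypothetical helper for this example only.
    """
    try:
        from pyts.approximation import SymbolicAggregateApproximation
    except ImportError:
        return None
    return SymbolicAggregateApproximation().fit_transform(X)


# One sample (row) with 8 timestamps (columns).
X = np.array([0, 4, 2, 1, 7, 6, 3, 5]).reshape(1, -1)

symbols = sax_or_none(X)
print(X.shape)
print(symbols)
```

With pyts available, the result should be a single row of 8 symbols. Since the warning above refers to bins per sample, I would expect that row to match the rows printed in the second scenario.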

Hope this helps you a bit and do not hesitate to ask more questions if needed.

Best, Johann

ivan-marroquin commented 1 year ago

Hi @johannfaouzi

Thanks for your quick response. Ivan

johannfaouzi / pyts

Question about the input to compute SymbolicAggregateApproximation #147