Open thunderbug1 opened 3 years ago
Hi,
Sorry for the late reply. Variable-length data sets are unfortunately not supported at the moment.
Regarding WEASEL+MUSE, you can achieve this with the following process:
1. Group the time series by length.
2. Fit a separate WEASELMUSE instance on each group (set chi2_threshold to a very low positive value in order to not perform feature selection).
3. Concatenate the outputs of all the instances, aligning the columns on the extracted words (the pandas package is handy for this).
4. Perform feature selection on the concatenated features.

The main downside of this approach is the high memory (RAM) usage, because the feature selection is performed at the last step. A possible solution (that would lead to the same results) would be to use a for loop over the window sizes: instead of passing a list of k window sizes to the window_sizes parameter, you provide a single window size inside each iteration of the loop.
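The claim that the loop gives the same results rests on the chi2 score being computed for each feature independently. Here is a minimal sketch of that equivalence. It assumes random counts as a stand-in for the per-window-size bag-of-words matrices (this is not real WEASEL+MUSE output, and the names blocks, X_loop and X_once are only for illustration):

```python
import numpy as np
from sklearn.feature_selection import chi2

rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2, 3], 10)  # 40 samples, 4 classes
window_sizes = [12, 36]
chi2_threshold = 2.0

# Stand-in counts for each window size (random, NOT real WEASEL+MUSE output)
blocks = {w: rng.integers(0, 5, size=(40, 30)) for w in window_sizes}

# (a) Select features inside the loop, one window size at a time
selected = []
for w in window_sizes:
    scores, _ = chi2(blocks[w], y)
    selected.append(blocks[w][:, scores > chi2_threshold])
X_loop = np.hstack(selected)

# (b) Concatenate every block first, then select once at the end
X_all = np.hstack([blocks[w] for w in window_sizes])
scores_all, _ = chi2(X_all, y)
X_once = X_all[:, scores_all > chi2_threshold]

# Both routes keep exactly the same columns
same = np.array_equal(X_loop, X_once)
```

With route (a), only the selected columns of each window size have to stay in memory, which is where the saving would come from.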
Here is an example (without the aforementioned optimization, I can modify the example to show you if needed):
import numpy as np
from pyts.datasets import load_basic_motions
from pyts.multivariate.transformation import WEASELMUSE
import pandas as pd
from sklearn.feature_selection import chi2
#######################
####### D A T A #######
#######################
# Toy dataset
X_train, X_test, y_train, y_test = load_basic_motions(return_X_y=True)
# X_train.shape = X_test.shape = (40, 6, 100)
# Sample 4 random lengths in the interval [80, 100]
rng = np.random.RandomState(42)
lengths = 80 + rng.choice(21, size=4, replace=False)
# Assign 10 time series to each length
lengths_samples_train_idx = rng.permutation(40).reshape((4, 10))
lengths_samples_test_idx = rng.permutation(40).reshape((4, 10))
#######################
# P A R A M E T E R S #
#######################
# WEASEL+MUSE parameters
weasel_muse_params = {'word_size': 2, 'n_bins': 2, 'window_sizes': [12, 36],
                      'chi2_threshold': 1e-80}
transformer_list = [WEASELMUSE(**weasel_muse_params) for _ in range(4)]
#######################
### T R A I N I N G ###
#######################
X_weasel_train = []
for samples_idx, length, transformer in zip(lengths_samples_train_idx, lengths, transformer_list):
    X_weasel_train.append(transformer.fit_transform(X_train[samples_idx, :, :length], y_train[samples_idx]))
# Concatenate the arrays as a DataFrame and fill NA values with 0
df_weasel_train = pd.concat([
    pd.DataFrame.sparse.from_spmatrix(
        X, index=samples_idx, columns=np.vectorize(transformer.vocabulary_.get)(np.arange(X.shape[1]))
    )
    for X, samples_idx, transformer in zip(X_weasel_train, lengths_samples_train_idx, transformer_list)
]).fillna(0.)
# Perform feature selection using chi2 test
chi2_threshold = 2.
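# Rows follow the group order, so align the labels with the DataFrame index
chi2_statistics, _ = chi2(df_weasel_train, y_train[df_weasel_train.index.to_numpy()])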
features_idx_to_keep = np.where(chi2_statistics > chi2_threshold)[0]
features_to_keep = df_weasel_train.columns[features_idx_to_keep]
df_weasel_train = df_weasel_train[features_to_keep]
#######################
## I N F E R E N C E ##
#######################
X_weasel_test = []
for samples_idx, length, transformer in zip(lengths_samples_test_idx, lengths, transformer_list):
    X_weasel_test.append(transformer.transform(X_test[samples_idx, :, :length]))
# Concatenate the arrays as a DataFrame and fill NA values with 0
df_weasel_test = pd.concat([
    pd.DataFrame.sparse.from_spmatrix(
        X, index=samples_idx, columns=np.vectorize(transformer.vocabulary_.get)(np.arange(X.shape[1]))
    )
    for X, samples_idx, transformer in zip(X_weasel_test, lengths_samples_test_idx, transformer_list)
]).fillna(0.)[features_to_keep]
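The concatenation step relies on pandas aligning columns by name, so each group may have its own vocabulary. Here is a toy sketch of that behaviour, with made-up words, counts and labels (nothing here comes from pyts):

```python
import numpy as np
import pandas as pd

# Two hypothetical groups whose vocabularies only partly overlap
group_a = pd.DataFrame([[1, 0], [2, 1]], index=[3, 0], columns=["aa", "ab"])
group_b = pd.DataFrame([[4, 1]], index=[1], columns=["ab", "bb"])

# Columns become the union of the vocabularies; a word that a group
# never produced shows up as NaN and is replaced by a count of 0
combined = pd.concat([group_a, group_b]).fillna(0.0)

# Rows keep the group order (3, 0, 1), not the original sample order,
# so any label array has to be indexed with the DataFrame index
labels = np.array(["x", "y", "z", "w"])
y_aligned = labels[combined.index.to_numpy()]
```

The index carries the original sample positions, which is what makes it possible to line the rows up with the labels again afterwards.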
Let me know if this helps you.
oh wow, thanks for the extensive example. I wouldn't have considered using separate instances of WEASELMUSE but it makes sense. I will give it a try
If I understand the WEASEL+MUSE algorithm correctly, it should be possible to use it with samples of different lengths. This is currently not possible with the API of the WEASELMUSE class, which expects a 3D array of shape (n_samples, n_features, n_timestamps), since a NumPy array has the same shape for all samples.
I tried to pad the time series of all samples with NaN values up to the length of the longest sample, but the input checks reject NaN values. Is there a way to use samples of different lengths?
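One way to get rectangular arrays out of ragged data without NaN padding is to group the samples by length. A minimal sketch, assuming each sample is an array of shape (n_features, n_timestamps_i); the helper group_by_length is hypothetical and not part of pyts:

```python
from collections import defaultdict

import numpy as np


def group_by_length(series):
    """Group arrays of shape (n_features, n_timestamps_i) by n_timestamps_i.

    Returns a dict mapping each length to a pair
    (original sample indices, stacked 3D array).
    """
    indices_by_length = defaultdict(list)
    for idx, ts in enumerate(series):
        indices_by_length[ts.shape[1]].append(idx)
    return {
        length: (np.array(idxs), np.stack([series[i] for i in idxs]))
        for length, idxs in indices_by_length.items()
    }


# Toy ragged data set: 5 samples, 6 features, 3 distinct lengths
rng = np.random.default_rng(0)
ragged = [rng.normal(size=(6, n)) for n in (80, 100, 80, 95, 100)]
groups = group_by_length(ragged)
```

Each stacked array has the (n_samples, n_features, n_timestamps) layout, and the stored indices make it possible to map the rows back to the original samples.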