Open stefanosh opened 1 month ago
Hi Stefanos, Thanks for your interest in MOMENT!
We designed this codebase to be extremely lightweight. In the process, we removed a lot of code to handle different datasets.
We are working on releasing a bulkier research code, which will include dataset classes to process all different datasets. Until then, I would point you to the early version of our research code on Anonymous Github. The dataset class defined in the anonymous codebase should work for the datasets you are referring to.
Please let us know if it doesn't! Again, thanks for your interest in MOMENT! Feel free to close the issue if this solves your problem.
Hi, thanks for the quick reply!
In fact, errors still occur when working with the multivariate datasets (e.g. Handwriting, Heartbeat, UWaveGestureLibrary).
For example, using Heartbeat (n_channels=61) with the dataset class from the anonymous codebase, the following error is raised:
IndexError: index 61 is out of bounds for axis 1 with size 61, Line 236: timeseries = self.data[:, index].
Since, I assume, channel handling affects multiple parts of the codebase and MOMENT model itself besides this dataset class, I'd appreciate your insights to get a complete workaround.
Thanks.
Thanks for catching it! For multivariate classification datasets, we currently use the following workaround: given a dataset of dimensions $(N, C, T)$, where $N$, $C$, and $T$ are the number of time series, the number of channels, and the length of each time series, we reshape it into $(NC, T)$, use MOMENT to get representations of shape $(NC, d)$, where $d=1024$ for MOMENT-Large, and reshape these embeddings back to $(N, C, d)$.
This workflow is illustrated in our script to run MOMENT on the UEA datasets in the most recent Anonymous GitHub repository: Multivariate scripts.
Now MOMENT can technically handle multivariate data, but it treats each channel independently (and reshapes the channel time series along the batch dimension too). So this is just an artifact of the dataset class and the data loader.
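The reshaping described above can be sketched as follows. This is only a minimal illustration, not the script from the repository: `embed_fn` and `toy_embed` are hypothetical stand-ins for whatever call produces MOMENT embeddings, and a small random projection replaces the real model so that the shapes can be checked without downloading any weights.

```python
import numpy as np


def embed_per_channel(x, embed_fn):
    """Embed a multivariate dataset channel by channel.

    x has shape (N, C, T); embed_fn maps (M, T) -> (M, d).
    Returns an array of shape (N, C, d).
    """
    n, c, t = x.shape
    flat = x.reshape(n * c, t)      # (N*C, T): row n*C + c holds series n, channel c
    emb = embed_fn(flat)            # (N*C, d)
    return emb.reshape(n, c, -1)    # back to (N, C, d)


def toy_embed(flat, d=4):
    """Hypothetical stand-in for the model: a fixed random linear projection."""
    rng = np.random.default_rng(0)
    proj = rng.standard_normal((flat.shape[1], d))
    return flat @ proj


x = np.arange(3 * 2 * 5, dtype=np.float64).reshape(3, 2, 5)  # N=3, C=2, T=5
embeddings = embed_per_channel(x, toy_embed)
print(embeddings.shape)
```

Because the reshape keeps row-major order, the embedding at `embeddings[n, c]` corresponds to channel `c` of series `n`, so the per-channel embeddings can later be concatenated or pooled per series before classification.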
We agree this is not perfect, and would really appreciate improvements in the codebase!
Thanks!
Hi and thanks for the answer.
I confirm that the workaround works, and that the code for multivariate classification in PR https://github.com/moment-timeseries-foundation-model/moment/pull/23 reproduces successfully (although for a different dataset).
I would also like to ask about fine-tuning on new, unlabeled data to improve downstream classification.
Specifically, given unlabeled time series (from the same domain as the labeled downstream series, or at least a related one), how should we approach fine-tuning?
from momentfm import MOMENTPipeline

model = MOMENTPipeline.from_pretrained(
    "AutonLab/MOMENT-1-large",
    model_kwargs={
        'task_name': 'reconstruction',
        'forecast_horizon': 192,
        'head_dropout': 0.1,
        'weight_decay': 0,
        'freeze_encoder': False,  # Freeze the transformer encoder. True by default.
        'freeze_embedder': False,  # Freeze the patch embedding layer. True by default.
        'freeze_head': False,  # False by default
    },
)
Then, use the encoder and embedding parameters (either or both?) from source_model's state_dict to update the state_dict of the initialized target_model in classification mode.
Then, either use SVM training on top of target_model embeddings or proceed with linear probing or full finetuning.
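The weight-transfer step proposed above could look roughly like the sketch below. It uses a toy PyTorch module rather than MOMENT itself; the submodule names `patch_embedding`, `encoder`, and `head`, and the helper `transfer_backbone`, are assumptions made for illustration and should be checked against the real model's `state_dict()` keys.

```python
from torch import nn


class ToyModel(nn.Module):
    """Toy stand-in with an embedder, an encoder, and a task-specific head."""

    def __init__(self, head_out):
        super().__init__()
        self.patch_embedding = nn.Linear(8, 16)
        self.encoder = nn.Linear(16, 16)
        self.head = nn.Linear(16, head_out)


def transfer_backbone(source, target, prefixes=("patch_embedding.", "encoder.")):
    """Copy only the parameters whose names start with one of `prefixes`."""
    backbone = {
        name: tensor
        for name, tensor in source.state_dict().items()
        if name.startswith(prefixes)
    }
    # strict=False leaves keys absent from `backbone` (here, the head) untouched
    result = target.load_state_dict(backbone, strict=False)
    return backbone, result


source_model = ToyModel(head_out=8)   # e.g. fine-tuned with a reconstruction head
target_model = ToyModel(head_out=5)   # e.g. freshly initialized classification head
copied, result = transfer_backbone(source_model, target_model)
print(sorted(copied))
print(result.missing_keys)  # parameters that were not overwritten
```

Passing a shorter tuple of prefixes would transfer only the embedder or only the encoder, which is one way to test the "either or both" question empirically.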
Thanks.
Hello,
Thank you very much for openly sharing your work and code.
I am trying to reproduce the classification notebook. There seems to be a bug in the ClassificationDataset class when loading multivariate datasets, caused by incorrect reshape/n_channels handling (univariate datasets such as ECG5000 load without problems).
Specifically, when loading the Heartbeat dataset (I also tried Handwriting), this error occurs:
ValueError: cannot reshape array of size 5039820 into shape (204,405), in line: self.data = self.data.reshape(self.num_timeseries, self.len_timeseries)
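The size in this message is consistent with the channel dimension being dropped: 204 × 61 × 405 = 5039820, where 61 is the number of Heartbeat channels. The sketch below reproduces the failure on a dummy array and shows one reshape that does succeed, assuming the loaded array is laid out as (num_timeseries, n_channels, len_timeseries); that layout is an assumption, not something confirmed by the dataset class.

```python
import numpy as np

num_timeseries, n_channels, len_timeseries = 204, 61, 405

# Dummy array with the assumed (N, C, T) layout
data = np.zeros((num_timeseries, n_channels, len_timeseries), dtype=np.float32)
total_size = data.size  # 204 * 61 * 405

try:
    # What the dataset class attempts: the channel dimension is ignored
    data.reshape(num_timeseries, len_timeseries)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print(err)

# Folding the channels into the first dimension keeps every value
flat = data.reshape(num_timeseries * n_channels, len_timeseries)
print(total_size, flat.shape)
```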
Thank you in advance.