gzerveas / mvts_transformer

Multivariate Time Series Transformer, public version
MIT License

One pre-trained model for all datasets or one for each? #7

Closed · Roxyi closed this 2 years ago

Roxyi commented 2 years ago

Hi @gzerveas, when I read your paper, I thought you pre-trained ONE model on all datasets and then fine-tuned it on each dataset for the specific classification/regression task. However, based on the README in this repository, it seems that each dataset has its own pre-trained model. Which approach did you use in your experiments?

gzerveas commented 2 years ago

@Roxyi The datasets are very disparate, both in terms of their characteristics (e.g. dimensionality, time series length, sparsity) and, more importantly, in terms of the natural quantities they encode (e.g. absorption frequency spectra, ECG/EEG signals, weather data, etc.). It would therefore make little to no sense to train a single model on all datasets: there is little, if any, shared information, and the likely result would be catastrophic forgetting. I personally wouldn't trust a method that claims to perform better on time series of biomedical signals after being pretrained on predicting traffic data from the road network of San Francisco, or absorption spectra of chemical substances.

What happens instead is that the models are pretrained in an un-/self-supervised way to "fill in the gaps" in the time series of the dataset of interest, and then fine-tuned in a supervised way (using the labels provided in the dataset) to predict the class labels or, in regression, the physical quantities.

There may be one or two exceptions to this: for example, the BeijingPM25 and BeijingPM10 datasets share the same time series data and differ only in their labels; for those, a single model was indeed pretrained in a self-supervised way and then fine-tuned separately for each dataset.
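To make the per-dataset workflow described above concrete, here is a minimal PyTorch sketch: masked-value ("fill in the gaps") pretraining of a Transformer encoder on one dataset, followed by attaching a fresh supervised head for fine-tuning on that same dataset. The mask ratio, model sizes, mean pooling, and all names here are illustrative assumptions, not the repository's actual implementation.

```python
# Minimal sketch, NOT the repository's actual code: masked-value pretraining
# ("fill in the gaps") on one dataset, then supervised fine-tuning on the same
# dataset. MASK_RATIO, model sizes, and mean pooling are illustrative choices.
import torch
import torch.nn as nn

MASK_RATIO = 0.15  # fraction of time steps hidden during pretraining (assumed)

class TSEncoder(nn.Module):
    def __init__(self, n_features, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.input_proj = nn.Linear(n_features, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.recon_head = nn.Linear(d_model, n_features)  # pretraining head

    def forward(self, x):  # x: (batch, seq_len, n_features)
        return self.encoder(self.input_proj(x))

def pretraining_loss(model, x):
    """Self-supervised step: hide random time steps, reconstruct their values."""
    mask = torch.rand(x.shape[:2], device=x.device) < MASK_RATIO  # (B, T)
    x_masked = x.masked_fill(mask.unsqueeze(-1), 0.0)             # hide steps
    recon = model.recon_head(model(x_masked))
    # MSE only on the hidden positions: the "fill in the gaps" objective
    return ((recon - x) ** 2)[mask.unsqueeze(-1).expand_as(x)].mean()

class Classifier(nn.Module):
    """Fine-tuning wrapper: reuse the pretrained encoder, add a fresh head."""
    def __init__(self, pretrained, d_model, n_classes):
        super().__init__()
        self.encoder = pretrained
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):
        z = self.encoder(x).mean(dim=1)  # mean-pool over time (an assumption)
        return self.head(z)

# Usage: pretrain on unlabeled series of ONE dataset, then fine-tune with labels.
enc = TSEncoder(n_features=9)
x = torch.randn(8, 100, 9)            # dummy batch: 8 series, 100 time steps
loss = pretraining_loss(enc, x)       # minimized during self-supervised stage
clf = Classifier(enc, d_model=64, n_classes=5)
logits = clf(x)                       # (8, 5) class scores for fine-tuning
```

The point the sketch illustrates is that only the encoder weights carry over from pretraining to fine-tuning, and both stages see data from the same dataset.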

Roxyi commented 2 years ago

@gzerveas Got you, that makes sense. Thank you for the explanation.