Hi, thanks for so many very interesting questions!
(1) iTransformer uses variate tokens, and if an MLP is adopted as the embedding, the input length is fixed. One way to support a variable input length is to use a variable-length embedding (e.g. a TCN) and guarantee temporal sequentiality, that is, [B, 1, T] => [B, H, kT]. However, the subsequent averaging will break that sequentiality. Therefore, we recommend preserving the temporal dimension rather than letting it collapse, which may require a modification to iTransformer. Fortunately, a recently presented work supports variable lengths with explicit multivariate modeling. It uses a generative (decoder-only) Transformer, which has better generalization performance and supports arbitrary-length output, and is therefore more likely to serve as a foundation model. Details are provided here.
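For illustration, a minimal sketch of such a convolutional variate embedding (the module name, kernel sizes, and strides are purely illustrative, not from our released code). Note that the temporal axis is kept instead of being averaged away, so downstream layers can still see temporal order:

```python
import torch
import torch.nn as nn

class ConvVariateEmbedding(nn.Module):
    """Embed each variate [B, 1, T] -> [B, H, T'] with strided 1D convolutions,
    keeping temporal order instead of averaging over time."""
    def __init__(self, hidden_dim=64, kernel_size=3, stride=2, num_layers=2):
        super().__init__()
        layers, in_ch = [], 1
        for _ in range(num_layers):
            layers += [nn.Conv1d(in_ch, hidden_dim, kernel_size, stride=stride,
                                 padding=kernel_size // 2),
                       nn.GELU()]
            in_ch = hidden_dim
        self.net = nn.Sequential(*layers)

    def forward(self, x):           # x: [B, 1, T], T may vary between batches
        return self.net(x)          # [B, H, T'], roughly T / stride**num_layers steps

x = torch.randn(8, 1, 1024)
print(ConvVariateEmbedding()(x).shape)   # torch.Size([8, 64, 256])
```

Averaging the [B, H, T'] output over T' would collapse it back to a single fixed-size token per variate, which is exactly the step that loses temporal information.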
(2) & (3) Of course, dealing with different frequencies is a very important issue. We recommend you take a look at Moirai's approach.
(4) Of course! We have tried to reconstruct some variates from the others, which we call regression SSL. The resulting model is similar to BERT and acts as a general feature extractor for time series (admittedly, it may not be as directly useful as GPT for generative tasks such as forecasting). We think this is very promising, but our current bandwidth is limited and we have not yet conducted in-depth research, so if you make further progress, feel free to communicate with us.
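As a rough illustration of the regression-SSL idea (the mask ratio, encoder, and loss below are placeholders for illustration, not our actual implementation):

```python
import torch
import torch.nn as nn

class MaskedVariateReconstruction(nn.Module):
    """BERT-style pretraining over variate tokens: mask whole variates,
    encode the remaining ones, and regress the masked series back."""
    def __init__(self, seq_len=96, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(seq_len, d_model)              # series -> variate token
        self.mask_token = nn.Parameter(torch.zeros(d_model))  # learnable [MASK] token
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, seq_len)               # token -> reconstructed series

    def forward(self, x, mask_ratio=0.3):                     # x: [B, C, T]
        tokens = self.embed(x)                                 # [B, C, D]
        mask = torch.rand(x.shape[:2], device=x.device) < mask_ratio   # [B, C]
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token, tokens)
        recon = self.head(self.encoder(tokens))                # [B, C, T]
        return ((recon - x) ** 2)[mask].mean()                 # loss on masked variates only

x = torch.randn(4, 16, 96)                                     # 16 variates, 96 time steps
print(MaskedVariateReconstruction()(x))
```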
(5) It is an interesting question! You may take a look at the recent work:
In our recent work AutoTimes, we also tried incorporating LLM-embedded information (e.g. timestamps) into time series forecasters, which is demonstrated to be very effective.
Thank you so much for your detailed response! Actually, I want to thank your whole lab, as your lab's various works (review papers on time-series models and TSLib) are really helping me, especially since we are neuroscientists new to time series modeling!
(1) I'll look more into the paper you recommended! The Timer-XL paper seems very very promising, the type of model we were waiting for! Do you by any chance know when the code for Timer-XL will be released? Currently we are testing 1% of our iEEG data (160GB) on various time series models in TSLib, but eventually we need to scale the model to 10~20 terabytes of data, which I heard requires writing code that is custom to the model (i.e. we cannot change our model after we choose which model to scale). Therefore, it would be great if we knew when the code is released, so that we can schedule our time around it!
(2)&(3) Thank you so much! I'll be sure to have a look! :)
(4) Thank you! I see! We are only interested in using our own iEEG datasets rather than benchmark datasets (e.g., ECL, ETTh1, ...), since we are neuroscientists, but we'll keep you updated if you are interested in how your models can be applied to brain data!
(5) Wow thank you! :)
Woah, the AutoTimes paper seems very promising, especially the use of LLM information, as combining the prior brain knowledge in LLMs with brain signals is what neuroscientists have always wanted!
I have one additional question if you don't mind : the Timer-XL paper seems to use patching to tokenize the time series, unlike iTransformer which doesn't use patching. Do you believe that the use of patching in Timer-XL would hinder modeling intra-patch dynamics (as pointed out in PathFormer)? Or would it be modeled by the feed-forward network in the transformer?
Again, thank you so much!!
We are glad you found our response helpful!
Of course! We have open-sourced a collection of scalable large time series models here very recently! The code framework is the same as TSLib to make it easy to use!
About the follow-up question: We think the Patch token and the Variate token have become representative tokenization approaches for the time series modality, and each has its strengths. In fact, we think the Patch token can be more fine-grained at modeling temporal dynamics. An experimental observation is that the Patch token (PatchTST) works well when the lookback horizon of the time series is long. On the other hand, if the time series is high-dimensional and has many variates, the Variate token (iTransformer) wins. Consequently, Timer-XL combines the strengths of both, and its generative modeling can support arbitrary input/output lengths and model scaling:
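For reference, a minimal sketch of how the two tokenizations differ in shape (the dimensions are illustrative):

```python
import torch
import torch.nn as nn

B, C, T, P, D = 4, 8, 96, 16, 128          # batch, variates, time, patch length, model dim
x = torch.randn(B, C, T)

# Patch tokens (PatchTST-style): each token covers P consecutive time points of one variate.
patches = x.unfold(-1, P, P)               # [B, C, T/P, P]
patch_tokens = nn.Linear(P, D)(patches)    # [B, C, T/P, D]: T/P tokens per variate

# Variate tokens (iTransformer-style): each token covers the whole series of one variate.
variate_tokens = nn.Linear(T, D)(x)        # [B, C, D]: one token per variate

print(patch_tokens.shape, variate_tokens.shape)
```

With long lookbacks the patch tokenization yields many fine-grained tokens per variate, while with many channels the variate tokenization keeps the token count equal to the number of variates.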
We’re very interested in how our models can be applied to brain data. If you have any specific questions about the papers or need further assistance, feel free to reach out.
Oh I see! Thank you so much :)
Yeah, I was surprised when I looked at Timer-XL, as you tested lookback lengths of around 8,000 (we need to model very long sequences, hopefully up to 12,000 or more).
Thank you so much for the offer of help. I'll keep you posted!! 👍
First of all, thank you so much(!!) for the amazing research and code! It's hard to believe that a different way of thinking can yield such amazing results.
We are thinking of using your model as a backbone for our ECoG (intracranial EEG) foundation model, and had a few questions if you don't mind.
How would you modify the model to take in arbitrary input lengths? : We hope our model can handle arbitrary input lengths. We were thinking of using TCNs for the input embedding (as mentioned in Appendix G.3 of your paper) and taking averages across timepoints to achieve this (i.e., for each variable: [B, 1, T] =(1D convolution)=> [B, H, T] =(averaging across time T)=> [B, H], where H is the token embedding size). Our worry is that this would destroy the temporal information within the data. Is there any method that you recommend?
Explicitly modeling frequency : Because frequency is an important component in neural signals, we were wondering if we could append the Fourier transform to the token embeddings. Do you think this will enable iTransformer to model both temporal and frequency components? Is there another method you can recommend?
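For concreteness, a rough sketch of what we have in mind (purely illustrative, the module and dimensions are our own, not from your code): concatenating the rFFT magnitude spectrum to the raw series before the token projection:

```python
import torch
import torch.nn as nn

class TimeFreqVariateEmbedding(nn.Module):
    """Variate token built from both the raw series and its rFFT magnitude spectrum."""
    def __init__(self, seq_len=96, d_model=128):
        super().__init__()
        n_freq = seq_len // 2 + 1                        # number of rFFT bins
        self.proj = nn.Linear(seq_len + n_freq, d_model)

    def forward(self, x):                                # x: [B, C, T]
        spectrum = torch.fft.rfft(x, dim=-1).abs()       # [B, C, T//2 + 1]
        return self.proj(torch.cat([x, spectrum], dim=-1))   # [B, C, D]

print(TimeFreqVariateEmbedding()(torch.randn(4, 16, 96)).shape)   # torch.Size([4, 16, 128])
```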
How would you modify the model to take in arbitrary sampling rates? : Neural signals can range from 256Hz to 4096Hz, all of which we hope our model can handle. We were thinking of training a different input embedding function for each sampling rate (while keeping the same transformer), but since the rates are usually powers of two (ex : 128, 256, 512, ...) and the underlying signals are the same (brain signals), we were wondering if there is a way to train just one input embedding function?
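One option we were considering, sketched below (just our own assumption, and a proper anti-aliasing resampler such as torchaudio's would likely be needed in practice): resample every recording to a single reference rate so that only one input embedding has to be trained:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedRateEmbedding(nn.Module):
    """Resample each recording to one reference rate, then apply a single shared
    embedding, so 256 Hz and 4096 Hz inputs go through the same learned weights."""
    def __init__(self, ref_hz=256, window_s=4, d_model=128):
        super().__init__()
        self.ref_hz = ref_hz
        self.ref_len = ref_hz * window_s                   # samples per embedded window
        self.proj = nn.Linear(self.ref_len, d_model)

    def forward(self, x, input_hz):                        # x: [B, C, T] recorded at input_hz
        if input_hz != self.ref_hz:
            x = F.interpolate(x, scale_factor=self.ref_hz / input_hz,
                              mode="linear", align_corners=False)
        return self.proj(x[..., :self.ref_len])            # [B, C, D]

emb = SharedRateEmbedding()
fast = emb(torch.randn(2, 8, 4096 * 4), input_hz=4096)     # downsampled to 256 Hz first
slow = emb(torch.randn(2, 8, 256 * 4), input_hz=256)
print(fast.shape, slow.shape)                               # both torch.Size([2, 8, 128])
```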
Have you tried self-supervised learning (SSL) on iTransformer? : We are planning on pretraining iTransformer using masked reconstruction. However, it seems that, because each token in iTransformer covers a whole variate, masking can only be done variate-wise and not time-wise (i.e., masking all the timepoints of a certain variable instead of masking certain time windows of a variable)? Or, have you tried any SSL methods on iTransformer, or could you recommend one?
Have you tried adding embeddings to each variate token? : As far as we understand it, the input embedding function is variate-independent, and therefore the model does not know which variable is which (since attention is permutation invariant). Have you thought of adding variable embeddings to the tokens to tell which variable it is? We are asking because it would be awesome if we could embed information about each variable (ex : electrode location, modality, ...) into the token itself!
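For example, something like the following rough sketch (the metadata vocabularies and names are made up by us): adding learned embeddings of per-channel metadata on top of the value embedding:

```python
import torch
import torch.nn as nn

class VariateTokenWithID(nn.Module):
    """Variate token = value embedding of the series + learned embeddings of
    per-channel metadata (e.g. electrode location, modality)."""
    def __init__(self, seq_len=96, d_model=128, n_locations=64, n_modalities=4):
        super().__init__()
        self.value_embed = nn.Linear(seq_len, d_model)
        self.loc_embed = nn.Embedding(n_locations, d_model)
        self.mod_embed = nn.Embedding(n_modalities, d_model)

    def forward(self, x, loc_ids, mod_ids):        # x: [B, C, T]; ids: [C]
        return (self.value_embed(x)                # [B, C, D]
                + self.loc_embed(loc_ids)          # [C, D], broadcast over batch
                + self.mod_embed(mod_ids))

x = torch.randn(4, 16, 96)
loc = torch.randint(0, 64, (16,))
mod = torch.randint(0, 4, (16,))
print(VariateTokenWithID()(x, loc, mod).shape)     # torch.Size([4, 16, 128])
```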
Sorry, and thank you in advance for answering my long list of questions.