about ViT performance on EEG data

zwcolin / EEG-Transformer

A ViT based transformer applied on multi-channel time-series EEG data for motor imagery classification

GNU General Public License v3.0


about ViT performance on EEG data #2

Open DrugLover opened 3 months ago

DrugLover commented 3 months ago

Hello Brother Wang, I wrote a simple ViT model to decode MI-EEG signals. The overall model is much the same as the original ViT, and the code is here. I used the BCI Competition IV 2a dataset, where the input data shape is [1, 22, 1125]. I used a patch size of [22, 25] directly, so the number of patches is 1125/25 = 45. With this patch setting, I soon ran into the problem you mentioned in the README: the model overfits the training set. The results show that its generalization ability is worse than that of EEGNet.
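For readers following along, the patching arithmetic above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the code linked in this comment: the function name `patchify` and the random input are hypothetical, and each [22, 25] patch is simply flattened into one token before any learned projection.

```python
import numpy as np

def patchify(x, patch_width=25):
    """Split one (1, channels, time) trial into non-overlapping temporal patches.

    Hypothetical helper: every patch spans all channels and `patch_width`
    time samples, and is flattened into a single token vector.
    """
    _, n_channels, n_times = x.shape
    if n_times % patch_width != 0:
        raise ValueError("time length must be divisible by patch_width")
    n_patches = n_times // patch_width
    # (channels, time) -> (channels, n_patches, patch_width)
    patches = x[0].reshape(n_channels, n_patches, patch_width)
    # Put the patch axis first: (n_patches, channels, patch_width)
    patches = patches.transpose(1, 0, 2)
    # Flatten each patch into one token of length channels * patch_width
    return patches.reshape(n_patches, n_channels * patch_width)

rng = np.random.default_rng(0)
trial = rng.standard_normal((1, 22, 1125))  # stand-in for one 2a trial
tokens = patchify(trial)
print(tokens.shape)  # (45, 550)
```

With only 45 tokens per trial and a few hundred trials per subject, the model sees very little data relative to its capacity, which is consistent with the overfitting described here.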

Moreover, I applied a dropout layer in the patch embedding that drops some of the patches. With this dropout the results get much better, but the model takes far more epochs to converge (and is still worse than EEGNet).
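A minimal sketch of one way such patch-level dropout could look is given below. It assumes that whole tokens are removed at random during training (zeroing them instead would be another reading of the description), and the name `patch_dropout` and its parameters are hypothetical rather than taken from the linked code.

```python
import numpy as np

def patch_dropout(tokens, drop_prob, rng, training=True):
    """Randomly remove a fraction of patch tokens during training.

    tokens: array of shape (n_patches, dim).
    Assumption: dropped patches are removed entirely, and the temporal
    order of the surviving patches is preserved.
    """
    if not training or drop_prob <= 0.0:
        return tokens  # evaluation uses every patch
    n_patches = tokens.shape[0]
    n_keep = max(1, int(round(n_patches * (1.0 - drop_prob))))
    keep_idx = np.sort(rng.choice(n_patches, size=n_keep, replace=False))
    return tokens[keep_idx]

rng = np.random.default_rng(0)
tokens = rng.standard_normal((45, 550))          # 45 patches, as above
kept = patch_dropout(tokens, drop_prob=0.2, rng=rng)
print(kept.shape)  # (36, 550)
```

If positional embeddings are used, they would need to be added before the tokens are dropped so that the surviving patches keep their original positions.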

Recently, I found other transformer variants for MI-EEG, the ShallowMirrorTransformer and Conformer. Sadly, neither method performed as well as some CNN- or LSTM-based ones.

I would like to know whether there are any tricks for training ViT, and to see your experimental results. Thanks a lot!

zwcolin commented 3 months ago

Thanks for your interest in my course project. This project was started in 2022 and I feel it's a bit outdated (and I'm no longer maintaining it because I don't do research in the EEG domain).

My suggestion would simply be to give up the classical ViT architecture and try the following instead:

(1) Fine-tune from an autoregressive ViT that predicts both patches and labels. This gives you more training signal given limited data. You can use LoRA and/or an adapter to make training efficient and/or to accommodate your input/output.

(2) Try state space models (SSMs) such as Mamba.
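To make the LoRA part of suggestion (1) concrete, here is a rough NumPy sketch of the forward pass of a low-rank-adapted linear layer. It is an illustration only: the class name `LoRALinear`, the rank, the `alpha` scaling value, and the layer sizes are all assumptions chosen for the example, and no training loop or pretrained model is shown.

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (LoRA-style sketch).

    Effective weight: W + (alpha / rank) * B @ A, where only A and B
    would be trained and W stays frozen.
    """

    def __init__(self, weight, rank=4, alpha=8.0, rng=None):
        rng = rng if rng is not None else np.random.default_rng(0)
        d_out, d_in = weight.shape
        self.weight = weight                                 # frozen, (d_out, d_in)
        self.A = rng.standard_normal((rank, d_in)) * 0.01    # trainable
        self.B = np.zeros((d_out, rank))                     # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        # x: (n_tokens, d_in) -> (n_tokens, d_out)
        frozen = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T
        return frozen + self.scale * update

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 550))   # stand-in for a pretrained projection
layer = LoRALinear(W, rank=4, rng=rng)
x = rng.standard_normal((45, 550))   # 45 patch tokens
out = layer(x)
print(out.shape)  # (45, 64)
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and only `rank * (d_in + d_out)` parameters per layer need to be trained, which is the appeal when labeled EEG data is scarce.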

DrugLover commented 3 months ago

Thanks for your advice! I tried simply plugging Mamba into the aforementioned Conformer, but the results got even worse. I also ran a separate domain generalization experiment on EEG in which I simply changed the depth of the transformer block, and it reached SOTA performance!

LiuyinYang1101 commented 3 months ago


Hi DrugLover, I have recently been working on a similar project, aiming to do SSL pretraining on EEG data from various sources and then fine-tune on the downstream classification task using transformer-based models. My experiments also showed that, most of the time, these large models did not perform as well as small models (e.g., EEGNet). I am wondering if we could have a discussion somewhere on this matter; maybe we can do something together. You can reach me at the following email: liuyin.yang@kuleuven.be. I'm looking forward to hearing from you soon.