vandijklab / BrainLM

Apache License 2.0
62 stars · 14 forks

Is this model agnostic to the kind of fMRI pre-processing? #6

Open IoSonoMarco opened 2 months ago

IoSonoMarco commented 2 months ago

I was wondering whether you pre-trained the model on very homogeneously pre-processed fMRI data from the UK Biobank. For instance, I have an already pre-processed fMRI dataset with normalized time series; the pre-processing applied there is probably different from the one applied to the UK Biobank data.

IoSonoMarco commented 2 months ago

Anybody here? :)

SyedA5688 commented 2 months ago

Hi Marco,

All of our pretraining fMRI data from the UK Biobank was normalized with a single preprocessing pipeline, so normalization is consistent across our fMRI data. We have provided some preprocessing scripts that may allow you to normalize your dataset in the same way, if you have raw fMRI recordings: https://github.com/vandijklab/BrainLM/blob/main/toolkit/BrainLM_Tutorial.ipynb
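To illustrate the core of what "normalized in the same way" means here, a minimal per-voxel z-scoring sketch is below. This is a simplified, hypothetical stand-in, not our actual pipeline (the real scripts in the notebook do considerably more, e.g. registration and parcellation):

```python
import math

def zscore_timeseries(ts):
    """Z-score one voxel/parcel time series to mean 0, std 1.
    Simplified illustration of per-voxel normalization; not the
    full BrainLM preprocessing pipeline."""
    n = len(ts)
    mean = sum(ts) / n
    var = sum((x - mean) ** 2 for x in ts) / n
    std = math.sqrt(var)
    if std == 0:  # constant signal: return a centered zero series
        return [0.0] * n
    return [(x - mean) / std for x in ts]

normalized = zscore_timeseries([1.0, 2.0, 3.0, 4.0])
# zero-mean, unit-variance output
```

The key point is that whatever scheme you use, it must match the statistics the model saw during pretraining, or the input distribution shifts.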

If you aren’t able to normalize your data in the same way and run into issues using BrainLM on datasets with a different normalization, you could also try a small fine-tuning phase on a portion of your data.

Let us know if this is helpful.

IoSonoMarco commented 2 months ago

Thanks for replying.

So, the preprocessing is shared across the pre-training datasets. In my case, I have a preprocessed dataset and no access to the raw data. But in general, a researcher may want to use a different preprocessing pipeline from the one proposed by the UK Biobank.

I was wondering whether a possible performance drop caused by a different preprocessing pipeline for the raw fMRI data could be recovered with LoRA adaptation.

Just to double-check, have you tried this yet?

SyedA5688 commented 2 months ago

We have not yet tried recovering performance on a differently-normalized dataset, but this is a great question. Let me know if you find anything regarding this, or need help with running parameter-efficient finetuning on BrainLM.
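For anyone following along, the idea behind LoRA-style parameter-efficient fine-tuning is independent of BrainLM: the frozen pretrained weight W is augmented with a trainable low-rank update B @ A. The sketch below is a plain-Python illustration of that arithmetic only (names and shapes are illustrative); in practice you would attach adapters to the model's attention projections with a library such as Hugging Face `peft`:

```python
def matmul(A, B):
    """Multiply two matrices given as lists of lists."""
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(inner))
             for j in range(cols)] for i in range(rows)]

def lora_update(W, A, B, alpha=1.0):
    """Effective weight with a LoRA adapter: W + alpha * (B @ A).
    W (d_out x d_in) stays frozen; only the small factors
    A (r x d_in) and B (d_out x r) are trained. B is initialized
    to zero, so the adapted model starts identical to the
    pretrained one."""
    delta = matmul(B, A)
    return [[W[i][j] + alpha * delta[i][j]
             for j in range(len(W[0]))] for i in range(len(W))]

# toy example: d_out = d_in = 2, rank r = 1
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.5, 0.5]]            # r x d_in
B = [[0.0], [0.0]]          # d_out x r, zero-initialized
W_adapted = lora_update(W, A, B)  # equals W at initialization
```

With rank r much smaller than the hidden size, only a tiny fraction of parameters is trained, which is why it is an attractive option for adapting to a differently-normalized dataset.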

IoSonoMarco commented 2 months ago

OK, I'll let you know, thanks. In the meantime, is there any chance of getting even a small toy dataset to run one of the notebooks, so I can figure out the data structure expected by the dataloader? I'm focusing on the notebook inference_01_cls_token_raw_data_plotting, which makes several data-loading calls.

SyedA5688 commented 1 month ago

I will try to see if we can share a sample dataset from the Human Connectome Project as an example, but I am unsure due to the data sharing agreements in place for different fMRI datasets. In the meantime, we are happy to help you preprocess some of your own data with the UK Biobank + our own preprocessing pipeline.

IoSonoMarco commented 1 month ago

Hey, thanks for the answer. I don't think there is any need to go through HCP example data. Actually, I would just be happy to know the final data structure that enters the BrainLM encoder. Let's say my starting point is the normalized time series for every voxel, i.e. a 4D tensor. Is there any way to start from this structure, and to know which of your pipelines (for normalized time-series processing) I should use to feed the encoder with this data?
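To make my question concrete: my guess (please correct me) is that the encoder does not consume the 4D voxel tensor directly, but per-region time series obtained by averaging voxels within atlas parcels, which are then windowed into patches. A purely illustrative sketch of that reduction from a 4D array to a regions-by-time matrix, with hypothetical names, would be:

```python
def parcel_average(vol4d, atlas, n_parcels):
    """Reduce a 4D voxel array (X x Y x Z x T, nested lists) to an
    (n_parcels x T) matrix by averaging voxel time series within
    each atlas label. Purely illustrative, not the BrainLM
    pipeline; labels run 1..n_parcels, 0 = background."""
    T = len(vol4d[0][0][0])
    sums = [[0.0] * T for _ in range(n_parcels)]
    counts = [0] * n_parcels
    for x in range(len(vol4d)):
        for y in range(len(vol4d[x])):
            for z in range(len(vol4d[x][y])):
                label = atlas[x][y][z]
                if label == 0:
                    continue  # skip background voxels
                counts[label - 1] += 1
                for t in range(T):
                    sums[label - 1][t] += vol4d[x][y][z][t]
    return [[s / counts[p] if counts[p] else 0.0 for s in sums[p]]
            for p in range(n_parcels)]

# toy volume: 2x1x1 voxels, 3 timepoints, both voxels in parcel 1
vol = [[[[1.0, 2.0, 3.0]]], [[[3.0, 4.0, 5.0]]]]
atlas = [[[1]], [[1]]]
roi_ts = parcel_average(vol, atlas, n_parcels=1)  # [[2.0, 3.0, 4.0]]
```

If something like this (regions x timepoints) is indeed the input format, I could start from my normalized 4D tensor and only need to know which atlas and windowing your encoder expects.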