IoSonoMarco opened 2 months ago
Anybody here? :)
Hi Marco,
All of our pretraining fMRI data from the UK Biobank was normalized using one preprocessing pipeline, so the normalization will be standard across our fMRI data. We have provided some preprocessing scripts which may allow you to normalize your dataset in the same way, if you have raw fMRI recordings: https://github.com/vandijklab/BrainLM/blob/main/toolkit/BrainLM_Tutorial.ipynb
If you aren’t able to normalize your data in the same way, and have issues with using BrainLM on datasets with a different normalization, you could also try a small finetuning phase on a portion of your data.
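If only preprocessed timeseries are available, one rough way to approximate a standard normalization is per-parcel z-scoring. This is a sketch of that idea only; it is an assumption, not the exact BrainLM scheme, so check the linked tutorial for the pipeline actually used (the shape of 424 parcels below is also just illustrative):

```python
import numpy as np

def zscore_timeseries(ts, eps=1e-8):
    """Standardize each parcel's timeseries to zero mean, unit variance.

    ts: array of shape (n_parcels, n_timepoints).
    """
    mean = ts.mean(axis=1, keepdims=True)
    std = ts.std(axis=1, keepdims=True)
    return (ts - mean) / (std + eps)

# Toy example: 424 parcels, 490 timepoints (illustrative sizes only)
rng = np.random.default_rng(0)
ts = rng.normal(loc=5.0, scale=3.0, size=(424, 490))
norm = zscore_timeseries(ts)  # each row now has mean ~0, std ~1
```

Whether the pretraining data was z-scored, robust-scaled, or normalized some other way matters here, so this is only a starting point for matching distributions.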
Let us know if this is helpful.
Thanks for replying.
So, the preprocessing is shared across the pre-training datasets. In my case, I have a preprocessed dataset and no access to the raw data. But in general, a researcher may want to use a preprocessing pipeline different from the one used for the UK Biobank data.
I was wondering whether a possible performance drop, caused by a different preprocessing pipeline having been applied to the raw fMRI data, could be recovered with LoRA adaptation.
Just to double-check, have you tried this yet?
We have not yet tried recovering performance on a differently-normalized dataset, but this is a great question. Let me know if you find anything regarding this, or need help with running parameter-efficient finetuning on BrainLM.
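For reference, the core idea of LoRA is small enough to sketch directly: the adapted weight is W + (alpha/r)·B·A, and initializing B to zero means the adapted model starts exactly at the pretrained weights. The class below is illustrative, not from the BrainLM codebase; in practice you would likely apply something like the Hugging Face `peft` library to the model's attention projections rather than hand-roll this:

```python
import numpy as np

class LoRALinear:
    """Frozen base weight W plus a trainable low-rank update (alpha/r) * B @ A."""

    def __init__(self, W, r=8, alpha=16, rng=None):
        rng = rng or np.random.default_rng(0)
        d_out, d_in = W.shape
        self.W = W                                        # frozen pretrained weight
        self.A = rng.normal(scale=0.01, size=(r, d_in))   # trainable
        self.B = np.zeros((d_out, r))                     # trainable, zero-init => no-op at start
        self.scale = alpha / r

    def forward(self, x):
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

W = np.random.default_rng(1).normal(size=(64, 64))
layer = LoRALinear(W, r=4)
x = np.ones(64)

# At initialization the LoRA path contributes nothing,
# so fine-tuning starts from the pretrained behavior.
assert np.allclose(layer.forward(x), W @ x)

# Only A and B are trained: far fewer parameters than the full weight.
n_lora = layer.A.size + layer.B.size   # 4*64 + 64*4 = 512
n_full = W.size                        # 64*64 = 4096
```

The zero-initialized B is also why LoRA is a reasonable candidate for recovering from a normalization mismatch: the adaptation can move the model only as far from the pretrained solution as the data requires.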
Ok, I'll let you know, thanks. However, is there any chance of getting even a toy sample dataset for running one of the notebooks, so I can figure out the data structure expected by the dataloader? I'm focusing on the notebook inference_01_cls_token_raw_data_plotting, where several data loading calls are made.
I will try to see if we can share a sample dataset from the Human Connectome Project as an example, but I am unsure due to data sharing agreements in place for different fMRI datasets. But we are happy to help you through preprocessing some of your own data through the UK Biobank + our own preprocessing pipeline.
Hey, thanks for the answer. I don't think there is any need to go through HCP example data; actually, I would be happy just to know the final data structure that enters the BrainLM encoder. Let's say my starting point is the normalized timeseries for every voxel, i.e. a 4D tensor. Is there any way to start from this structure, and to know which of your pipelines (for processing normalized timeseries) I should use to feed the encoder with this data?
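For concreteness, here is a sketch of the kind of step involved, under the assumption that the encoder consumes per-parcel timeseries of shape (n_parcels, n_timepoints), obtained by averaging voxels within each region of a labeled atlas. The function name and the exact layout are assumptions; the repo's dataloader is the authority on the real format:

```python
import numpy as np

def parcellate(volume_4d, atlas):
    """Average voxel timeseries within each atlas label.

    volume_4d: (X, Y, Z, T) normalized fMRI data.
    atlas:     (X, Y, Z) integer labels, 0 = background, 1..n_parcels.
    Returns:   (n_parcels, T) parcel-mean timeseries.
    """
    n_parcels = int(atlas.max())
    T = volume_4d.shape[-1]
    out = np.zeros((n_parcels, T))
    for label in range(1, n_parcels + 1):
        mask = atlas == label                     # (X, Y, Z) boolean mask
        out[label - 1] = volume_4d[mask].mean(axis=0)  # mean over voxels in the region
    return out

# Toy example: 8x8x8 volume, 10 timepoints, atlas with labels 0..3
rng = np.random.default_rng(0)
vol = rng.normal(size=(8, 8, 8, 10))
atlas = rng.integers(0, 4, size=(8, 8, 8))
ts = parcellate(vol, atlas)   # shape (3, 10)
```

In practice a tool such as nilearn's `NiftiLabelsMasker` does this extraction from NIfTI files; the sketch just shows the tensor shapes going in and out.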
I was wondering whether you pre-trained the model on very homogeneous, uniformly pre-processed fMRI data from the UK Biobank. For instance, I have an already pre-processed fMRI dataset with normalized time-series; the pre-processing applied there probably differs from the one applied to the UK Biobank data.