treder / MVPA-Light

Matlab toolbox for classification and regression of multi-dimensional data
MIT License

How to Handle Unequal Number of Trials Across Different Classes in MVPA Light? #48

Status: Open. darianyao opened this issue 4 months ago.

darianyao commented 4 months ago

Hi, Mr. Treder. I am currently working with MVPA-Light and have run into a problem: the number of trials differs across classes. Could you please advise on best practices or methods within MVPA-Light for handling this imbalance? Any guidance or examples would be greatly appreciated. Thank you very much!

darianyao commented 4 months ago

Additionally, I noticed that the toolbox includes examples for analyzing M/EEG data, and they have been tremendously helpful to me. Thank you! Could you also provide a similar example of classifying fMRI data with MVPA-Light? This would be extremely helpful for beginners. Thank you again!

treder commented 3 months ago

Hi @darianyao! You can use the preprocessing pipeline to either oversample the minority classes or undersample the majority classes; see the preprocessing examples.
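To make the two options concrete, here is a minimal Python sketch of random undersampling and oversampling on a trials x features array. It is not MVPA-Light code (the reply refers to the toolbox's own preprocessing functions); the function names `undersample` and `oversample` and the synthetic 20-vs-10 data are hypothetical. As a general precaution, oversampling is normally applied to the training folds only, so that a duplicated trial cannot end up in both the training and the test set.

```python
import numpy as np

def undersample(X, y, rng):
    """Hypothetical helper: randomly drop trials from the larger classes until
    every class has as many trials as the smallest class."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    keep.sort()                      # preserve the original trial order
    return X[keep], y[keep]

def oversample(X, y, rng):
    """Hypothetical helper: randomly duplicate trials (drawn with replacement)
    from the smaller classes until every class matches the largest class."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = []
    for c in classes:
        members = np.flatnonzero(y == c)
        extra = rng.choice(members, size=n_max - members.size, replace=True)
        idx.append(np.concatenate([members, extra]))
    idx = np.sort(np.concatenate(idx))
    return X[idx], y[idx]

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))          # synthetic: 30 trials x 5 features
y = np.array([1] * 20 + [2] * 10)     # unbalanced: 20 vs 10 trials

Xu, yu = undersample(X, y, rng)
Xo, yo = oversample(X, y, rng)
print(np.unique(yu, return_counts=True)[1])  # [10 10]
print(np.unique(yo, return_counts=True)[1])  # [20 20]
```

Undersampling keeps every trial unique but discards data from the majority class; oversampling keeps all data but repeats minority-class trials.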

Regarding fMRI data, it would indeed be nice to have concrete examples among the toolbox examples here. For now, you can refer to the example I described in the MVPA-Light paper, namely the analysis of the Haxby dataset. The code is here. I hope this gives you a useful starting point.

Let me know whether this answers your queries.

darianyao commented 3 months ago

Thank you for your reply. I have resolved the data imbalance issue by studying the examples in the toolbox. However, I seem to have encountered a bug: when I set cfg.cv = 'leaveout' and cfg.metric = 'auc', the results returned by the mv_classify function are always 0.

darianyao commented 3 months ago

I have one more issue. When I use PCA for preprocessing together with neighbourhoods (a searchlight along the time dimension), the mv_classify function throws an error. I suspect this is because the searchlight (neighbourhood) matrix is defined on the original data, but after PCA some redundant features of the original data are removed, so the matrix no longer matches the features. Perhaps it would be simpler for users to specify parameters directly instead of defining a matrix, for example cfg.neighborhoods = 3 for a window of three points, together with cfg.neighborhood_dim = 'channel' or cfg.neighborhood_dim = 'time' to select the dimension. Thank you :)
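A window size such as the proposed `cfg.neighborhoods = 3` (a hypothetical option, not an existing MVPA-Light parameter) could in principle be expanded into a neighbourhood matrix automatically. The Python sketch below assumes the matrix is square and binary, with row i marking the points that fall in the window centred on point i (truncated at the edges); `window_neighbourhood` is a hypothetical helper. Because the matrix size is tied to the number of features, it has to be rebuilt whenever preprocessing such as PCA changes that number.

```python
import numpy as np

def window_neighbourhood(n_points, window):
    """Hypothetical helper: boolean n_points x n_points matrix whose row i marks
    the points inside a centred window of `window` points around point i."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    idx = np.arange(n_points)
    # point j is a neighbour of point i if they are at most `half` steps apart
    return np.abs(idx[:, None] - idx[None, :]) <= half

nb = window_neighbourhood(5, 3)
print(nb.astype(int))
# [[1 1 0 0 0]
#  [1 1 1 0 0]
#  [0 1 1 1 0]
#  [0 0 1 1 1]
#  [0 0 0 1 1]]
```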

treder commented 3 months ago

AUC is the area under the ROC curve. With only one sample in the test set, the ROC curve degenerates to a line, so the area is 0. The way AUC is calculated in MVPA-Light, the test set must contain both negative and positive examples (class 1 and class 2). You could use leaveout, collect the classifier outputs (e.g., dvals) and then run mv_calculate_performance manually on the collected data. I will also look into whether there is a reasonable workaround for this situation, just for convenience.
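The single-sample problem is easy to see from the rank-based (Mann-Whitney) form of the AUC: it compares every positive-class decision value with every negative-class one, so it needs at least one trial of each class. The Python sketch below is an illustration, not MVPA-Light's implementation; it assumes positive decision values favour class 1 and deliberately returns NaN (rather than 0) when only one class is present, to make the undefined case explicit. The decision values in the second call are made-up numbers.

```python
import numpy as np

def auc(dvals, y, positive=1):
    """AUC via the Mann-Whitney statistic: the fraction of (positive, negative)
    pairs in which the positive trial has the higher decision value
    (ties count 1/2). Returns NaN if only one class is present."""
    dvals = np.asarray(dvals, dtype=float)
    y = np.asarray(y)
    pos = dvals[y == positive]
    neg = dvals[y != positive]
    if pos.size == 0 or neg.size == 0:
        return float("nan")          # no ROC curve without both classes
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# A leave-one-out test set holds a single trial, so one class is always missing:
print(auc([0.7], [1]))                                    # nan

# Pooled decision values from all folds give a well-defined AUC:
print(auc([0.9, 0.4, 0.6, -0.2, 0.1], [1, 1, 2, 2, 2]))   # 0.8333333333333334
```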

Re: neighbours, I am not sure I understand the problem exactly. If you run a PCA (say, on your voxels), you lose the notion of a neighbourhood structure: PC1 is not really a neighbour of PC2, since each PC is a linear combination of voxels. Is your goal to use, say, PC1-PC3, PC2-PC4, PC3-PC5 and so on in a sliding window? You could fix the number of PCs, or calculate the PCA beforehand (not inside the classification loop), and then define the neighbourhood matrix on the PCs. Would this work?
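The suggestion of computing the PCA once beforehand, with a fixed number of components, and then defining the neighbourhood over the PCs might look like the following Python sketch. The data are synthetic, and the square binary matrix (row i marks PC i and its immediate neighbours) is an assumed layout; the exact neighbourhood format that `mv_classify` expects should be checked in the MVPA-Light documentation.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 200))    # synthetic: 40 trials x 200 voxels
n_pcs = 10                        # fix the number of components in advance

# PCA once on the whole dataset, outside the cross-validation loop
# (PCA itself does not use the class labels).
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:n_pcs].T        # 40 trials x 10 PC scores

# Neighbourhood defined on PCs, not voxels: row i marks PC i and its direct
# neighbours, i.e. PC1-PC3 for PC2, PC2-PC4 for PC3, and so on.
idx = np.arange(n_pcs)
neighbours = np.abs(idx[:, None] - idx[None, :]) <= 1

print(scores.shape)               # (40, 10)
print(neighbours[1].astype(int))  # [1 1 1 0 0 0 0 0 0 0]
```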

darianyao commented 3 months ago

Thank you very much for your reply.

Re: AUC

Thank you for helping me understand AUC better. I'm not from a computer science background, so I asked ChatGPT how to calculate AUC with leave-one-out cross-validation (LOO-CV), given that it cannot be calculated within a single iteration. Here is ChatGPT's response: 'To correctly calculate the AUC using leave-one-out cross-validation (LOO-CV), we need to aggregate the predictions from all iterations and then calculate the ROC curve and AUC as a whole.' I'm not sure if this will be helpful to you.

Re: neighbourhoods

Regarding the neighbourhoods issue, my workaround for now is simply not to run PCA in the preprocessing stage :) I can't think of another approach. I will try the fixed-number-of-PCs approach you mentioned, but I'm not sure whether it will resolve the errors that occur when using searchlight and PCA together.

treder commented 3 months ago

Glad I could help, good luck!

Re: AUC. Yes, this is exactly what I suggested above. For now you have to do this "by hand". MVPA-Light does not do this automatically because metrics are calculated on each test set and then averaged across folds.
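The "by hand" procedure (leave-one-out, collect the decision values, compute one AUC over all of them) is sketched below in Python. In MVPA-Light itself this corresponds to running cfg.cv = 'leaveout' and then calling mv_calculate_performance on the collected outputs, as described earlier in the thread; here a simple nearest-class-mean classifier stands in for the toolbox's classifiers, and the data are synthetic. No per-fold AUC is computed: each fold contributes a single decision value, and the metric is evaluated once on the pooled values.

```python
import numpy as np

def auc(dvals, y, positive=1):
    """Mann-Whitney estimate of the area under the ROC curve (ties count 1/2)."""
    pos = dvals[y == positive]
    neg = dvals[y != positive]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

rng = np.random.default_rng(2)
n1, n2, n_feat = 25, 15, 8
X = np.vstack([rng.normal(0.8, 1.0, (n1, n_feat)),   # class 1 (synthetic)
               rng.normal(0.0, 1.0, (n2, n_feat))])  # class 2 (synthetic)
y = np.array([1] * n1 + [2] * n2)

dvals = np.empty(len(y))
for i in range(len(y)):                    # leave-one-out: trial i is the test set
    train = np.arange(len(y)) != i
    m1 = X[train & (y == 1)].mean(axis=0)  # class means from training trials only
    m2 = X[train & (y == 2)].mean(axis=0)
    w = m1 - m2                            # nearest-class-mean as a linear rule
    b = -w @ (m1 + m2) / 2                 # boundary halfway between the means
    dvals[i] = X[i] @ w + b                # positive values favour class 1

print(auc(dvals, y))                       # one AUC over all pooled LOO outputs
```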