treder / MVPA-Light

Matlab toolbox for classification and regression of multi-dimensional data
MIT License
70 stars 35 forks source link

ENH - broadcast the testdesign to the outside world #38

Closed schoffelen closed 1 year ago

schoffelen commented 1 year ago

second try: I was wondering whether there's a reason not to broadcast the (CV-)test design to the result output structure. If not, this PR adds it.

as an added note to @treder: is there a specific reason why the high level mv_classify/mv_regress function do not output the classification/regression weights (optionally)? I can imagine that this might easily blow up in terms of memory consumption, but I have now worked with a few researchers who are actually interested in evaluating those (e.g. looking at topographies), or use them in a more advanced cross-decoding context.

treder commented 1 year ago

hi @schoffelen no it's a good idea and something that's been on the todo list. I was thinking of adding something like a cfg.history parameter for a while that would return everything (main CV folds and classifier/regression parameters). Things are going bit slow for me atm, I've started working on the imputation but might address this PR first (since easier). Should be getting back next week.

treder commented 1 year ago

EDIT: what about the following approach, there's a boolean field cfg.misc, if cfg.misc=1 then there's a result.misc with

Lmk what you think.

schoffelen commented 1 year ago

I copy in this, because I cannot find a trace of the comment in the github web interface (only in my e-mail): I thought about it bit more, the easiest way would be the following. Say we have a field cfg.save (a better name than cfg.history?) which is a boolean. If 1, there's a field result.save which is an array of structs or a cell array:

result.save(1).x = data in fold 1 result.save(1).trainlabel = train labels in fold 1 result.save(1).testlabel = test labels in fold 1 result.save(1).model = model in fold 1 ... result.save(10).x = data in fold 10 result.save(10).trainlabel = train labels in fold 10 result.save(10).testlabel = test labels in fold 10 result.save(1).model = model in fold 10 The only thing I'm worried about is saving the data for every fold this will massively blow up space. In principle we could leave out the data field and it could be manually re-created by the user using the labels if necessary. However, any mv_preprocess function that involves some randomness would break this.

Let me know what you think.

schoffelen commented 1 year ago

Thanks for thinking along, and picking up on this @treder!

I agree with your concern of the risk that memory consumption can explode quite easily. To some extent it can be left to be the user's responsibility. On the other hand, I am wondering whether a solution needs to be super generic, and include all of the information that can be harvested. Specifically, the use cases I had in mind will not need access to the (preprocessed-per-fold-and-repeat) test data (or training labels). Access to the testlabel/testdesign will be useful in case cfg.metric was 'none'. My first thoughts are about use cases were:

1) the model is to be evaluated, e.g. by visualizing the (transformed version of the) weights as a channel topography. 2) a more creative cross-decoding is required, e.g. applying the model estimated in one dataset at a fixed latency to another dataset at multiple latencies.

treder commented 1 year ago

Thanks it's good to think about the usecases more specifically. I wonder whether we should just leave it to the user to specify the extra information they want saved in a cell array e.g. cfg.save = {'xtrain' 'trainlabel' 'testlabel' 'model'} which would then lead to the fields result.xtrain, result.trainlabel, ... to exist. We can also return the testlabel by default in case cfg.metric = 'none'.

schoffelen commented 1 year ago

I think that could work!

treder commented 1 year ago

Should be ready:

Lmk whether this addresses the scenarios you mentioned sufficiently well. If so, I would merge

schoffelen commented 1 year ago

Great! I did not read the code changes in detail, but what you describe in the docstring suggests that this is very useful (and indeed would be enough to cover the use cases I can think of)

treder commented 1 year ago

Cool! I'll merge then. I added some unit tests so should be fine. Thank you for raising this and the contribution.