psychelzh / cpmr

Connectome predictive modelling in R
https://psychelzh.github.io/cpmr/

Rethink the measure of generalization error #16

Open psychelzh opened 2 hours ago

psychelzh commented 2 hours ago

Currently, the generalization error reported by summary() is the correlation between the pooled cross-validated predictions and the observed values. However, sklearn warns against doing so:

Note on inappropriate usage of cross_val_predict

The result of cross_val_predict may be different from those obtained using cross_val_score as the elements are grouped in different ways. The function cross_val_score takes an average over cross-validation folds, whereas cross_val_predict simply returns the labels (or probabilities) from several distinct models undistinguished. Thus, cross_val_predict is not an appropriate measure of generalization error.

A sounder approach is to calculate the generalization error separately for each fold and then average across folds. Pearson correlations, however, might need special treatment before they are averaged.

psychelzh commented 2 hours ago

Of course, the current method simply follows the original paper. So we will leave this issue open for now, since it may not be appropriate to implement a different measure yet.