naamiinepal / xrayto3D-benchmark


Monte Carlo Cross Validation #27

Open msrepo opened 1 year ago

msrepo commented 1 year ago

Running k-fold cross validation requires too many training runs; it gives an unbiased estimate but with high variance. What is our next best option, given that we want to run at most 3 training runs per dataset per architecture?

Monte Carlo Cross Validation is an option. Cons: it gives a biased estimate, but with lower variance. Correcting for the bias in Monte Carlo Cross Validation:

With n1 training samples and n2 test samples, taking J such random splits constitutes Monte Carlo Cross Validation; using many such splits (larger J) is better.
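As a concrete illustration, here is a minimal sketch of drawing such splits with scikit-learn's `ShuffleSplit`. The feature/target arrays are random stand-ins, not the actual benchmark data, and the per-split training call is left as a comment.

```python
from sklearn.model_selection import ShuffleSplit
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))   # stand-in features (100 samples)
y = rng.normal(size=100)        # stand-in targets

J = 3                           # number of Monte Carlo splits we can afford
mc_cv = ShuffleSplit(n_splits=J, train_size=80, test_size=20, random_state=0)

for j, (train_idx, test_idx) in enumerate(mc_cv.split(X)):
    # Each split independently samples 80 train / 20 test indices.
    # Train one model per split here and record its test metric.
    print(f"split {j}: {len(train_idx)} train, {len(test_idx)} test")
```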

In gist, using the first method: say we split 100 samples into 80 train and 20 test samples, and we do this split 3 times (Monte Carlo CV), i.e. J = 3. Then corrected variance = (1/J + n2/n1) × uncorrected variance = (1/3 + 20/80) × uncorrected variance.
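A minimal sketch of applying this correction to the J per-split scores, assuming the correction is the (1/J + n2/n1) factor applied to the sample variance of the scores as written above. The helper name and the score values are hypothetical.

```python
import numpy as np

def corrected_variance(scores, n_train, n_test):
    """Nadeau-Bengio-style corrected variance of Monte Carlo CV scores.

    corrected variance = (1/J + n_test/n_train) * sample variance,
    which inflates the naive 1/J factor to account for overlap
    between the J training sets.
    """
    scores = np.asarray(scores, dtype=float)
    J = len(scores)
    sample_var = scores.var(ddof=1)  # sample variance over the J splits
    return (1.0 / J + n_test / n_train) * sample_var

# Worked example from the comment: 100 samples, 80 train / 20 test, J = 3 splits.
scores = [0.71, 0.68, 0.74]          # hypothetical per-split test metrics
print(corrected_variance(scores, n_train=80, n_test=20))  # (1/3 + 1/4) * var(scores)
```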

The second method requires modification and does not seem feasible here.

Relevant references:

- Nadeau and Bengio (2003), Inference for the Generalization Error
- Raschka (2018), Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning