sccn / BCILAB

MATLAB Toolbox for Brain-Computer Interface Research

Weighting of variance for SMSE calculation (population vs. sample) #25

Open DMRoberts opened 7 years ago

% I recently identified an issue with the formulation of SMSE currently in
% BCILAB, especially for small samples.

% SMSE normalizes the MSE by the variance of the (real-valued) targets.
% This standardizes the degree of error across experiments which may use
% targets of different magnitudes. Additionally, SMSE should equal 1 if
% each 'prediction' is merely the mean of the target vector.

% By default, MATLAB's var estimates the population variance from the
% provided sample, normalizing the sum of squared deviations from the mean
% by (n - 1). This makes the variance estimate larger than the variance of
% the sample itself (normalized by n), which in turn lowers SMSE. The
% effect is strongest for small sample sizes. Below is an example of both
% the existing formulation and the corrected one (in which the sum of
% squared deviations from the mean is instead normalized by n). For a
% vector of 1000 chance predictions, the old and new formulations equal
% .999 and 1, respectively, as the two variance estimates converge as
% sample size increases. However, for a vector of only 2 chance
% predictions, old and new equal .5 and 1, respectively.
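The values .999 and .5 quoted above can be derived directly for the case where every prediction is the target mean. The notation here is introduced only for illustration: $T_i$ are the targets, $\bar{T}$ is their mean, and $n$ is the number of samples. The derivation assumes the targets are not all identical, so the variance is nonzero.

```latex
\begin{align*}
\mathrm{MSE} &= \frac{1}{n}\sum_{i=1}^{n}\left(T_i - \bar{T}\right)^2
  && \text{(every prediction equals } \bar{T}\text{)} \\
s^2_{n-1} &= \frac{1}{n-1}\sum_{i=1}^{n}\left(T_i - \bar{T}\right)^2
  && \text{(MATLAB default, \texttt{var(Tx)})} \\
s^2_{n} &= \frac{1}{n}\sum_{i=1}^{n}\left(T_i - \bar{T}\right)^2
  && \text{(\texttt{var(Tx, 1)})} \\[4pt]
\mathrm{SMSE}_{\text{old}} &= \frac{\mathrm{MSE}}{s^2_{n-1}} = \frac{n-1}{n},
\qquad
\mathrm{SMSE}_{\text{new}} = \frac{\mathrm{MSE}}{s^2_{n}} = 1
\end{align*}
```

With $n = 1000$ the old ratio is $999/1000 = .999$, and with $n = 2$ it is $1/2 = .5$, independent of the actual target values.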

% I will also submit a pull request shortly, referencing the issue.
% I apologize for not noticing this earlier, as I believe the current
% formulation of SMSE is something I had submitted.

% SMSE example:

% where:
% Tx = vector of real valued targets
% Px = vector of real valued predictions

% use 1000 samples, always 'predicting' the mean of the target vector
n = 1000;
Tx = rand(1, n);
Px = repmat(mean(Tx), [1 n]);

smse_1000_samples = mean((Px-Tx).^2) ./ var(Tx);

% smse = .999

% as previous, but with only two samples
n = 2;
Tx = rand(1, n);
Px = repmat(mean(Tx), [1 n]);

smse_2_samples = mean((Px-Tx).^2) ./ var(Tx);

% smse = .5! The variance formulation of course makes a huge difference
% with only 2 samples.

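% SMSE with target variance normalized by the number of samples, not number
% of samples - 1. The variance estimate here is equivalent to MATLAB's
% built-in var with a second argument of 1 ( var(Tx, 1) ).

% again use 1000 samples, since Tx and Px currently hold the 2-sample case
n = 1000;
Tx = rand(1, n);
Px = repmat(mean(Tx), [1 n]);

smse_1000_samples_new = mean((Px-Tx).^2) ./ mean((Tx - mean(Tx)).^2);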

% new smse = 1

% the same with only 2 samples
n = 2;
Tx = rand(1, n);
Px = repmat(mean(Tx), [1 n]);

smse_2_samples_new = mean((Px-Tx).^2) ./ mean((Tx - mean(Tx)).^2);

% new smse = 1
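
The corrected formulation can also be written with the built-in `var(Tx, 1)` mentioned above. The following is only a sketch of that idea: the handle name `smse_pop` is hypothetical and is not an existing BCILAB function, and the snippet assumes `Tx` and `Px` are row vectors of equal length with non-constant targets.

```matlab
% hypothetical helper (not part of BCILAB): SMSE with the target variance
% normalized by n, using var with a second argument of 1
smse_pop = @(Px, Tx) mean((Px - Tx).^2) ./ var(Tx, 1);

% two samples, always 'predicting' the mean of the target vector
n = 2;
Tx = rand(1, n);
Px = repmat(mean(Tx), [1 n]);

% mean-only predictions should give 1, up to floating-point rounding
smse_mean_only = smse_pop(Px, Tx);

% the two normalizations differ by a factor of (n - 1) / n, i.e. .5 for n = 2
ratio = var(Tx, 1) ./ var(Tx);
```

Using `var(Tx, 1)` rather than the explicit `mean((Tx - mean(Tx)).^2)` is purely a matter of brevity; both normalize by n.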