schoffelen opened this 5 months ago
Hi @schoffelen, first of all, sorry for getting back to this so late! All is well here, hope the same for you.
Thank you for the code, I did some benchmarking and just want to share the results. I used the following toy data:
%%% Create Gaussian data
nsamples = 1000;
nfeatures = 300;
nclasses = 5;
prop = [];
scale = 0.0001;
do_plot = 0;
[X_gauss,clabel_gauss] = simulate_gaussian_data(nsamples, nfeatures, nclasses, prop, scale, do_plot);
I ran train_kernel_fda in a loop and timed how long it takes for the function to return.
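The loop-and-time procedure described above can be sketched generically. The benchmark in the thread was run in MATLAB; the following is a hedged Python equivalent, with `slow_impl`/`fast_impl` standing in as hypothetical placeholders for the two training implementations being compared (they are not MVPA-Light code):

```python
import time

def benchmark(fn, n_iter=20):
    """Time fn() over n_iter calls and return the mean wall-clock seconds."""
    start = time.perf_counter()
    for _ in range(n_iter):
        fn()
    return (time.perf_counter() - start) / n_iter

# Hypothetical stand-ins for the two implementations under comparison.
def slow_impl():
    sum(i * i for i in range(10000))

def fast_impl():
    total = 0
    for i in range(1000):
        total += i * i

t_old = benchmark(slow_impl)
t_new = benchmark(fast_impl)
print(f"PR needs {100 * t_new / t_old:.0f}% of baseline time")
```

The percentages below were obtained this way: time of the PR implementation expressed as a fraction of the MVPA-Light baseline.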
nsamples = 3000;
nfeatures = 300;
nclasses = 30;
Here the PR needs 71% of the MVPA-Light time, so the PR is faster.
nsamples = 1000;
nfeatures = 300;
nclasses = 5;
The PR needs 110% of the MVPA-Light time, so it is slower.
nsamples = 2000;
nfeatures = 100;
nclasses = 2;
The PR needs 291% of the MVPA-Light time, so it is slower.
So it looks like a good speed-up for a large number of classes, but you pay a price when the number of classes is small. Can we get the best of both worlds somehow?
On the LedoitWolfEstimate I get the following:
N = 1000;
P = 100;
X = simulate_spiral_data(N, 3, 1);
Using tic-toc with 20 iterations:
MVPA-Light: 0.6983s PR: 0.1865s
N = 1000;
P = 1000;
MVPA-Light: 0.6993s PR: 0.1976s
N = 2000;
P = 4;
MVPA-Light: 4.9714s PR: 0.6215s
So the PR seems to always be faster; we should definitely go for this one.
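For context, the estimator being timed here shrinks the sample covariance toward a scaled identity. Below is a minimal NumPy sketch of the standard Ledoit-Wolf (2004) formula, not MVPA-Light's actual implementation, just an illustration of the computation being benchmarked:

```python
import numpy as np

def ledoit_wolf(X):
    """Ledoit-Wolf shrinkage covariance estimate (Ledoit & Wolf, 2004).

    Shrinks the sample covariance S toward mu*I, with the shrinkage
    intensity rho estimated from the data itself.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n                             # sample covariance (MLE)
    mu = np.trace(S) / p                          # scale of the identity target
    d2 = np.sum((S - mu * np.eye(p)) ** 2) / p    # dispersion of S around target
    # average squared distance of the rank-1 terms x_k x_k' from S
    b2 = sum(np.sum((np.outer(x, x) - S) ** 2) for x in Xc) / (n ** 2 * p)
    b2 = min(b2, d2)
    rho = b2 / d2                                 # shrinkage intensity in [0, 1]
    return rho * mu * np.eye(p) + (1 - rho) * S, rho
```

The Python loop over samples in `b2` is the slow part; vectorizing it (as the PR presumably does in MATLAB) is where the speed-up comes from.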
Hi @treder, now it's my turn to apologize for my slowness. Thanks for looking into this in much more detail than I had... With respect to the profiling outcome of the FDA training, I can imagine that it is not really straightforward to determine a good heuristic for picking the winning implementation, given that it depends on nsamples/nfeatures/nclasses, and perhaps also on the compute infrastructure (operating system, multithreading, etc.). Would it be overkill to build in an option to switch between the two implementations (and then perhaps determine the default in an nclasses-dependent way)? I would also be fine with just ditching the changes to the FDA training function: how realistic is it to have such a large number of classes?
Yes, this should work. It could be a flag called loop (if you have a better name, let me know). If 1 (the default), it calculates the kernel matrices in a loop; if 0, it uses your approach. I cannot commit to your branch because it is a bit out of date (other files changed). Could you update your PR?
Alternatively I can discard the PR and push the changes myself, whichever you prefer.
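As a rough illustration of what such a loop flag could look like (a hypothetical Python sketch, not MVPA-Light's actual code or API): both branches compute the same per-class quantity from a kernel matrix, one class at a time versus vectorized with a one-hot indicator matrix.

```python
import numpy as np

def class_kernel_means(K, labels, n_classes, loop=True):
    """Per-class column means of a kernel matrix K (n x n).

    Hypothetical illustration of a `loop` flag: loop=True iterates over
    the classes, loop=False vectorizes the same computation with a
    one-hot indicator matrix. Both branches return identical results.
    """
    if loop:
        M = np.empty((K.shape[0], n_classes))
        for c in range(n_classes):
            M[:, c] = K[:, labels == c].mean(axis=1)
        return M
    # vectorized: one-hot indicator, normalized by the class counts
    onehot = np.eye(n_classes)[labels]           # (n, n_classes)
    return (K @ onehot) / onehot.sum(axis=0)

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))
K = X @ X.T
labels = np.array([0, 0, 1, 1, 1, 2, 2, 2])
```

The loop avoids materializing the (n, n_classes) indicator matrix, which can win for few classes, while the vectorized branch amortizes overhead when the class count is large; this matches the benchmark pattern reported above.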
Hi @treder, I am on the road at the moment (in a different time zone) and tried to do some git voodoo (basically rebasing the PR against master, but not much happened). I wouldn't mind doing some work on this, but if you're OK with discarding the PR and pushing the changes yourself, that would be great (but only if it takes you about 5 minutes).
OK, will do. I've uploaded the LedoitWolfEstimate update to the main branch, but then my MATLAB license expired, so the rest will take a couple of days :D
Hi @treder, I hope all is well. This PR pushes a few changes for speed improvement, which I implemented a while ago and don't want to lose track of. Feel free to ignore it if you don't think it fits.