treder / MVPA-Light

Matlab toolbox for classification and regression of multi-dimensional data
MIT License

ENH - some minor suggestions for speedup #47

Open schoffelen opened 5 months ago

schoffelen commented 5 months ago

Hi @treder, I hope all is well. This PR pushes a few changes for speed improvement that I implemented a while ago and don't want to lose track of. Feel free to ignore it if you don't think it fits.

treder commented 3 months ago

Hi @schoffelen, first of all, sorry for getting back to you on this so late! All is well here, hope the same for you.

Thank you for the code, I did some benchmarking and just want to share the results. I used the following toy data:

%%% Create Gaussian data
nsamples  = 1000;    % number of samples
nfeatures = 300;     % number of features
nclasses  = 5;       % number of classes
prop      = [];      % class proportions ([] = equal)
scale     = 0.0001;  % scale of the simulated data
do_plot   = 0;       % do not plot the data

[X_gauss,clabel_gauss] = simulate_gaussian_data(nsamples, nfeatures, nclasses, prop, scale, do_plot);

I ran train_kernel_fda in a loop and timed how long the function takes to return (a sketch of the timing loop is below).
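For reference, a minimal sketch of such a timing loop, assuming default hyperparameters obtained via mv_get_hyperparameter (the exact harness and repetition count may have differed):

% Timing sketch (assumption: default hyperparameters; actual harness may differ)
param = mv_get_hyperparameter('kernel_fda');
n_iter = 20;   % number of repetitions (placeholder value)
tic
for it = 1:n_iter
    cf = train_kernel_fda(param, X_gauss, clabel_gauss);
end
fprintf('mean training time: %.4f s\n', toc / n_iter);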

Results (PR time as a percentage of MVPA-Light time; below 100% means the PR is faster):

nsamples  nfeatures  nclasses   PR time vs MVPA-Light
3000      300        30          71%  (PR faster)
1000      300        5          110%  (PR slower)
2000      100        2          291%  (PR slower)

So it looks like a good speed-up for a large number of classes, but you pay a price when the number of classes is small. Can we get the best of both worlds somehow?
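To illustrate the trade-off, here is a toy sketch of the two strategies (hypothetical; the actual implementations in the toolbox and the PR differ): computing class-wise kernel column means in a loop versus vectorized via an indicator matrix. The loop has little overhead for few classes, while the single matrix product scales better when there are many.

% Toy illustration (hypothetical sketch; not the actual toolbox code)
K = X_gauss * X_gauss';                      % linear kernel as a stand-in
n = numel(clabel_gauss);

% Looped version: slice the kernel matrix once per class
Mj_loop = zeros(n, nclasses);
for c = 1:nclasses
    Mj_loop(:,c) = mean(K(:, clabel_gauss==c), 2);
end

% Vectorized version: one matrix product with a class indicator matrix
G = double(clabel_gauss(:) == 1:nclasses);   % n x nclasses indicator (R2016b+)
Mj_vec = (K * G) ./ sum(G, 1);               % same result, no loop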

treder commented 3 months ago

On the LedoitWolfEstimate I get the following:

N = 1000;
P = 100;
X = simulate_spiral_data(N, 3, 1);

Using tic-toc with 20 iterations (sketch below):
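A minimal sketch of the timing; the second argument to LedoitWolfEstimate is an assumption here, so check the function's actual signature:

% Timing sketch (assumption: a 'primal'-style second argument)
n_iter = 20;
tic
for it = 1:n_iter
    lambda = LedoitWolfEstimate(X, 'primal');
end
fprintf('total time over %d iterations: %.4f s\n', n_iter, toc);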

N     P      MVPA-Light   PR
1000  100    0.6983 s     0.1865 s
1000  1000   0.6993 s     0.1976 s
2000  4      4.9714 s     0.6215 s

So the PR seems to always be faster; we should definitely go for this one.

schoffelen commented 3 months ago

Hi @treder, now it's my turn to apologize for my slowness. Thanks for looking into this in much more detail than I had... With respect to the profiling outcome of the FDA training, I can imagine that it is not really straightforward to determine a good heuristic for picking the winning implementation, given that it depends on nsamples/nfeatures/nclasses, and perhaps also on the compute infrastructure (operating system, multithreading, etc.). Would it be overkill to build in an option to switch between the two implementations (and then perhaps determine the default in an nclasses-dependent way)? I would also be fine with just ditching the changes to the FDA training function: how realistic is it to have such a large number of classes?
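An nclasses-dependent default could look roughly like this (hypothetical sketch; the flag name and the threshold are guesses based on the benchmarks above, not part of the toolbox):

% Hypothetical default (sketch): loop for few classes, vectorized otherwise.
% The threshold of 10 is a guess and would need tuning per platform.
if ~isfield(param, 'use_loop') || isempty(param.use_loop)
    param.use_loop = nclasses < 10;
end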

treder commented 3 months ago

Yes, this should work. It could be a flag called loop (if you have a better name, let me know). If 1 (the default), it calculates the kernel matrices in a loop; if 0, it uses your approach. I cannot commit to your branch because it is a bit out of date (other files have changed). Could you update your PR?
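Usage of the proposed flag might then look like this (hypothetical sketch; the flag does not exist in the toolbox yet):

% Hypothetical usage of the proposed flag (not yet in the toolbox)
param = mv_get_hyperparameter('kernel_fda');
param.loop = 1;    % 1 (default): kernel matrices computed in a loop
                   % 0: vectorized approach from the PR
cf = train_kernel_fda(param, X_gauss, clabel_gauss);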

Alternatively I can discard the PR and push the changes myself, whichever you prefer.

schoffelen commented 2 months ago

Hi @treder, I am on the road at the moment (in a different time zone) and tried some git voodoo (basically rebasing the PR against master), but not much happened. I wouldn't mind doing some work on this, but if you're OK with discarding the PR and pushing the changes yourself, that would be great (but only if it would take you 5 minutes to do).

treder commented 2 months ago

OK, will do. I've uploaded the LedoitWolfEstimate update to the main branch; then my Matlab license expired, so the rest will take a couple of days :D