Speed improvements on sgcca

rgcca-factory / RGCCA

https://rgcca-factory.github.io/RGCCA/

10 stars 11 forks source link

Speed improvements on sgcca #32

Closed llrs closed 2 years ago

llrs commented 2 years ago

This PR has some minor speed tweaks:

Reduce the number of iterations
Pre-allocate whenever possible
Check the biggest/unvariant conditional before iterating
Use (t)crossprod whenever possible
Pass arguments on functions used on *apply family of functions.

There are some other minor tweaks like adding a new shave method to use instead of shave.malist, shave.veclist and using message instead of cat(paste0(

These changes result on an almost 1x speed improvement and a reduction on memory allocated on my tests with 50 iterations:

sgcca_comparison_branches

Hope it helps!

llrs commented 2 years ago

I have added a new commit with the requested changes. I have also pre-allocated the initial lists that I missed previously.

The other big change that could speed everything is to replace missing data by 0 on rgcca before calling any other subsequent function. I'm not sure if this is deferred to when the matrix multiplications happen for some reason as I haven't tested it, but I think this would help simplifying the code and make it a bit faster.

GFabien commented 2 years ago

The other big change that could speed everything is to replace missing data by 0 on rgcca before calling any other subsequent function. I'm not sure if this is deferred to when the matrix multiplications happen for some reason as I haven't tested it, but I think this would help simplifying the code and make it a bit faster.

The current strategy is to allow different ways to handle missing values: complete and nipals for now. The first one removes all lines containing missing values, the second virtually changes missing values to 0 by using this pm function to make the matrix multiplications. There used to be other methods to deal with missing values, they probably had an interest in this pm mechanism too but I am not sure. Given the current state I think you are right, it would make more sense and simplify greatly if we just replaced missing values with 0 right from the start. However, I think it will wait the next release as it might break the code if not done carefully...

llrs commented 2 years ago

Done! When documenting the package roxygen2 updated the RoxygenNote version and I added the Encoding: UTF-8 to avoid a warning from roxygen2.

llrs commented 2 years ago

Somehow I missed that criteria change, sorry. Fixing that mistake the speed gain is practically insignificant.

speed_cran_v2_100 I made sure this time that between the both versions they return the same result