I wonder if it would be possible to improve on that front:
Currently, I observe 20 GB of RAM being used while training on a rather smallish dataset
(160 datapoints w/ 4428 features each; and this is a sparse representation).
Maybe the problem comes from this:
you only work w/ dense matrices, while some molecular encodings (fingerprint-based) are sparse.
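For a rough sense of the difference, here is a small sketch (assuming numpy/scipy are available; the ~2% bit density is just an illustrative guess) comparing dense vs. sparse storage for a fingerprint matrix of that shape:

```python
import numpy as np
from scipy import sparse

n_samples, n_features = 160, 4428
rng = np.random.default_rng(0)

# Hypothetical binary fingerprints with ~2% of bits set, stored as float64.
dense = (rng.random((n_samples, n_features)) < 0.02).astype(np.float64)
csr = sparse.csr_matrix(dense)

print(f"dense:  {dense.nbytes / 1e6:.2f} MB")
sparse_bytes = csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes
print(f"sparse: {sparse_bytes / 1e6:.2f} MB")
```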
The algorithm needs to optimize a matrix of size Nfeatures x Nfeatures. The memory consumption comes from there, not so much from the number of training points.
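As a rough back-of-the-envelope illustration (not a measurement of this code, and it ignores however many working copies the optimizer holds at once), the footprint of a single such matrix grows quadratically with the feature count and is independent of the number of training points:

```python
# Size of one N_features x N_features float64 matrix for a few feature counts.
bytes_per_float = 8  # float64

for n_features in (1107, 2214, 4428, 8856):
    gb = n_features**2 * bytes_per_float / 1e9
    print(f"{n_features:5d} features -> {gb:6.2f} GB per matrix copy")
# Doubling the feature count quadruples the memory; the 160 training points
# barely matter. Gradients, momentum terms, and intermediate products of the
# same shape multiply this further during optimization.
```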