I wonder if it would be possible to improve on that front:
Currently, I observe 20 GB of RAM being used while training on a rather smallish dataset
(160 datapoints w/ 4428 features each; and this is a sparse representation).
Maybe the problem comes from this:
you only work w/ dense matrices, while some molecular encodings (fingerprint-based) are sparse.
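For a rough sense of the difference, here is a small sketch (assuming numpy/scipy are available; the ~2% bit density is just an illustrative guess) comparing dense vs. sparse storage for a fingerprint matrix of that shape:

```python
import numpy as np
from scipy import sparse

n_samples, n_features = 160, 4428
rng = np.random.default_rng(0)

# Hypothetical binary fingerprints with ~2% of bits set, stored as float64.
dense = (rng.random((n_samples, n_features)) < 0.02).astype(np.float64)
csr = sparse.csr_matrix(dense)

print(f"dense:  {dense.nbytes / 1e6:.2f} MB")
sparse_bytes = csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes
print(f"sparse: {sparse_bytes / 1e6:.2f} MB")
```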
The algorithm needs to optimize a matrix of size Nfeatures x Nfeatures. The memory consumption comes from there, not so much from the number of training points.
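As a rough back-of-the-envelope illustration (not a measurement of this code, and it ignores however many working copies the optimizer holds at once), the footprint of a single such matrix grows quadratically with the feature count and is independent of the number of training points:

```python
# Size of one N_features x N_features float64 matrix for a few feature counts.
bytes_per_float = 8  # float64

for n_features in (1107, 2214, 4428, 8856):
    gb = n_features**2 * bytes_per_float / 1e9
    print(f"{n_features:5d} features -> {gb:6.2f} GB per matrix copy")
# Doubling the feature count quadruples the memory; the 160 training points
# barely matter. Gradients, momentum terms, and intermediate products of the
# same shape multiply this further during optimization.
```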