The CLOOB paper mentions that it used cuML's logistic regression with the L-BFGS solver to take advantage of GPUs for efficiency. My implementation works fine on small datasets (e.g., CIFAR), but I hit a CUDA out-of-memory error when dealing with large-scale ImageNet.
I have been stuck on this for quite a while, and I cannot find useful guidance in the documentation or elsewhere online. Could you provide a few code examples showing how to fix this problem?
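For context, here is a minimal sketch of the kind of mini-batch pattern I am hoping to adapt, so the features never have to sit on the device all at once. This uses NumPy on CPU purely as a stand-in for cuML/CuPy arrays and plain SGD instead of L-BFGS; all function and parameter names here are illustrative, not from the paper or from cuML:

```python
import numpy as np

def train_logreg_minibatch(X, y, n_classes, lr=0.1, epochs=5,
                           batch_size=256, seed=0):
    """Multinomial logistic regression via mini-batch SGD.

    Processing one batch at a time keeps peak memory proportional to
    batch_size rather than to the full dataset; the same idea applies
    on GPU by copying each batch to the device just before use.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            xb, yb = X[idx], y[idx]
            logits = xb @ W + b
            logits -= logits.max(axis=1, keepdims=True)  # numerical stability
            p = np.exp(logits)
            p /= p.sum(axis=1, keepdims=True)            # softmax probabilities
            p[np.arange(len(idx)), yb] -= 1.0            # gradient of cross-entropy
            W -= lr * (xb.T @ p) / len(idx)
            b -= lr * p.mean(axis=0)
    return W, b
```

What I would like to know is whether something equivalent is possible with cuML's L-BFGS path (or whether I have to fall back to a batched solver like this one) when the feature matrix does not fit in GPU memory.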