I'm on openSUSE Leap 42 on WSL, using Python 3.9.7, PyTorch 1.11.0 (CUDA build), and laserembeddings 1.1.2.
I'm computing embeddings for a long list of English text sequences (each of length <= 300 characters) using Laser.embed_sentences(). My GPU is an RTX 3080.
The property Laser.bpeSentenceEmbedding.encoder.use_cuda is True, indicating that the GPU is detected and Laser is attempting to use it. GPU memory is reserved as expected.
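For reference, here is a minimal sketch of what I'm running (the sentence list below is a placeholder standing in for my real corpus of short English strings):

```python
from laserembeddings import Laser

laser = Laser()
print(laser.bpeSentenceEmbedding.encoder.use_cuda)  # prints True on my machine

# Placeholder data standing in for my real corpus (each string <= 300 chars)
sentences = ["an example sentence of up to 300 characters"] * 100_000
embeddings = laser.embed_sentences(sentences, lang="en")
```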
However, the GPU remains idle at 0-2% utilisation, while the Python process uses 100% of one CPU thread. Throughput is the same when I disable the GPU entirely, as is the CPU utilisation.
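This is roughly how I compared the two cases (sentence count is a placeholder; for the CPU-only run I set `CUDA_VISIBLE_DEVICES=""` before starting Python so the GPU is hidden):

```python
import time
from laserembeddings import Laser

laser = Laser()
sentences = ["an example sentence"] * 10_000  # placeholder batch

start = time.perf_counter()
laser.embed_sentences(sentences, lang="en")
elapsed = time.perf_counter() - start
print(f"{len(sentences) / elapsed:.1f} sentences/s")
```

The sentences/s figure comes out essentially the same whether or not the GPU is visible.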
This leads me to believe that although something is taking up GPU memory, the inference is actually being done on the CPU. This is puzzling because, from the source code, both the model and the data are unambiguously moved to the GPU when use_cuda is enabled.
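To double-check where the weights actually end up, a probe like the following could help (it makes no assumptions about the wrapper's internal attribute names, it just scans for torch modules and reports which device their parameters live on):

```python
import torch
from laserembeddings import Laser

laser = Laser()
wrapper = laser.bpeSentenceEmbedding.encoder
print("use_cuda:", wrapper.use_cuda)

# Scan the wrapper for any torch modules and report where their weights live,
# without assuming specific attribute names.
for name, value in vars(wrapper).items():
    if isinstance(value, torch.nn.Module):
        devices = {str(p.device) for p in value.parameters()}
        print(f"{name}: parameters on {devices}")
```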
Assuming it's not a quirk of my setup, I think this is quite a high-priority issue, since applying this library to large amounts of data requires GPU acceleration to be economical.