Open aniket7joshi opened 1 year ago
This library just wraps downloading of the model from the HF Hub, the tokenizer, and CTranslate2 internally.
This seems like a feature request for CTranslate2, where there is an open issue for distributed inference: https://github.com/OpenNMT/CTranslate2/issues/1052
Multiple GPUs are supported if you specify multiple GPU device indices, but each GPU is then required to hold its own full copy of the model.
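To illustrate, here is a minimal sketch of what that looks like at the CTranslate2 level, assuming the wrapper forwards the device kwargs unchanged to ctranslate2.Generator (the helper names here are hypothetical, not part of the library):

```python
# Assumption: passing device_index=[0, 1] to ctranslate2.Generator places a
# FULL replica of the model on each listed GPU (data parallelism for
# throughput). It does NOT shard one large model across them, so a model
# like Falcon-40B must still fit in a single GPU's memory.

def replica_kwargs(gpu_indices):
    """Build the device kwargs that replicate the model on each listed GPU."""
    return {"device": "cuda", "device_index": list(gpu_indices)}

def load_replicated(model_dir, gpu_indices):
    """Load one full model copy per GPU index (requires CUDA + a converted model)."""
    import ctranslate2  # imported lazily; only needed when actually loading
    return ctranslate2.Generator(model_dir, **replica_kwargs(gpu_indices))
```

So specifying device_index=[0, 1, 2, 3] spreads concurrent requests across the GPUs, but does not reduce the per-GPU memory requirement.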
I am encountering a "RuntimeError: CUDA failed with error out of memory" while attempting to load the Falcon-40B-instruct model on GPU using the GeneratorCT2fromHfHub module. Upon inspecting GPU usage with nvidia-smi, I noticed that a single GPU is using all of its memory while the other GPUs remain idle.
I have reviewed the code but couldn't find any indication of multi-GPU support. Could you please confirm whether multi-GPU support has been implemented and I missed it, or whether it is planned for future sprints?
I have attached a screenshot of the GPU memory usage for your reference.