huggingface / speech-to-speech

Speech To Speech: an effort for an open-sourced and modular GPT4-o
Apache License 2.0

torch.OutOfMemoryError: CUDA out of memory. #8

Open chandan0000 opened 1 month ago

chandan0000 commented 1 month ago

[Screenshot: torch.OutOfMemoryError traceback]

my system config:

[Screenshot: system specs]

nirabo commented 3 weeks ago

Same here. I have a running faster-whisper-server on another machine and am looking at how to integrate with it instead of the local Whisper used in this example. This would off-load my GPU (3060 Ti, ~7.5 GB) and give me a chance to test the setup.

@AlexHayton @andimarafioti @eustlb @Vaibhavs10: any hints on how I can point the pipeline to my running faster-whisper-server STT endpoint?
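A hedged sketch of what such an integration might look like: faster-whisper-server advertises an OpenAI-compatible HTTP API, so the local Whisper stage could in principle be swapped for a request to the remote box. The host, port, and endpoint wiring below are assumptions for illustration, not something the speech-to-speech pipeline supports out of the box.

```python
import urllib.request


def stt_endpoint(base_url: str) -> str:
    """Build the OpenAI-compatible transcription endpoint URL."""
    return base_url.rstrip("/") + "/v1/audio/transcriptions"


def transcribe_remote(audio_path: str, base_url: str = "http://192.168.1.50:8000") -> bytes:
    """Upload an audio file to a remote faster-whisper-server (sketch only).

    Uses a minimal hand-rolled multipart body via urllib; a real client
    (requests or the openai SDK) would be more robust.
    """
    import uuid

    boundary = uuid.uuid4().hex
    with open(audio_path, "rb") as f:
        audio = f.read()
    body = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="audio.wav"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode() + audio + f"\r\n--{boundary}--\r\n".encode()
    req = urllib.request.Request(
        stt_endpoint(base_url),
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Replacing the pipeline's STT module with a call like this would keep the GPU free for the LLM and TTS stages.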

andimarafioti commented 3 weeks ago

Hi! 👋 Sorry it took me a few days to get here. I would start by trying this out with a smaller LLM, since that is the part that is failing. Why don't you try passing --lm_model_name HuggingFaceTB/SmolLM-360M-Instruct to the python s2s_pipeline.py call?
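Some back-of-envelope arithmetic shows why the LLM stage dominates the memory budget and why a smaller model can help (weights-only lower bound; activations and the KV cache add more on top):

```python
def approx_weight_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate GPU memory for model weights in fp16/bf16 (GB).

    Weights-only lower bound; activations and KV cache add more on top.
    """
    return n_params * bytes_per_param / 1024**3


# SmolLM-360M: ~0.67 GB of fp16 weights, leaving room for STT and TTS.
print(round(approx_weight_gb(360e6), 2))  # 0.67

# An 8B-parameter LLM needs ~14.9 GB for weights alone.
print(round(approx_weight_gb(8e9), 1))  # 14.9
```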

chandan0000 commented 3 weeks ago

@andimarafioti I think it's a VRAM issue. I ran with --lm_model_name HuggingFaceTB/SmolLM-360M-Instruct and hit the same error.

andimarafioti commented 2 weeks ago

Oh yeah, that card has only 6 GB of VRAM, you're brave 😅 If you manage to get all of these models to run with that little VRAM, that would be great for the library. Please report how you did it!
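One avenue for squeezing into 6 GB is quantization. A rough sketch below: the arithmetic function is straightforward, while the `from_pretrained` snippet shows the standard transformers/bitsandbytes 4-bit loading path; whether s2s_pipeline.py exposes such an option is an assumption, not something confirmed in this thread.

```python
def quantized_weight_gb(n_params: float, bits: int = 4) -> float:
    """Approximate weight memory at a given precision (GB)."""
    return n_params * bits / 8 / 1024**3


# 4-bit SmolLM-360M weights: ~0.17 GB; even an 8B model drops to ~3.7 GB,
# which could fit alongside the STT and TTS models on a 6 GB card.
print(round(quantized_weight_gb(360e6), 2))  # 0.17
print(round(quantized_weight_gb(8e9), 1))    # 3.7

if __name__ == "__main__":
    # Hedged sketch, not executed here: transformers supports 4-bit loading
    # via bitsandbytes; hooking this into the pipeline is left to the reader.
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    model = AutoModelForCausalLM.from_pretrained(
        "HuggingFaceTB/SmolLM-360M-Instruct",
        quantization_config=BitsAndBytesConfig(load_in_4bit=True),
        device_map="auto",
    )
```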