BryceBarbara opened 1 month ago
I am also highly interested in this. Since Piper models use ONNX and transformers.js provides GPU inference for ONNX models, I feel like that might be another way to accomplish this with a higher-level library.
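For what it's worth, the lower-level route would be onnxruntime-web, which lets you request the WebGPU execution provider and fall back to WASM (CPU) when the browser doesn't expose WebGPU. A minimal sketch, where the model URL is a placeholder and the fallback logic is my assumption about how you'd want it to behave:

```javascript
// Sketch: loading a Piper ONNX model in the browser with onnxruntime-web,
// preferring the WebGPU execution provider and falling back to WASM (CPU).

// Pick execution providers based on what the browser exposes.
function pickExecutionProviders(hasWebGPU) {
  // WebGPU first when available; 'wasm' is the CPU fallback.
  return hasWebGPU ? ['webgpu', 'wasm'] : ['wasm'];
}

// `ort` would be the onnxruntime-web module; `modelUrl` points at a
// Piper voice model (placeholder).
async function loadPiperSession(ort, modelUrl) {
  // WebGPU feature detection: the API lives on navigator.gpu.
  const providers = pickExecutionProviders('gpu' in navigator);
  return ort.InferenceSession.create(modelUrl, {
    executionProviders: providers,
  });
}
```

Whether the WebGPU provider actually supports all of Piper's operators is exactly the open question raised elsewhere in this thread, so the fallback matters.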
I think there might be some others also interested in this from other projects: https://github.com/diffusionstudio/vits-web/issues/3
If combined with the ability to export audio as MP3, I think it would be amazing. It would allow audiobooks to be created super easily, with great UX, right in the browser. https://github.com/ken107/read-aloud/issues/7 https://github.com/ken107/read-aloud/issues/159
If anyone has ideas on this, please reach out. I would love to hack on this but am unsure where to start.
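On the MP3 export idea: an in-browser encoder such as lamejs expects 16-bit integer PCM, while ONNX inference typically yields Float32 samples, so there's a conversion step in between. A sketch of that step (the lamejs usage and the 22050 Hz sample rate are assumptions, shown only in comments):

```javascript
// Sketch: converting Float32 PCM (as produced by TTS inference) to the
// 16-bit integer PCM that browser MP3 encoders such as lamejs expect.
function floatTo16BitPCM(float32Samples) {
  const out = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp to [-1, 1] before scaling to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// Hypothetical usage with lamejs (mono, 22050 Hz, 128 kbps — all assumptions):
//   const encoder = new lamejs.Mp3Encoder(1, 22050, 128);
//   const mp3Parts = [encoder.encodeBuffer(floatTo16BitPCM(samples)), encoder.flush()];
//   const blob = new Blob(mp3Parts, { type: 'audio/mp3' });
```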
I can't recall where, but I saw multiple discussions about how Piper inferencing on GPU doesn't offer much performance improvement over CPU. Moreover, GPU support in Piper is not yet mature and still has issues. When I was R&Ding for https://github.com/ken107/piper-browser-extension, I tried GPU inferencing on my RTX 3060 and ran into problems with unsupported operators. Not being a machine-learning expert, I couldn't resolve the issue. Anyway, just adding my experience.
I love the new Piper feature that allows for some better sounding voices to read text!
I've run into the issue that it can take a while before you hear the first bit of audio. I assume this is because the JavaScript inference engine does everything on the CPU. On my work laptop, my CPU is gobbled up by various developer apps (heck, I've got something like 13 instances of Chrome running thanks to everyone using Electron).
I was wondering, would the time to first sound (let's call this TTFS) be lower if we used the GPU?
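Whatever backend is used, TTFS is easy to measure directly, which would let us compare CPU vs. GPU empirically. A sketch, where `synthesize` is a hypothetical async generator that yields audio chunks as they are produced:

```javascript
// Sketch: measuring time-to-first-sound (TTFS) for a streaming synthesizer.
// `synthesize(text)` is a hypothetical async generator yielding audio chunks.
async function measureTTFS(synthesize, text) {
  const start = performance.now();
  for await (const chunk of synthesize(text)) {
    // The first yielded chunk is the earliest moment playback could begin.
    return { ttfsMs: performance.now() - start, firstChunk: chunk };
  }
  // The generator produced no audio at all.
  return { ttfsMs: Infinity, firstChunk: null };
}
```

Running this once with the WASM backend and once with a GPU backend on the same text would answer the question with numbers rather than guesses.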
From a quick search, it appears there are a few options for doing that: