nicolabortignon opened 1 year ago
Have you tried with GPU instead of CPU?
For me specifically, I'm running on an M1 Ultra, so GPU (CUDA) isn't an option. I'm trying to get MPS to work for this codebase, but haven't succeeded just yet.
I'd like to use Tortoise for very long renders, so anything I can cut helps, even in a GPU context.
Did you ever get this working with MPS @nicolabortignon ? I’m just about to look at it myself.
Regarding using MPS: there's a problem in that the transformers library internally calls torch.topk(), which is not supported on MPS for top_k > 16. When I tried sending this to the CPU, Python complained that tensors were found on two different devices. Does anyone know of a workaround?
```
site-packages/transformers/generation_logits_process.py", line 236, in __call__
    indices_to_remove = scores < torch.topk(scores.to("cpu"), top_k)[0][..., -1, None]
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, mps:0 and cpu!
```
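One way to sidestep that device mismatch (a sketch, not a confirmed fix from this thread): compute the top-k threshold on CPU, then move only the small threshold tensor back to the scores' device so the comparison happens on a single device. The function name `topk_filter` is hypothetical.

```python
import torch

def topk_filter(scores: torch.Tensor, top_k: int) -> torch.Tensor:
    # torch.topk is unsupported on MPS for top_k > 16, so run it on CPU,
    # then move the k-th-largest threshold back to scores' device before
    # comparing. Returns a boolean mask of entries below the threshold.
    threshold = torch.topk(scores.cpu(), top_k)[0][..., -1, None].to(scores.device)
    return scores < threshold

# Works the same on CPU or MPS tensors.
scores = torch.randn(1, 100)
mask = topk_filter(scores, top_k=50)
```

Whether the round-trip is fast enough for generation loops is another question, but it avoids the two-device RuntimeError.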
I changed "cuda" -> "cpu".
I recommend using this fork: https://github.com/seohyunjun/tortoise-tts
Will give it a try on my M1 Max. Could the same changes be easily applied to the tortoise-tts-fast version? That one has the advantage of a GUI.
I don't recommend using the GPU with torch, because MPS doesn't support fft_r2c, so you'll hit an FFT (fast Fourier transform) error; MPS is weak at complex-type computation.
[current MPS issue] https://github.com/pytorch/pytorch/issues/77764
Someday it will be fixed, I hope.
I hope this helps. 😢
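For ops MPS doesn't implement (like fft_r2c), PyTorch offers a CPU fallback via the `PYTORCH_ENABLE_MPS_FALLBACK` environment variable. A minimal sketch, assuming it is set before torch is imported (setting it afterward may not take effect):

```python
import os

# Must be set before importing torch for the fallback to take effect:
# unsupported MPS ops are then silently run on CPU instead of erroring.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")

import torch

# Pick MPS when available (Apple Silicon), otherwise fall back to CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"
x = torch.randn(8, device=device)
print(f"running on {x.device}")
```

The fallback trades correctness for speed: each fallback op round-trips tensors through the CPU, so it keeps things running rather than making them fast.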
I've just started looking into Tortoise. Impressive body of work.
Just by reading the WIP paper, it's clear to me there is so much under the hood to tweak and play with.
As I'd prefer to keep exploring it locally, I want to find a way to reduce inference time. Has anyone here had thoughts on how to cut the computational cost of the autoregression and candidate-selection steps? For instance:
Any other thoughts?