metavoiceio / metavoice-src

Foundational model for human-like, expressive TTS
https://themetavoice.xyz/
Apache License 2.0

How to change similarity and stability in sampling.py? #72

Closed: G-force78 closed this issue 7 months ago

G-force78 commented 7 months ago

Hi, great implementation. I'm impressed by the accuracy of the one-shot cloning, and I'm looking forward to the fine-tuning code being released. In the meantime, could you tell me how to change similarity and stability in sampling.py? What do they relate to? I'm thinking top_p and top_k? Or `guidance_scale: Optional[Tuple[float, float]] = (3.0, 1.0)` ("""Guidance scale for sampling: (speaker conditioning guidance_scale, prompt conditioning guidance_scale).""")?

Are you using some sort of controlnet?

Thanks

vatsalaggarwal commented 7 months ago

@G-force78

You can find code to convert between these values here: https://github.com/metavoiceio/metavoice-src/blob/main/app.py#L29-L36
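For a rough intuition only, here is a hypothetical sketch of such a conversion (not the exact code from the linked app.py): treating "stability" as reducing sampling randomness (top_p) and "similarity" as increasing the speaker-conditioning guidance scale:

```python
# Hypothetical sketch only -- see the linked app.py for the actual conversion.
# Assumes both sliders are given in [0, 1].
def ui_to_sampling_params(stability: float, similarity: float) -> tuple[float, float]:
    top_p = 1.0 - 0.5 * stability            # more stability -> less sampling randomness
    guidance_scale = 1.0 + 4.0 * similarity  # more similarity -> stronger speaker conditioning
    return top_p, guidance_scale

# Example: stability=0.5, similarity=0.75 -> top_p=0.75, guidance_scale=4.0
print(ui_to_sampling_params(0.5, 0.75))
```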

G-force78 commented 7 months ago

OK, thanks for that. I've noticed you've changed sample.py to fast_inference.py; however, the new script doesn't produce any audio outputs, not that I can find anyway.

2024-02-26 11:38:47 | INFO | DF | Running on torch 2.2.1+cu121
2024-02-26 11:38:47 | INFO | DF | Running on host 855f76734407
fatal: not a git repository (or any of the parent directories): .git
2024-02-26 11:38:47 | INFO | DF | Loading model settings of DeepFilterNet3
2024-02-26 11:38:47 | INFO | DF | Using DeepFilterNet3 model at /root/.cache/DeepFilterNet/DeepFilterNet3
2024-02-26 11:38:47 | INFO | DF | Initializing model deepfilternet3
2024-02-26 11:38:47 | INFO | DF | Found checkpoint /root/.cache/DeepFilterNet/DeepFilterNet3/checkpoints/model_120.ckpt.best with epoch 120
2024-02-26 11:38:47 | INFO | DF | Running on device cuda:0
2024-02-26 11:38:47 | INFO | DF | Model loaded
Using device=cuda
Loading model ...
using dtype=float16
Time to load model: 19.44 seconds
Compiling...Can take up to 2 mins.
100% 199/199 [00:27<00:00, 7.18it/s]
Compilation time: 51.38 seconds

vatsalaggarwal commented 7 months ago

@sidroopdaska

sidroopdaska commented 7 months ago

Hey @G-force78, based on your stack trace above it looks like you haven't run the synthesise() API?

You'll need to run both of the steps below:

# sets up the model
python -i fam/llm/fast_inference.py 

# runs synthesise. The outputs get stored under the `output/` directory at the root level of the repo. There is also a print statement that shares the output path
tts.synthesise(text="This is a demo of text to speech by MetaVoice-1B, an open-source foundational audio model.", spk_ref_path="assets/bria.mp3")
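For reference, the same flow as a standalone script might look roughly like this (a sketch: it assumes `fast_inference.py` exposes a `TTS` class that the interactive session above instantiates as `tts`):

```python
# Sketch under the assumption above -- adjust the import/class name if it differs.
from fam.llm.fast_inference import TTS

tts = TTS()  # loads and compiles the model; this is the slow step in the log above
tts.synthesise(
    text="This is a demo of text to speech by MetaVoice-1B, an open-source foundational audio model.",
    spk_ref_path="assets/bria.mp3",
)
# The generated audio is written under output/ at the repo root, and the output
# path is printed by the library.
```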

G-force78 commented 7 months ago

> Hey @G-force78, based on your stack trace above it looks like you haven't run the synthesise() API?
>
> You'll need to run both of the steps below:
>
> # sets up the model
> python -i fam/llm/fast_inference.py
>
> # runs synthesise. The outputs get stored under the `output/` directory at the root level of the repo. There is also a print statement that shares the output path
> tts.synthesise(text="This is a demo of text to speech by MetaVoice-1B, an open-source foundational audio model.", spk_ref_path="assets/bria.mp3")

OK, thanks. I'm testing it in Google Colab, so it will probably need some adjustments. Does the app do fast inference too? If so, I will just use that.

Another question: does this store the latents (not sure if that's the correct terminology) of each result somewhere so they can be reused?

vatsalaggarwal commented 7 months ago

> OK, thanks. I'm testing it in Google Colab, so it will probably need some adjustments.

OK, please let me know if you have any problems.

> Does the app do fast inference too?

Yes

> Another question: does this store the latents (not sure if that's the correct terminology) of each result somewhere so they can be reused?

Yes, these get cached to disk.

G-force78 commented 7 months ago

I have it working now, but on a T4 there doesn't seem to be an increase in speed; however, I haven't looked at the exact time it took. Where can I set the cache path so I can keep the latents?

vatsalaggarwal commented 7 months ago

> On a T4 there doesn't seem to be an increase in speed; however, I haven't looked at the exact time it took.

Yeah, it's possible that the T4 is too slow (compute- or memory-bandwidth-wise) for our speedups (inspired by gpt-fast) to matter. Our speedups mainly relate to: i) getting rid of CPU overhead (because other GPUs compute faster than the CPU can schedule ops), and ii) doing Triton compilation via torch.compile so that ops get fused...
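To make that second point concrete, here is a generic illustration of the technique (not this repo's code): compiling a module with mode="reduce-overhead" uses CUDA graphs to cut per-op CPU launch overhead, and op fusion into Triton kernels happens as part of compilation:

```python
import torch
import torch.nn as nn

# Generic torch.compile illustration, not code from metavoice-src.
model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).cuda().half()
compiled = torch.compile(model, mode="reduce-overhead")

x = torch.randn(8, 1024, device="cuda", dtype=torch.half)
with torch.no_grad():
    y = compiled(x)  # first call triggers (slow) compilation; later calls run the fused, graph-captured kernels
```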

Maybe the simplest way to improve speed on a T4 is to use int8? There is some code for this in gpt-fast, and I reckon it should be possible to apply it here.
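For anyone who wants to try that, a minimal sketch of the general weight-only int8 idea on a single nn.Linear (this is not code from this repo or from gpt-fast):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Int8Linear(nn.Module):
    """Weight-only int8 linear: weights stored as int8 plus per-row scales,
    dequantised on the fly, roughly halving weight memory traffic vs fp16."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        w = linear.weight.detach()
        scales = (w.abs().amax(dim=1, keepdim=True) / 127.0).clamp(min=1e-8)  # per-output-channel symmetric scales
        self.register_buffer("weight_int8", torch.clamp((w / scales).round(), -128, 127).to(torch.int8))
        self.register_buffer("scales", scales)
        self.bias = linear.bias

    def forward(self, x):
        w = self.weight_int8.to(x.dtype) * self.scales.to(x.dtype)  # dequantise
        return F.linear(x, w, self.bias)

# Usage sketch: swap every nn.Linear in a model for its int8 counterpart.
def quantize_model(model: nn.Module) -> nn.Module:
    for name, module in model.named_children():
        if isinstance(module, nn.Linear):
            setattr(model, name, Int8Linear(module))
        else:
            quantize_model(module)
    return model
```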

> Where can I set the cache path so I can keep the latents?

It defaults to ~/.cache (ref: https://github.com/metavoiceio/metavoice-src/blob/main/fam/llm/inference.py#L392-L435)... you can make changes there if you want to change the path.
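Since you're on Colab, one way to keep the cached latents across runtime resets (a sketch with a hypothetical Drive path, not something provided by this repo) is to point ~/.cache at Google Drive before loading the model:

```python
import os
from google.colab import drive

drive.mount("/content/drive")
persistent = "/content/drive/MyDrive/metavoice_cache"  # hypothetical Drive folder
os.makedirs(persistent, exist_ok=True)

# Replace ~/.cache with a symlink to the Drive folder so anything the library
# caches there (including the speaker latents) survives a runtime reset.
home_cache = os.path.expanduser("~/.cache")
if not os.path.islink(home_cache):
    os.system(f"rm -rf {home_cache}")  # note: discards the runtime's existing cache
    os.symlink(persistent, home_cache)
```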