Closed G-force78 closed 7 months ago
@G-force78
Similarity corresponds to the guidance_scale value, particularly the first item in that tuple. The second item in the tuple isn't relevant for the currently released model, so I would keep that value at 1.0. Stability corresponds to the top_p value. You can find code to convert between these values here: https://github.com/metavoiceio/metavoice-src/blob/main/app.py#L29-L36
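As an illustration of the idea (this is not the repo's exact conversion — the authoritative code is in the app.py link above — and the ranges below are assumptions), a slider-to-sampler-parameter mapping might look like:

```python
def ui_to_sampler_params(stability: float, similarity: float) -> dict:
    """Illustrative mapping of 0-1 UI sliders to sampler parameters.

    The ranges here are assumptions for illustration only; the
    authoritative conversion lives in metavoice-src/app.py (linked above).
    """
    # More "stability" -> lower top_p, i.e. less sampling randomness.
    top_p = round(1.0 - 0.1 * stability, 3)
    # More "similarity" -> stronger speaker-conditioning guidance
    # (first tuple item); the second item stays at 1.0 as advised above.
    guidance_scale = (round(1.0 + 2.0 * similarity, 3), 1.0)
    return {"top_p": top_p, "guidance_scale": guidance_scale}

print(ui_to_sampler_params(0.5, 0.75))
# -> {'top_p': 0.95, 'guidance_scale': (2.5, 1.0)}
```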
Ok, thanks for that. I've noticed you've changed sample.py to fast_inference.py; however, the new script doesn't produce any audio outputs, not that I can find anyway.
2024-02-26 11:38:47 | INFO | DF | Running on torch 2.2.1+cu121
2024-02-26 11:38:47 | INFO | DF | Running on host 855f76734407
fatal: not a git repository (or any of the parent directories): .git
2024-02-26 11:38:47 | INFO | DF | Loading model settings of DeepFilterNet3
2024-02-26 11:38:47 | INFO | DF | Using DeepFilterNet3 model at /root/.cache/DeepFilterNet/DeepFilterNet3
2024-02-26 11:38:47 | INFO | DF | Initializing model deepfilternet3
2024-02-26 11:38:47 | INFO | DF | Found checkpoint /root/.cache/DeepFilterNet/DeepFilterNet3/checkpoints/model_120.ckpt.best with epoch 120
2024-02-26 11:38:47 | INFO | DF | Running on device cuda:0
2024-02-26 11:38:47 | INFO | DF | Model loaded
Using device=cuda
Loading model ...
using dtype=float16
Time to load model: 19.44 seconds
Compiling...Can take up to 2 mins.
100% 199/199 [00:27<00:00, 7.18it/s]
Compilation time: 51.38 seconds
@sidroopdaska
Hey @G-force78, based on your stack trace above it looks like you haven't run the synthesise() API?
You'll need to run both of the steps below:
# sets up the model
python -i fam/llm/fast_inference.py
# runs synthesise. The outputs get stored under the `output/` directory at the root level of the repo. There is also a print statement that shares the output path
tts.synthesise(text="This is a demo of text to speech by MetaVoice-1B, an open-source foundational audio model.", spk_ref_path="assets/bria.mp3")
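To compare generation speed across GPUs, a minimal timing wrapper can be used around the call above (this helper is hypothetical, not part of the repo):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label: str):
    # Hypothetical helper: measure wall-clock time of a generation
    # call so speedups can be compared across GPUs.
    start = time.perf_counter()
    yield
    print(f"{label}: {time.perf_counter() - start:.2f}s")

# Usage sketch, assuming `tts` was set up as above:
# with timed("synthesise"):
#     tts.synthesise(text="...", spk_ref_path="assets/bria.mp3")
```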
Ok thanks. I'm testing it in Google Colab so it will probably need some adjustments. Does the app do fast inference too? I'll just use that if so.
Another question: does this store the latents (not sure if that's the correct terminology) of each result somewhere so they can be reused?
Ok thanks. I'm testing it in Google Colab so it will probably need some adjustments
Ok please let me know if you have any problems
does the app do fast inference too?
Yes
Another question: does this store the latents (not sure if that's the correct terminology) of each result somewhere so they can be reused?
Yes, these get cached to disk.
I have it working now, but on a T4 there doesn't seem to be an increase in speed; however, I haven't measured the exact time it took. Where can I set the cache path so I can keep the latents?
an increase in speed; however, I haven't measured the exact time it took.
Yeah, it's possible that the T4 is too slow (compute- or memory-bandwidth-wise) for our speedups (inspired by gpt-fast) to matter. Our speedups mainly relate to: i) getting rid of CPU overhead (other GPUs compute faster than the CPU can schedule ops), and ii) doing Triton compilation via torch.compile so ops get fused...
Maybe the simplest way to improve speed on a T4 is using int8? There is some code for this in gpt-fast, and I reckon it should be possible to apply it here.
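The core of weight-only int8 quantisation is small; the following is a simplified NumPy sketch in the spirit of gpt-fast's int8 linear layers (it is not the gpt-fast implementation itself):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    # Per-output-channel symmetric int8 quantisation: one scale per
    # row, chosen so the largest weight maps to +/-127. Simplified
    # sketch in the spirit of gpt-fast's weight-only int8 layers.
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_int8(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # Recover an approximation of the original float weights.
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).standard_normal((4, 8)).astype(np.float32)
q, s = quantize_int8(w)
max_err = float(np.abs(dequantize_int8(q, s) - w).max())
```

At inference time the weights stay in int8 (halving memory traffic versus float16), which is why this can help on bandwidth-limited GPUs like the T4.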
Where can I set the cache path so I can keep the latents?
It defaults to ~/.cache
(Ref: https://github.com/metavoiceio/metavoice-src/blob/main/fam/llm/inference.py#L392-L435 ) ... you can make changes there if you want a different path.
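For keeping latents across runs, the pattern is to derive a stable filename from the speaker reference and write under a directory you control. A hypothetical helper (the repo's own caching is at the link above; this function name and layout are assumptions):

```python
import hashlib
from pathlib import Path

def latent_cache_path(spk_ref_path: str, cache_dir: str = "~/.cache") -> Path:
    # Hypothetical helper: hash the speaker-reference path into a
    # stable filename so the extracted latents can be found again on
    # the next run (e.g. point cache_dir at mounted Drive in Colab).
    digest = hashlib.sha256(spk_ref_path.encode()).hexdigest()[:16]
    return Path(cache_dir).expanduser() / f"{digest}.pt"

print(latent_cache_path("assets/bria.mp3", cache_dir="/tmp/latents"))
```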
Hi, great implementation! I'm impressed by the accuracy of the one-shot cloning and am looking forward to the fine-tuning code being released. In the meantime, could you tell me how to change similarity and stability in sampling.py? What do they relate to? I am thinking top_p and top_k? Or guidance_scale:
guidance_scale: Optional[Tuple[float, float]] = (3.0, 1.0)
"""Guidance scale for sampling: (speaker conditioning guidance_scale, prompt conditioning guidance_scale)."""
Are you using some sort of controlnet?
Thanks