yuval-reshef / StreamVC

An unofficial PyTorch implementation of "STREAMVC: REAL-TIME LOW-LATENCY VOICE CONVERSION".
MIT License
54 stars, 7 forks

Pre-trained model #1

Closed. EmreOzkose closed this issue 3 months ago

EmreOzkose commented 3 months ago

Hi, congrats on reproducing this work.

I have a few questions.

  1. Do you have a plan to share pre-trained models?
  2. Is the model capable of converting between different languages?
  3. How much audio data do we need for conversion: 5 minutes, 10 minutes, etc.?

Thank you in advance :)

yuval-reshef commented 3 months ago

Hi.

  1. I currently don't have the resources to train the model properly. This can become pretty expensive, since the hyperparameters were not shared in the paper. We did, however, test it and saw that it can be trained. If someone is willing to put in the work and money, I'll be happy to share their pre-trained model instance in this repo. Otherwise, I might do it in the future, but I can't commit to a timeline.
  2. Without a fully trained model, I can't answer this question.
  3. 5-10 seconds of target speech is usually enough, going by the paper's results page (https://google-research.github.io/seanet/stream_vc/).

hermanseu commented 4 days ago

Hi @yuval-reshef, what is the accuracy of your content-encoder model? I can only get 47%. Based on that content encoder, I trained the rest of the model and got extremely poor performance. Could you share the pre-trained model so I can try it?