ulagbulag / rustts

RusTTS is an unofficial Coqui TTS implementation.
Mozilla Public License 2.0

Question about usage #4

Open The-Mr-L opened 2 years ago

The-Mr-L commented 2 years ago

Well, this might be a stupid question, but why do I get a male voice when I replace the English woman's samples with ones from the Coqui TTS Double Decoder Consistency v2 samples? I just downloaded the samples from the sample page here: https://erogol.github.io/ddc-samples/

:)

HoKim98 commented 2 years ago

Which example did you test? (TTS or VC)

The-Mr-L commented 2 years ago

Sorry, yes, TTS.

The-Mr-L commented 2 years ago

I just downloaded some of the samples and replaced the existing English female samples with them. But this might be too naive?

HoKim98 commented 2 years ago

Could you try the command below?

# Argument 1 => String to be written
# Argument 2 => Speaker Assets (directory)
cargo run --example tts -- "Hello Mr my yesterday" "./assets/samples/speaker_woman_english/"

HoKim98 commented 2 years ago

Also, please note that if you replace the samples in a specific directory, the original ones should be removed.

The-Mr-L commented 2 years ago

That is what I have done, and of course I removed the original samples. The code I ran is:

pub fn speak(text: &str) -> Result<()> {
    // Load a Model
    let model_path = "./models/tts/vits";
    let speaker_encoder_path = "./models/tts/speaker_encoder.pt";
    let tts = rustts::TTS::try_default(model_path, speaker_encoder_path)?;

    // Command-line Arguments

    let speaker_emb = "./models/tts/samples/speaker_woman_english";

    // Parameters
    let text = text;
    let speaker_emb = tts.embed(&rustts::utils::audio::get_wav_files(&speaker_emb)?)?;
    let options = SynthesisOptions {
        length_scale: 1.0,
        ..Default::default()
    };

    // Forward
    let ref_wav_voc = tts.synthesis(&text, &speaker_emb, &options)?;

    // Save to .wav file
    rustts::utils::audio::save_wav(ref_wav_voc, "output-tts.wav")?;
    Ok(())
}

The-Mr-L commented 2 years ago

Btw, the model etc. is just copied from the assets folder.

The-Mr-L commented 2 years ago

cargo run --example tts -- "Hello Mr my yesterday" "./assets/samples/speaker_woman_english/" just gave the same male voice when using the new samples from the sample page above, and they are female of course.

The-Mr-L commented 2 years ago

So I just tried with some samples from https://github.com/edresson/yourtts and it works, so it is probably the format of the wav file that is not right. I don't know yet.

HoKim98 commented 2 years ago

Thanks for the feedback! I'll report back after experimenting in several different clean environments.

HoKim98 commented 2 years ago

Sorry to keep you waiting so long.

When I tested, the command below produced TTS output with a female voice without any problem.

cargo run --example tts -- "Hello Mr my yesterday" "./assets/samples/speaker_woman_english/"

So, unfortunately I haven't been able to reproduce your problem. If you have some time, could you please execute the commands in the order below?

git clone https://github.com/ulagbulag-village/rustts.git
cd rustts

cargo run --example tts -- "Hello Mr my yesterday" "./assets/samples/speaker_woman_english/"

The-Mr-L commented 2 years ago

No problem :) Well, the example works, but as I said, if I replace the speaker samples in the example with some other female wav samples, then I get a male voice. That said, as I mentioned in the last comment, I got it working when I used samples from https://github.com/edresson/yourtts .

But https://erogol.github.io/ddc-samples/ does not work. It might be the format, I am not sure.

HoKim98 commented 2 years ago

We currently only support wav files with a 16,000 Hz sample rate, 16 bits per sample (BPS), and a single channel.
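
One way to check whether a sample meets this requirement is to read its header before using it as a speaker sample. The sketch below is not part of rustts: it uses only the Rust standard library, assumes a canonical little-endian RIFF/WAVE file, and the names `WavFormat`, `parse_wav_format`, and `is_supported` are hypothetical helpers introduced here for illustration.

```rust
/// Format fields read from the `fmt ` chunk of a RIFF/WAVE file.
/// (Hypothetical helper type, not a rustts type.)
#[derive(Debug, PartialEq)]
struct WavFormat {
    channels: u16,
    sample_rate: u32,
    bits_per_sample: u16,
}

/// Walk the RIFF chunks and return the fields of the first `fmt ` chunk.
/// Returns `None` if the bytes are not RIFF/WAVE or the chunk is missing.
fn parse_wav_format(bytes: &[u8]) -> Option<WavFormat> {
    if bytes.len() < 12 || &bytes[0..4] != b"RIFF" || &bytes[8..12] != b"WAVE" {
        return None;
    }
    let mut pos = 12;
    while pos + 8 <= bytes.len() {
        let id = &bytes[pos..pos + 4];
        let size = u32::from_le_bytes([
            bytes[pos + 4],
            bytes[pos + 5],
            bytes[pos + 6],
            bytes[pos + 7],
        ]) as usize;
        let body = pos + 8;
        if id == b"fmt " {
            // A PCM `fmt ` chunk holds at least 16 bytes.
            if size < 16 || body + 16 > bytes.len() {
                return None;
            }
            return Some(WavFormat {
                channels: u16::from_le_bytes([bytes[body + 2], bytes[body + 3]]),
                sample_rate: u32::from_le_bytes([
                    bytes[body + 4],
                    bytes[body + 5],
                    bytes[body + 6],
                    bytes[body + 7],
                ]),
                bits_per_sample: u16::from_le_bytes([bytes[body + 14], bytes[body + 15]]),
            });
        }
        // Skip this chunk; chunks are padded to an even number of bytes.
        pos = body.checked_add(size)?.checked_add(size & 1)?;
    }
    None
}

/// True when the format matches the requirement stated in this thread:
/// 16,000 Hz, 16 bits per sample, one channel.
fn is_supported(fmt: &WavFormat) -> bool {
    fmt.sample_rate == 16_000 && fmt.bits_per_sample == 16 && fmt.channels == 1
}
```

To use it, read a file with `std::fs::read(path)` and pass the bytes to `parse_wav_format`; a file for which `is_supported` returns `false` would need to be resampled or downmixed with an external tool before being placed in the speaker directory.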

Would you mind sending me some wav samples to debug? It is difficult to understand the specific situation without looking at the actual samples. (Via a public download link, or e-mail me.)

The-Mr-L commented 2 years ago

Well, sure, all the samples I was using are at the link above :) A sample like this one: https://erogol.github.io/ddc-samples/wavs/s3.wav