Sped up voices have pretty terrible ring to it (or am I just imagining this?)

simonw / ospeak

CLI tool for running text through OpenAI Text to speech

Apache License 2.0

Sped up voices have pretty terrible ring to it (or am I just imagining this?) #19

Open corneliusroemer opened 2 months ago

corneliusroemer commented 2 months ago

Sped-up voices have a pretty terrible ring to them (or am I just imagining this?). At least 1.1x speed has this issue; I'm not sure about 1.01x.

Maybe this is an issue with the upstream OpenAI models?

Here are various speeds, created like this:

ospeak "Which voice do you prefer?" -v shimmer -m tts-1-hd -x 1.1 -o 11.wav
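For anyone who wants to reproduce the whole series, here is a rough dry-run sketch that prints one ospeak command per speed instead of running it. The flags are the ones from the command above; the helper names (`speed_to_name`, `print_ospeak_cmds`) and the output naming beyond `11.wav` (for example `101.wav` for 1.01) are assumptions for illustration, as is passing `-x 1` for the 1x sample.

```shell
# Hypothetical helper: derive an output name by dropping the decimal point,
# e.g. 1.1 -> 11.wav (matches the example above), 1.01 -> 101.wav (assumed).
speed_to_name() {
  printf '%s.wav\n' "$(printf '%s' "$1" | tr -d '.')"
}

# Hypothetical helper: print (not run) one ospeak command per speed argument.
print_ospeak_cmds() {
  for speed in "$@"; do
    printf 'ospeak "Which voice do you prefer?" -v shimmer -m tts-1-hd -x %s -o %s\n' \
      "$speed" "$(speed_to_name "$speed")"
  done
}

# Dry run: shows the commands for the three speeds discussed in this issue.
print_ospeak_cmds 1 1.01 1.1
```

Piping the output to `sh` would actually run the commands, which needs ospeak installed and an OpenAI API key configured.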

These were converted with ffmpeg -i 1.wav 1.mp4 so I could upload them to the GitHub issue (click to open in the browser's audio player).

1x speed: https://github.com/user-attachments/assets/d3b1cf69-2a56-40aa-a59f-61bb814f4478

1.01x speed: https://github.com/user-attachments/assets/11a6be73-4c80-490e-8005-6b983cb5a770

1.1x speed: https://github.com/user-attachments/assets/552e7882-d906-4525-88d7-e2118788b6aa

Original WAV files, zipped: Archive.zip

corneliusroemer commented 2 months ago

I get much, much better results by speeding up the 1x audio manually with ffmpeg instead of using the OpenAI speed setting.

ffmpeg -i 1.wav -filter:a "atempo=1.1" 11_manual.wav
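To compare several speeds the same way, a small dry-run sketch like the following could print the matching ffmpeg commands for the 1x file. It only reuses the `atempo` filter from the command above; the helper names (`manual_name`, `print_atempo_cmds`) and the `101_manual.wav` name are assumptions, following the `11_manual.wav` pattern.

```shell
# Hypothetical helper: 1.1 -> 11_manual.wav, 1.01 -> 101_manual.wav (assumed naming).
manual_name() {
  printf '%s_manual.wav\n' "$(printf '%s' "$1" | tr -d '.')"
}

# Hypothetical helper: print (not run) one ffmpeg atempo command per speed.
# First argument is the 1x input file; the remaining arguments are speeds.
print_atempo_cmds() {
  input=$1
  shift
  for speed in "$@"; do
    printf 'ffmpeg -i %s -filter:a "atempo=%s" %s\n' \
      "$input" "$speed" "$(manual_name "$speed")"
  done
}

# Dry run for the two sped-up variants discussed in this issue.
print_atempo_cmds 1.wav 1.01 1.1
```

Piping the output to `sh` would run the conversions, assuming ffmpeg is installed and 1.wav is in the current directory.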

11_manual.wav.zip