Open corneliusroemer opened 2 months ago
Sped-up voices have a pretty terrible ring to them (or am I just imagining this?). At least the 1.1x speed has this issue; I'm not sure about 1.01x.
Maybe this is an issue with the upstream OpenAI models?
Here are various speeds, created like this:
ospeak "Which voice do you prefer?" -v shimmer -m tts-1-hd -x 1.1 -o 11.wav
These were converted to mp4 so I can upload them to the GitHub issue (click to open in the browser audio player):
ffmpeg -i 1.wav 1.mp4
1x speed: https://github.com/user-attachments/assets/d3b1cf69-2a56-40aa-a59f-61bb814f4478
1.01x speed: https://github.com/user-attachments/assets/11a6be73-4c80-490e-8005-6b983cb5a770
1.1x speed: https://github.com/user-attachments/assets/552e7882-d906-4525-88d7-e2118788b6aa
original wavs in zip folder: Archive.zip
I get much, much better results by speeding up manually with ffmpeg instead of using the OpenAI speed setting:
ffmpeg -i 1.wav -filter:a "atempo=1.1" 11_manual.wav
11_manual.wav.zip
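For anyone who wants to reproduce the comparison, here is a rough sketch that pairs the two approaches above for each speed: one file sped up via `-x`, and one sped up from a 1x baseline with `atempo`. The function name `compare_speeds`, the `RUN` variable, and the output filenames are made up for illustration, and I'm assuming that omitting `-x` gives 1x speed. By default it only prints the commands; set `RUN=""` to actually run them.

```shell
#!/usr/bin/env bash
# Sketch: produce an API-sped and an ffmpeg-sped file per speed for A/B listening.
# RUN is a hypothetical dry-run hook: it defaults to "echo", which prints each
# command instead of executing it. Set RUN="" to really run the commands.
RUN="${RUN-echo}"

compare_speeds() {
  local text="$1"; shift
  # Baseline at normal speed (assumes no -x means 1x); input for the atempo variants.
  $RUN ospeak "$text" -v shimmer -m tts-1-hd -o base.wav
  local speed tag
  for speed in "$@"; do
    tag="${speed//./_}"   # e.g. 1.1 -> 1_1, to keep filenames tidy
    # Variant A: speed applied by the API via -x.
    $RUN ospeak "$text" -v shimmer -m tts-1-hd -x "$speed" -o "api_${tag}.wav"
    # Variant B: speed applied afterwards by ffmpeg's atempo filter.
    $RUN ffmpeg -i base.wav -filter:a "atempo=${speed}" "manual_${tag}.wav"
  done
}
```

Example: `compare_speeds "Which voice do you prefer?" 1.01 1.1` prints (or runs) the baseline command plus one API and one manual command per speed, so `api_1_1.wav` and `manual_1_1.wav` can be compared side by side.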