I used Whisper to transcribe an audio clip in which a person says:
By using the number 100,000 (pronounced "one hundred thousand" in the audio) as substitution, we got a number of 10,000 (pronounced "ten thousand" in the audio).
Then I fed the recognized text to MeloTTS. It says:
By using the number 100,000 (pronounced "one hundred, zero, zero, zero") as substitution, we got a number of 10,000 (pronounced "ten, zero, zero, zero").
It should say "one hundred thousand" for 100,000 and "ten thousand" for 10,000. Instead, it reads each number as if the comma separated two numbers, for example 100 followed by 000.
How do I fix this?
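One workaround I am considering is to normalize the text before it reaches the TTS step, so that comma-grouped integers are already spelled out as words. Below is a minimal sketch of that idea using only the Python standard library. The names `normalize_numbers` and `number_to_words` are my own hypothetical helpers, not part of Whisper or MeloTTS, and it only handles non-negative integers written with comma thousands separators (no decimals, currencies, or years). I have not confirmed how MeloTTS handles numbers internally, so this simply sidesteps the problem.

```python
import re

# Matches integers written with comma thousands separators, e.g. 10,000 or 1,234,567.
GROUPED_NUMBER = re.compile(r"\b\d{1,3}(?:,\d{3})+\b")

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
SCALES = ["", "thousand", "million", "billion"]


def below_thousand(n: int) -> str:
    """Spell out an integer in the range 1..999."""
    parts = []
    if n >= 100:
        parts.append(ONES[n // 100] + " hundred")
        n %= 100
    if n >= 20:
        word = TENS[n // 10]
        if n % 10:
            word += "-" + ONES[n % 10]
        parts.append(word)
    elif n > 0:
        parts.append(ONES[n])
    return " ".join(parts)


def number_to_words(n: int) -> str:
    """Spell out a non-negative integer smaller than 10**12."""
    if n == 0:
        return "zero"
    groups = []
    scale = 0
    while n > 0:
        n, chunk = divmod(n, 1000)
        if chunk:
            label = SCALES[scale]
            groups.append(below_thousand(chunk) + (" " + label if label else ""))
        scale += 1
    return " ".join(reversed(groups))


def normalize_numbers(text: str) -> str:
    """Replace comma-grouped integers in text with their spelled-out form."""
    def replace(match: re.Match) -> str:
        digits = match.group().replace(",", "")
        value = int(digits)
        # Too large for SCALES: fall back to plain digits without commas.
        if value >= 1000 ** len(SCALES):
            return digits
        return number_to_words(value)

    return GROUPED_NUMBER.sub(replace, text)


if __name__ == "__main__":
    recognized = "By using the number 100,000 as substitution, we got a number of 10,000."
    print(normalize_numbers(recognized))
```

The normalized string would then be passed to the TTS call in place of the raw Whisper output. Spelling the numbers out avoids depending on how the TTS front end interprets commas between digits.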