Open medanisjbara opened 2 years ago
I assume this isn't gtts-cli's fault, since there's no actual Python error, so the problem is probably with the Google text-to-speech engine. Still, the behaviour itself is confusing, so I hope a fix will be applied.
@medanisjbara Thanks a lot for this well documented behaviour!
Hmm, so it's a windows-1252 character. I wonder if there's anything gTTS should (or shouldn't) do about this, like applying some filtering. I'll have to take a look with debugging on.
Prerequisites
pip install --upgrade gTTS
Current Behaviour (steps to reproduce)
The presence of 0xA0 in the input text is mostly ignored by gtts-cli. But in certain situations (the provided example) it will produce
Error: 200 (OK) from TTS API. Probable cause: No audio stream in response. Unsupported language 'en'
along with EOF (and it seems to be written to stderr without an actual Python error).
working_test.txt non_working_test.txt
Even though the files contain 0xA0, which I assumed would make them binary files, the file command says the opposite. gtts-cli didn't complain about non-UTF-8 characters, and using iconv to remove non-UTF-8 characters doesn't change anything:
$ iconv -f utf-8 -t utf-8 -c test
does nothing to the file.
Some web pages use that character in between the text, and most text editors show it as a space. That is a bit frustrating for the user (you have almost no clue what to do or what causes the error). And I cannot blame the creator of the page, since it seems (after searching online) that 0xA0 is part of the windows-1252 encoding (so if he wrote his blog in Microsoft Word, there's a big chance it got introduced there).
Expected Behaviour
gtts-cli should ignore that character and continue reading regardless of how and where it is present.
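A minimal Python sketch of the kind of filtering this would need, not taken from gTTS itself. The helper names find_nbsp and replace_nbsp are hypothetical. It also shows one plausible reason the iconv pass above changes nothing: assuming the test files are UTF-8, the character is stored as the two bytes C2 A0, which is valid UTF-8, so there is nothing for iconv -c to discard.

```python
# U+00A0 NO-BREAK SPACE: the single byte 0xA0 in windows-1252 / Latin-1.
NBSP = "\u00a0"


def find_nbsp(text):
    """Return the character offsets of every U+00A0 in text (hypothetical helper)."""
    return [i for i, ch in enumerate(text) if ch == NBSP]


def replace_nbsp(text):
    """Replace each U+00A0 with an ordinary ASCII space (hypothetical helper)."""
    return text.replace(NBSP, " ")


sample = "hello" + NBSP + "world"

# In UTF-8 the character becomes the two-byte sequence C2 A0, which is valid,
# so a UTF-8 to UTF-8 conversion has no invalid bytes to drop.
encoded = sample.encode("utf-8")
print(encoded)

# Locate the character, then normalise it before handing the text to a TTS tool.
print(find_nbsp(sample))
print(replace_nbsp(sample))
```

Replacing the character with a plain space, rather than deleting it, keeps the neighbouring words from running together when the text is read aloud.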
Context
I am writing a simple bash script that reads aloud the user's clipboard or a webpage associated with the url in the user's clipboard.
I personally have been using this command
w3m "$(xclip -o)" | gtts-cli -f - | mpv -
for over a year to boost productivity when reading, with some variations such as
less $pdf_file_or_epub_file | gtts-cli -f - | mpv -
and so on and so forth.
The script basically does the same (still very basic and under development).
And I came across some webpages that caused that error to occur. After some investigation I found out that the character 0xA0 is what is causing the problem.
So I created an issue and made a small workaround that uses bbe to replace the bad character with nothing (and then iconv for cleanup, since it is messing up a couple of things).
Environment