Open haydonryan opened 1 month ago
I also wonder if we should consider having two files: an included one that has been thoroughly vetted, and one for custom replacements.
Hey @haydonryan. Thanks for reaching out. Not sure if the word replacement you mentioned is something similar to this PR we have merged: https://github.com/p0n1/epub_to_audiobook/pull/80
There are a bunch of clear ones based on the books I read. $1 million reads as "dollar one million"; 2010 reads as "two thousand ten".
Besides, just curious: which TTS engine are you using?
Oh yes, good point: it's definitely going to be specific to the TTS engine. I'm currently using piper, but have been thinking about trying https://github.com/coqui-ai/TTS. As this isn't currently a supported option, I'd export the text files before passing them on.
Still looking for the best free TTS system. I like piper but the lack of GPU acceleration is frustrating.
So the readme is helpful, but what regular expression syntax is it using? E.g., in my script to run epub_to_audiobook I have:
# numbers will be in the form:
# 19 20 or 19o4
ls *.txt | xargs sed -i 's/2000/two thousand/g'
ls *.txt | xargs sed -i 's/200\([1-9]\)/two thousand and \1/g'
ls *.txt | xargs sed -i 's/\([0-9]\{2\}\)0\([0-9]\)/\1o\2/g'
ls *.txt | xargs sed -i 's/\([0-9]\{2\}\)\([0-9]\{2\}\)/\1 \2/g'
And some involve punctuation, e.g.:
ls *.txt | xargs sed -i 's/Jr.’s/juniors/g'
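For comparison, here is a rough Python sketch of the year rewrites from the sed commands above, since the tool's replacements are applied with Python's re module rather than sed. The names YEAR_RULES and rewrite_years are made up for illustration, and the \b word boundaries are an addition that the sed versions do not have.

```python
import re

# Ordered (pattern, replacement) pairs mirroring the four year-related sed rules.
# The \b word boundaries are an added assumption so that longer digit runs
# are left untouched.
YEAR_RULES = [
    (r"\b2000\b", "two thousand"),
    (r"\b200([1-9])\b", r"two thousand and \1"),
    (r"\b([0-9]{2})0([0-9])\b", r"\1o\2"),
    (r"\b([0-9]{2})([0-9]{2})\b", r"\1 \2"),
]


def rewrite_years(text: str) -> str:
    """Apply every rule in order; re.sub replaces all matches in the text."""
    for pattern, replacement in YEAR_RULES:
        text = re.sub(pattern, replacement, text)
    return text


print(rewrite_years("Born in 1904, hired in 1987, retired in 2000, died in 2005."))
# Born in 19o4, hired in 19 87, retired in two thousand, died in two thousand and 5.
```

Like the sed version, this treats any four-digit number as a year, so amounts such as 1500 would also be split.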
I dug into the code. It seems to be calling re.sub, so the patterns use Python regex syntax.
# Search and replace from books I'm listening to:
\$([0-9]+.[0-9])\sbillion==\1 billion dollars
Used as a search and replace file, this didn't work.
However, this did:
import re
test="$70 billion"
re.sub(r"\$([0-9]+) billion", r"\1 billion dollars", test)
re.sub(r"\$([0-9]+.*[0-9]*)\sbillion", r"\1 billion dollars", test)
'70.3 billion dollars'
Also, I don't think it should be one regex per line; it's highly likely that you'll get more than one match,
e.g.:
"Carls Jr spent $3.1 Billion on advertising." has two items that would not get spoken right...
Better to run the search and replace over the whole file.
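A minimal sketch of running every rule over the whole text, assuming a rules file in the pattern==replacement form shown earlier in this thread. The helper names load_rules and apply_rules, the sample rules, and the case-insensitive flag are all hypothetical and are not taken from the project's code.

```python
import re

# Hypothetical rules in the "pattern==replacement" form used in this thread.
RULES_TEXT = r"""
# Search and replace from books I'm listening to:
\bJr\b\.?==Junior
\$([0-9]+(?:\.[0-9]+)?)\s+billion==\1 billion dollars
"""


def load_rules(rules_text):
    """Parse non-empty, non-comment lines into (compiled pattern, replacement) pairs."""
    rules = []
    for line in rules_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, replacement = line.split("==", 1)
        # IGNORECASE is an assumption so that "Billion" and "billion" both match.
        rules.append((re.compile(pattern, re.IGNORECASE), replacement))
    return rules


def apply_rules(text, rules):
    """Run each rule over the entire text; sub() replaces every match it finds."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


text = "Carls Jr spent $3.1 Billion on advertising."
print(apply_rules(text, load_rules(RULES_TEXT)))
# Carls Junior spent 3.1 billion dollars on advertising.
```

Because each rule is applied to the full text in turn, a sentence with several problem items gets all of them rewritten in one pass over the rules.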
'70.3 billion dollars'
Why would this lead to '70.3 billion dollars'?
Sorry, I fudged the example; the example above would give '70 billion dollars'.
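One possible reason the file pattern did not fire on "$70 billion", sketched below: [0-9]+.[0-9] needs a digit, then any character, then another digit right before the whitespace, so whole-number amounts with only two digits cannot match. The variant named decimal_pattern is a hypothetical alternative, not something from the project.

```python
import re

test = "$70 billion"

# Pattern from the replacement file in this thread.
file_pattern = r"\$([0-9]+.[0-9])\sbillion"
print(re.sub(file_pattern, r"\1 billion dollars", test))
# $70 billion  (no match, text unchanged)
print(re.sub(file_pattern, r"\1 billion dollars", "$70.3 billion"))
# 70.3 billion dollars

# Hypothetical variant: the decimal part is optional and the dot is escaped.
decimal_pattern = r"\$([0-9]+(?:\.[0-9]+)?)\s+billion"
print(re.sub(decimal_pattern, r"\1 billion dollars", test))
# 70 billion dollars
print(re.sub(decimal_pattern, r"\1 billion dollars", "$70.3 billion"))
# 70.3 billion dollars
```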
Loving this app. Thank you all for the great work.
It would be good to crowdsource some of the word replacements.
There are a bunch of clear ones based on the books I read. $1 million reads as "dollar one million"; 2010 reads as "two thousand ten".
I'm currently doing these changes on the command line. Happy to contribute mine; I just need to confirm the format.