Open Bryksin opened 1 year ago
Thank you for the very detailed and valuable feedback. These are all great suggestions and ideas. I also had some of them in mind but never documented them.
Cost estimation: `cost_estimation` method on `TTSProvider`
Yes, I like this idea because many people are concerned about how much a conversion will cost. This is great and not difficult to add.
Reorganise the project
Yes, I thought about this during the last refactor but planned to do it in the next refactor, once more providers are added.
More book types
I almost only use EPUB files, but more book types will definitely be useful for more people.
More providers
Yes. Many users have asked for support for other TTS providers, and I will add them one by one, though I have my favorites.
At first, I had a strong personal need for this tool because I listen to audiobooks every day. So I have updated and developed it with the KISS principle in mind whenever I found something I needed to improve or implement.
Now I'm glad to see many people with similar needs and interest in this project, and I'm willing to take the time to make this tool more usable for others.
I am very welcoming and open to pull requests, whether they are about refactoring the project, fixing bugs, or implementing new features, and I would be very happy to help test new features and review code. Just try not to break the existing command-line interface parameters, as this might cause confusion for users.
Hi
I was in the middle of writing my own solution when I came across this project by accident. It already has almost everything implemented, so I'm planning to use your solution!
Thank you for your work!!!
However, what is missing is cost estimation. When I want to convert a book to audio, I have no idea how big it is or how much it would cost. It would be nice if every `tts_provider` implemented a cost estimation function that calculates roughly how much it would cost to convert the selected book, with a manual command-line prompt to confirm before the final conversion, like:
The approximate cost of the book voiceover would be XYZ$
Would you agree to proceed? [Y/N]: _
For example, OpenAI charges $0.015 per 1k characters for the simple `tts` model and double that, $0.03, for the `tts-hd` model. It should be easy to calculate with the formula: `(whole_book_chars / 1k) * selected_tts_model_price`
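The formula and the confirmation prompt could be sketched roughly as follows. This is only an illustration, not the project's code: the function names `estimate_cost` and `confirm_cost` and the `PRICE_PER_1K_CHARS` table are made up for the example, and the prices are simply the figures quoted above, which may be out of date.

```python
# Hypothetical price table in USD per 1,000 characters, using the
# figures quoted in this issue (an assumption, not an official price list).
PRICE_PER_1K_CHARS = {"tts": 0.015, "tts-hd": 0.03}


def estimate_cost(total_chars: int, model: str) -> float:
    """Approximate cost in USD: (total_chars / 1000) * price per 1k chars."""
    if total_chars < 0:
        raise ValueError("total_chars must be non-negative")
    if model not in PRICE_PER_1K_CHARS:
        raise ValueError(f"No price known for model {model!r}")
    return total_chars / 1000 * PRICE_PER_1K_CHARS[model]


def confirm_cost(cost: float, ask=input) -> bool:
    """Show the estimate and return True only if the user answers yes.

    `ask` defaults to the built-in input() and can be swapped out in tests.
    """
    answer = ask(
        f"The approximate cost of the book voiceover would be ${cost:.2f}\n"
        "Would you agree to proceed? [Y/N]: "
    )
    return answer.strip().lower() in ("y", "yes")
```

With these assumed prices, a book of 500,000 characters would come to roughly $7.50 with `tts` and $15.00 with `tts-hd`. Anything other than an explicit yes is treated as a refusal, so an accidental Enter does not start a paid conversion.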
Additional suggestions: Considering project evolution and further progress, I would suggest:
- Reorganise the project from a single file into proper separate classes and packages, and move the TTS providers and the main interface `TTSProvider` into a separate Python package to simplify adding more providers
- Add the `cost_estimation` method to the `TTSProvider` interface
- Add support for more book types (`*.fb2`, `*.mobi`, ...), which would also require creating separate services implementing a common interface for each book type

Add more providers:
- AWS has a TTS service called `Polly`. It supports a standard (mechanical) voice and a newer neural voice (which sounds much better), but not all languages are supported (which makes `--language` an obligatory arg for execution). Price
- Google has TTS. Price
- I'm sure there are many more providers out there, especially considering the AI boom in the industry. However, anyone who decides to contribute would have to implement the `TTSProvider` interface with the basic standard functionality and place it in an individual Python package.

P.S. Happy to help with the project, feel free to PM
I LOVE this feature @Bryksin !!! Happy to help test once it's implemented.
Already working on it... I started last night; so far it is only project refactoring to prepare it for scalability.
After that comes the actual feature implementation, so I'm expecting to make at least 2 PRs.
Hey @p0n1 , just on that one:
Just try not to break the existing command-line interface parameters, as this might cause confusion for the users.
I do understand your concerns, and that's why I want to discuss this bit specifically with you. As the project grows, the input args will grow too, so I suggest still making a few small changes to reduce the number of args where they can be merged into common ones.
Here are just a few of them:
`--voice_name` for Azure and `--openai_voice` for OpenAI can be merged to reuse the same `--voice_name`
`--output_format` for Azure and `--openai_format` for OpenAI can likewise be merged into `--output_format`
`--openai_model`: even though there is no equivalent in Azure, there are equivalents in AWS and Google, and I'm sure in other AI tools, so I would suggest making it generic and renaming it to `model_name` so it can be shared by all TTS providers.
Additionally, I was thinking that different TTS providers might require their own combination of args, so every TTS provider (possibly even in the interface) should implement a `validate_config` method, which would be called directly from the constructor and validate that the configs are correct.
Example: if we merge `--output_format` and `--openai_format`, the valid values differ for each TTS provider:
For Azure: `audio-24khz-48kbitrate-mono-mp3`
For OpenAI: `mp3`
Therefore each TTS provider should validate its own config and make sure the values passed in args are supported by that specific provider.
So basically I need your approval to make these changes, or we just keep it as it is right now.
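To make the `validate_config` idea concrete, here is a minimal sketch of how it might look. The class names, the constructor signature, and the `SUPPORTED_FORMATS` sets are assumptions made for illustration; the sets contain only the two values mentioned above and are not complete lists of what either service accepts.

```python
from abc import ABC, abstractmethod


class TTSProvider(ABC):
    """Hypothetical base interface; the real project may differ."""

    def __init__(self, output_format: str):
        self.output_format = output_format
        # Validate in the constructor so a bad config fails early,
        # before any text is sent to the TTS service.
        self.validate_config()

    @abstractmethod
    def validate_config(self) -> None:
        """Raise ValueError if the config is not valid for this provider."""


class AzureTTSProvider(TTSProvider):
    # Illustrative subset only: the one Azure value quoted in this thread.
    SUPPORTED_FORMATS = {"audio-24khz-48kbitrate-mono-mp3"}

    def validate_config(self) -> None:
        if self.output_format not in self.SUPPORTED_FORMATS:
            raise ValueError(
                f"Azure does not support output format {self.output_format!r}"
            )


class OpenAITTSProvider(TTSProvider):
    # Illustrative subset only: the one OpenAI value quoted in this thread.
    SUPPORTED_FORMATS = {"mp3"}

    def validate_config(self) -> None:
        if self.output_format not in self.SUPPORTED_FORMATS:
            raise ValueError(
                f"OpenAI does not support output format {self.output_format!r}"
            )
```

In this sketch `OpenAITTSProvider("mp3")` constructs fine, while `AzureTTSProvider("mp3")` raises `ValueError`, which is the behaviour a shared `--output_format` arg would need.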
Hi @Bryksin. I thought about this before when I was integrating OpenAI and chose a simple solution: adding an `openai_` prefix to each parameter to avoid name conflicts. The benefit is that I can conveniently use argparse's default handling to set default values for each parameter and display the usage of the parameters in each argument group. It also allows more flexible handling and accurate mapping of future provider-specific parameters. Its drawback is that it adds more parameters.
The `voice_name` and `output_format` you mentioned are indeed commonly used by most TTS providers. I'm not sure about `model_name`. It looks like there is a similar `engine` parameter in Polly, but I found nothing similar in Google TTS.
Nevertheless, I support merging the common arguments, but we should add extra logic for assigning default values for the different TTS providers. Also, the help text should document how each argument maps to the official TTS docs/API where the naming differs.
The `validate_config` method also makes sense to me.
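One possible shape for the "extra logic for assigning default values" is to leave the shared argument's argparse default as `None` and fill it in per provider after parsing. The sketch below assumes a provider selector flag named `--tts` and a `PROVIDER_DEFAULTS` table; both are placeholders for illustration rather than the project's actual interface.

```python
import argparse

# Hypothetical per-provider defaults for the merged arguments, using the
# format values quoted earlier in this thread.
PROVIDER_DEFAULTS = {
    "azure": {"output_format": "audio-24khz-48kbitrate-mono-mp3"},
    "openai": {"output_format": "mp3"},
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Merged TTS arguments (sketch)")
    # "--tts" is an assumed name for the provider selector.
    parser.add_argument("--tts", choices=sorted(PROVIDER_DEFAULTS), default="azure")
    parser.add_argument(
        "--output_format",
        default=None,  # resolved per provider after parsing
        help="Audio output format. Azure default: "
        "audio-24khz-48kbitrate-mono-mp3; OpenAI default: mp3.",
    )
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # Fill in any argument the user left unset with the chosen provider's default.
    for name, value in PROVIDER_DEFAULTS[args.tts].items():
        if getattr(args, name) is None:
            setattr(args, name, value)
    return args
```

The trade-off is that argparse can no longer print a single default in `--help`, so the per-provider defaults have to be spelled out in the help string, as done here.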
Hey @p0n1
I just pushed the changes to my forked branch. Unfortunately I have no time to finish it right now and will be unavailable for the next 4 days, but it would be nice if you could start reviewing it and comment on any changes you would like me to make before I open the PR.
Great work @Bryksin. Will take a closer look at it ASAP.
Hey @p0n1 , I'm back at my PC :D The PR has been opened.