p0n1 / epub_to_audiobook

EPUB to audiobook converter, optimized for Audiobookshelf
MIT License

[Feature Request] Implement rough cost calculation beforehand, with prompt to confirm. #21

Open Bryksin opened 10 months ago

Bryksin commented 10 months ago

Hi

I was in the middle of writing my own solution when I came across this project by accident. It already has almost everything implemented, so I'm planning to use your solution!

Thank you for your work!!!

However, what is missing is cost estimation. When I want to convert a book to audio, I have no idea how big it is or how much it will cost. It would be nice if every tts_provider implemented a cost estimation function that roughly calculates how much it would cost to convert the selected book.

With a manual command-line prompt to confirm before the final conversion, like:

The approximate cost of the book voiceover would be XYZ$ 
Would you agree to proceed? [Y/N]: _

For example, OpenAI charges $0.015 per 1k chars for the simple tts model and double that, $0.03, for the tts-hd model. It should be easy to calculate with the formula: (whole_book_chars / 1000) * selected_tts_model_price
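The formula and the confirmation prompt above could be sketched roughly as follows. This is only an illustration: the model identifiers tts-1 and tts-1-hd, the PRICE_PER_1K_CHARS table, and both function names are assumptions made here, and the prices are the ones quoted in this comment, which may be out of date.

```python
# Prices in USD per 1,000 characters, taken from the figures quoted above.
# They may be outdated; check the provider's pricing page before relying on them.
PRICE_PER_1K_CHARS = {
    "tts-1": 0.015,     # assumed identifier for the simple model
    "tts-1-hd": 0.030,  # assumed identifier for the HD model
}


def estimate_cost(total_chars, model):
    """Approximate cost in USD: (total_chars / 1000) * price per 1k chars."""
    return total_chars / 1000 * PRICE_PER_1K_CHARS[model]


def confirm_cost(cost, ask=input):
    """Show the estimate and return True only if the user answers yes.

    `ask` defaults to the built-in input() and can be swapped out in tests.
    """
    print(f"The approximate cost of the book voiceover would be ${cost:.2f}")
    answer = ask("Would you agree to proceed? [Y/N]: ")
    return answer.strip().lower() in ("y", "yes")
```

With these numbers, a 500,000-character book would come to roughly $7.50 with the simple model and $15.00 with the HD one.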

Additional suggestions: considering the project's evolution and further progress, I would suggest:

  1. Reorganise the project from a single file into proper separate classes and packages, and move the TTS providers and the main TTSProvider interface into a separate Python package to simplify adding more providers
  2. Add the cost_estimation method to the TTSProvider interface
  3. Add support for more book types (*.fb2, *.mobi...), which would also require creating separate services that implement a common interface for each book type
  4. Add more providers:
    • AWS has a TTS service called Polly. It supports a standard (mechanical) voice and a new neural voice (which sounds much better), but not all languages are supported (which makes --language an obligatory arg for execution). Price
    • Google has TTS Price
    • I'm sure there are many more providers out there, especially considering the AI boom in the industry. However, anyone who decides to contribute would have to implement the TTSProvider interface with the basic standard functionality and place it in an individual Python package.
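Suggestion 2 could look something like the sketch below. It is not the project's actual code: the method names text_to_speech and estimate_cost, and the FlatRateProvider class, are hypothetical and only show how a cost method might sit on a shared interface.

```python
from abc import ABC, abstractmethod


class TTSProvider(ABC):
    """Hypothetical base interface; the project's real class may differ."""

    @abstractmethod
    def text_to_speech(self, text, output_path):
        """Synthesize `text` and write the audio to `output_path`."""

    @abstractmethod
    def estimate_cost(self, total_chars):
        """Return the approximate cost in USD for `total_chars` characters."""


class FlatRateProvider(TTSProvider):
    """Illustrative provider that bills a flat price per 1,000 characters."""

    def __init__(self, price_per_1k_chars):
        self.price_per_1k_chars = price_per_1k_chars

    def text_to_speech(self, text, output_path):
        # Real synthesis is out of scope for this sketch.
        raise NotImplementedError("synthesis is not part of this sketch")

    def estimate_cost(self, total_chars):
        return total_chars / 1000 * self.price_per_1k_chars
```

Because estimate_cost is abstract, a new provider that forgets to implement it cannot be instantiated, so every provider added later has to supply its own pricing logic.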

P.S. Happy to help with the project, feel free to PM

p0n1 commented 10 months ago

Thank you for the very detailed and valuable feedback. These are all great suggestions and ideas. I also had some of them in mind but never documented them.

Cost estimation; cost_estimation method in TTSProvider

Yes, I like this idea because many people are concerned about how much it will cost. This is great and not difficult to add.

Reorganise the project

Yes, I thought about this during the last refactor but planned to do it in the next one, once more providers are added.

More book type

I almost exclusively use EPUB files, but support for more book types will definitely be useful to more people.

more providers

Yes. Many users have asked for support for other TTS providers, and I will add them one by one, though I have my favorites.

At first, I had a strong personal need for this tool because I listen to audiobooks every day. So I update and develop it with the KISS principle in mind whenever I find something I need to improve or implement.

Now I'm glad to see many people with similar needs and interest in this project, and I'm willing to take the time to make this tool more usable for others.

I am very welcoming and open to pull requests, whether they are about refactoring the project, fixing bugs, or implementing new features. I would be very happy to help test new features and review code. Just try not to break the existing command-line interface parameters, as this might cause confusion for users.

marchowardbegins commented 10 months ago

I LOVE this feature @Bryksin !!! Happy to help test once it's implemented.

Bryksin commented 10 months ago

Already working on it... I started last night; so far it's only project refactoring to prepare it for scalability.

Then comes the actual feature implementation. So I'm expecting to make at least 2 PRs.

Bryksin commented 10 months ago

Hey @p0n1 , just on that one:

Just try not to break the existing command-line interface parameters, as this might cause confusion for the users.

I do understand your concerns, and that's why I want to discuss this specific bit with you. As we both understand, the project will grow, and the number of input args will grow with it. I therefore suggest still making a few small changes to reduce the number of args by merging those that have something in common.

here are just a few of them:

Additionally, I was thinking that different TTS providers might require their own combinations of args. Therefore every TTS provider (possibly even via the interface) should implement a validate_config method, which is called directly from the constructor and checks that the config is correct.

Example: if we merge --output_format and --openai_format, the valid values differ for each TTS provider. For Azure: audio-24khz-48kbitrate-mono-mp3; for OpenAI: mp3. Therefore each TTS provider should validate its own config and make sure the values passed in args are supported by that specific provider.

So basically I need your approval to make these changes, or we just keep it as it is right now.
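A minimal sketch of that idea, assuming a merged output_format value is handed to each provider's constructor. The class names and the supported_output_formats attribute are made up for illustration, and the format sets contain only the two values mentioned above, not the full lists the real services accept.

```python
class BaseProvider:
    """Hypothetical base class that validates its config on construction."""

    # Each subclass lists the values it accepts (deliberately incomplete here).
    supported_output_formats = frozenset()

    def __init__(self, output_format):
        self.output_format = output_format
        self.validate_config()  # fail early, before any text is sent to the API

    def validate_config(self):
        if self.output_format not in self.supported_output_formats:
            raise ValueError(
                f"{type(self).__name__} does not support output format "
                f"{self.output_format!r}; expected one of "
                f"{sorted(self.supported_output_formats)}"
            )


class AzureProvider(BaseProvider):
    supported_output_formats = frozenset({"audio-24khz-48kbitrate-mono-mp3"})


class OpenAIProvider(BaseProvider):
    supported_output_formats = frozenset({"mp3"})
```

Passing the Azure format string to OpenAIProvider would then raise a ValueError at construction time rather than failing later in the middle of a conversion.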

p0n1 commented 10 months ago

Hi @Bryksin. I thought about this before, when I was integrating OpenAI, and chose a simple solution: adding an openai_ prefix to each parameter to avoid name conflicts. The benefit is that I can conveniently use argparse's built-in defaults to set a default value for each parameter and display the usage of the parameters in each argument group. It also allows for more flexible handling and accurate mapping of future special parameters for various TTS providers. However, its drawback is that it adds more parameters.

It seems that the voice_name and output_format you mentioned are indeed commonly used by most TTS providers. I'm not sure about model_name. It looks like there is a similar engine parameter in Polly, but I found nothing comparable in Google TTS.

Nevertheless, I support merging the common arguments, but we should add extra logic for assigning default values for the different TTS providers. Also, the help text should document how each argument maps to the TTS provider's official docs/API where the naming differs.
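One possible shape for that extra default-value logic is sketched below. The --tts flag, the DEFAULT_OUTPUT_FORMAT table, and the helper names are assumptions for illustration, not the project's actual CLI; the two format strings are the ones mentioned earlier in this thread.

```python
import argparse

# Assumed per-provider defaults for the merged --output_format argument.
DEFAULT_OUTPUT_FORMAT = {
    "azure": "audio-24khz-48kbitrate-mono-mp3",
    "openai": "mp3",
}


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of merged TTS arguments")
    parser.add_argument(
        "--tts",
        choices=sorted(DEFAULT_OUTPUT_FORMAT),
        default="azure",
        help="TTS provider to use",
    )
    parser.add_argument(
        "--output_format",
        default=None,  # resolved after parsing, because it depends on --tts
        help="Audio format; when omitted, the selected provider's default is used",
    )
    return parser


def parse_args(argv=None):
    args = build_parser().parse_args(argv)
    if args.output_format is None:
        args.output_format = DEFAULT_OUTPUT_FORMAT[args.tts]
    return args
```

The trade-off is that argparse can no longer print the default in the help output on its own, so the help text has to spell out the per-provider defaults and their mapping to each provider's official naming.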

The validate_config also makes sense to me.

Bryksin commented 10 months ago

Hey @p0n1

Just pushed the changes to my forked branch. Unfortunately I have no time to finish it right now, and I will be unavailable for the next 4 days, but it would be nice if you could start reviewing it and comment on the changes you'd like me to make before I open a PR.

p0n1 commented 10 months ago

Great work @Bryksin. Will take a closer look at it ASAP.

Bryksin commented 10 months ago

Hey @p0n1 , I'm back at my PC :D The PR has been opened.