pndurette / gTTS

Python library and CLI tool to interface with Google Translate's text-to-speech API
http://gtts.readthedocs.org/
MIT License
2.24k stars 358 forks

Converting a longer PDF (say, 7-8 pages) #297

Open vivekna opened 3 years ago

vivekna commented 3 years ago

It works great for smaller chunks of text. But if I try to convert a longer PDF, such as a 7-page one, it always fails after a very long wait. It runs for hours and then fails. Is gTTS meant only for smaller texts? If so, what's the alternative?

The error is usually a connection error, but that's expected if it runs for a few hours to convert a seven-page PDF, right?

gtts.tts.gTTSError: 500 (Internal Server Error) from TTS API. Probable cause: Uptream API error. Try again later.

My code is very simple and straightforward:

import pdftotext
from gtts import gTTS
from os.path import splitext

filelocation = "C:\\Users\\vna\\Downloads\\catch22.pdf"
with open(filelocation, "rb") as f:  # open the PDF in binary read ("rb") mode and call it f
    pdf = pdftotext.PDF(f)  # extract the text; pdf yields one string per page

string_of_text = ''.join(pdf)  # concatenate the text of all pages

final_file = gTTS(text=string_of_text, lang='en')  # store file in variable
outname = splitext(filelocation)[0] + '.mp3'
final_file.save(outname)  # save file to computer

pndurette commented 3 years ago

Hi there!

Hmm, for a 7-page PDF, I'd say the cause is indeed that you're requesting a lot, and the server eventually shuts you down after too many quick requests.

Unfortunately, there isn't currently any way to tell gTTS to 'slow down' the requests, so I'll add this as an enhancement.

In the meantime, if that's what's actually happening, you'll have to make separate requests with a slight sleep between them.
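A minimal sketch of that workaround, under some assumptions: the chunk size and the pause length are guesses rather than documented limits, and `split_text`, `save_with_pauses`, and `tts_factory` are hypothetical names made up for this example. It relies on `gTTS.write_to_fp`, which writes the audio to an open file object, so every chunk ends up in the same MP3.

```python
import textwrap
import time


def split_text(text, max_chars=3000):
    """Split text into pieces of at most max_chars, breaking on whitespace.

    max_chars is an arbitrary guess, not a limit documented by gTTS.
    """
    normalized = " ".join(text.split())  # collapse newlines and repeated spaces
    return textwrap.wrap(normalized, width=max_chars)


def save_with_pauses(chunks, out_path, pause_seconds=2.0, lang="en", tts_factory=None):
    """Make one gTTS call per chunk, sleeping between calls, into a single file."""
    if tts_factory is None:
        # Imported lazily so split_text can be used without gTTS installed.
        from gtts import gTTS
        tts_factory = gTTS

    with open(out_path, "wb") as fp:
        for index, chunk in enumerate(chunks):
            if index > 0:
                time.sleep(pause_seconds)  # give the server a break between calls
            tts_factory(text=chunk, lang=lang).write_to_fp(fp)
```

With the script above, this would be called as `save_with_pauses(split_text(string_of_text), outname)`. If the 500 errors persist, a longer `pause_seconds` or a smaller `max_chars` may be worth trying.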

vivekna commented 3 years ago

Yes, that's what is happening here. Thanks for the enhancement! Do you know how we can make separate requests? Probably by splitting by page? And will that in turn create separate MP3 files? I can do something to join them all together later, but is there a known approach? Have you tried to convert such documents?
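For reference, a per-page approach might look like the sketch below. It is only an illustration: `page_paths`, `save_pages`, and `join_mp3` are hypothetical helper names, the `_p001` naming scheme and the 5-second pause are arbitrary choices, and joining the files by plain byte concatenation is an assumption that usually holds for MP3 but is not guaranteed to suit every player.

```python
import shutil
import time
from pathlib import Path


def page_paths(pdf_path, page_count):
    """Return one MP3 path per page, e.g. catch22_p001.mp3 next to the PDF."""
    base = Path(pdf_path).with_suffix("")
    return [Path(f"{base}_p{n:03d}.mp3") for n in range(1, page_count + 1)]


def save_pages(pages, pdf_path, pause_seconds=5.0, lang="en"):
    """Save each non-empty page as its own MP3, sleeping between requests."""
    from gtts import gTTS  # imported lazily; the other helpers need only the stdlib

    saved = []
    for text, path in zip(pages, page_paths(pdf_path, len(pages))):
        if not text.strip():
            continue  # gTTS rejects empty text, so skip blank pages
        if saved:
            time.sleep(pause_seconds)  # pause before every request but the first
        gTTS(text=text, lang=lang).save(str(path))
        saved.append(path)
    return saved


def join_mp3(paths, out_path):
    """Concatenate the per-page MP3 files, in order, into one file."""
    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as part:
                shutil.copyfileobj(part, out)
```

With the script from the first comment, `pages` would be `list(pdf)`, and the result of `save_pages(pages, filelocation)` can be passed straight to `join_mp3`. Keeping one file per page also means a failed page can be retried without redoing the whole document.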

I'd also like to add that the output of gTTS is way better than pyttsx3. Thanks for your effort! :)

ickam commented 3 years ago

A related question, @pndurette: would using my own API key, created in Google Cloud Shell, allow me to process longer files? If so, how do I feed it to the app if I installed it through pip?