nidhaloff / deep-translator

A flexible, free and unlimited Python tool to translate between different languages in a simple way using multiple translators.
https://deep-translator.readthedocs.io/en/latest/?badge=latest
Apache License 2.0

Not able to request with more than 1800 characters #224

Open rupeshkumaar opened 1 year ago

rupeshkumaar commented 1 year ago

I was trying to translate the text of a document written in Chinese, but I could not send a request with more than 1800 characters, even though the documented limit is 5000 characters. I get a RequestError with a 400 status code. I have the latest version of deep-translator.

nidhaloff commented 1 year ago

Can you post the output?

rupeshkumaar commented 1 year ago

@nidhaloff Sure, here are the scenarios I tried. I can't share the document, so for the sake of argument I am repeating a single Chinese character n times.

from deep_translator import GoogleTranslator
import sys

# 1800 characters
content = "学" * 1800
print(len(content))             # >> 1800
print(sys.getsizeof(content))   # >> 3674

# Using GoogleTranslator
translated = GoogleTranslator(source='auto', target='en').translate(content)
print(translated)
# >> study study study study study studying studying studying studying studying studying studying studying studying studying studying studying studying scholastic scholastic scholastic scholastic scholastic scholastic scholastic scholastic scholastic scholastic

From this point on I got the same result every time; the error is posted at the end of this snippet:

# 1900 characters
content = "学" * 1900
print(len(content))             # >> 1900
print(sys.getsizeof(content))   # >> 3874

# Using GoogleTranslator
translated = GoogleTranslator(source='auto', target='en').translate(content)

# 4999 characters
content = "学" * 4999
print(len(content))             # >> 4999
print(sys.getsizeof(content))   # >> 10072

# Using GoogleTranslator
translated = GoogleTranslator(source='auto', target='en').translate(content)

# Both calls raised the error below:
# deep_translator.exceptions.RequestError: Request exception can happen due to an api connection error. Please check your connection and try again
# 5000 characters
content = "学" * 5000
print(len(content))             # >> 5000
print(sys.getsizeof(content))   # >> 10074

# Using GoogleTranslator
translated = GoogleTranslator(source='auto', target='en').translate(content)

For 5000 characters I got the deep_translator.exceptions.NotValidLength error, which was expected. However, I think it should only be raised at the 5001st character, not at the 5000th, since a 5000-character limit should accept exactly 5000 characters. Please correct me if I am wrong.
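The behaviour described above looks like an off-by-one in the length check. A minimal sketch, assuming the check compares len(text) against the limit with a strict `<` (the function names here are hypothetical and this is not deep-translator's actual source):

```python
# Hypothetical length checks, NOT deep-translator's code: they contrast an
# exclusive upper bound with an inclusive one for a 5000-character limit.
MAX_CHARS = 5000


def is_valid_exclusive(text: str, max_chars: int = MAX_CHARS) -> bool:
    # Exclusive bound: len(text) must be strictly below max_chars,
    # so a text of exactly 5000 characters is rejected.
    return 0 <= len(text) < max_chars


def is_valid_inclusive(text: str, max_chars: int = MAX_CHARS) -> bool:
    # Inclusive bound: exactly max_chars characters is accepted;
    # only the 5001st character exceeds the limit.
    return 0 <= len(text) <= max_chars


for n in (4999, 5000, 5001):
    text = "学" * n
    print(n, is_valid_exclusive(text), is_valid_inclusive(text))
# prints:
# 4999 True True
# 5000 False True
# 5001 False False
```

If the library's check is exclusive, switching `<` to `<=` would make the limit match the documented 5000 characters.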

While looking into the issue above, I went through various articles, posts and Stack Overflow questions. I found that we are using the GET method, which limits requests to about 2000 characters (although in practice I can't even reach 2000), whereas the POST method allows up to 5000 characters. That could be why requests are capped at around 2000 characters, but I am not sure. Please correct me if I am wrong. (source: https://stackoverflow.com/questions/18754905/google-translate-api-cannot-send-more-than-2000-characters-per-request)
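One plausible reason the GET limit bites earlier for Chinese text than for ASCII text (an assumption, not confirmed in this thread) is that with GET the text travels in the URL, where every non-ASCII character is percent-encoded. A quick check with the standard library:

```python
from urllib.parse import quote

# "学" (U+5B66) is 3 bytes in UTF-8 (E5 AD A6), and percent-encoding turns each
# byte into a 3-character escape, so one character becomes 9 URL characters.
print(quote("学"))  # %E5%AD%A6

for n in (1800, 1900, 4999):
    content = "学" * n
    # Length of the text once percent-encoded for use in a URL query string.
    print(n, len(quote(content)))
# prints:
# 1800 16200
# 1900 17100
# 4999 44991
```

So 1800 Chinese characters already take 16200 characters of URL, while 1800 ASCII letters would take 1800. The exact URL length the server accepts is not stated in this thread.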

nidhaloff commented 1 year ago

@rupeshkumaar Hm, I didn't know about the GET request limitation. Can you hack something together and test it using POST?
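A minimal sketch of what a POST-based request could look like, using only the standard library. The URL and the form field names `sl`, `tl` and `q` are hypothetical placeholders, not a confirmed Google Translate endpoint, and the request is only built here, not sent:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint, for illustration only.
URL = "https://example.com/translate"


def build_post_request(text: str, source: str = "auto", target: str = "en") -> Request:
    # Form-encode the payload and put it in the request body instead of the
    # URL, so the text no longer counts towards the URL length.
    body = urlencode({"sl": source, "tl": target, "q": text}).encode("utf-8")
    req = Request(URL, data=body, method="POST")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    # Servers that require a body length answer "411 Length Required"
    # when Content-Length is missing, so set it explicitly.
    req.add_header("Content-Length", str(len(body)))
    return req


req = build_post_request("学" * 5000)
print(req.get_method(), req.full_url)
# urllib stores header names via str.capitalize(), hence "Content-length".
print(req.get_header("Content-length"))
```

Moving the text into the body sidesteps URL-length limits, but whether a particular endpoint accepts POST at all is decided by the server.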

rupeshkumaar commented 1 year ago

@nidhaloff I did try POST. At first I got 411 because the request needed a Content-Length header, and after adding it I started getting 405 (Method Not Allowed), so I couldn't get it working. I also tried another module, translators, which uses the 5000-character limit under the hood. Its only drawback, as far as I found, is that once the limit is exhausted, or you get a 429, you are blocked for the rest of the day, so it didn't fit my requirements. deep-translator does, so for now I am using it with an 1800-character limit. It would be great if you could look into this and find a way to make it work. Hope this helps.
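Until the limit is lifted, the 1800-character workaround can be automated by splitting long text before translating it. A minimal sketch; `split_into_chunks` and `translate_long_text` are hypothetical helpers, not part of deep-translator, and the delimiter set is an assumption:

```python
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int = 1800) -> List[str]:
    """Split text into pieces of at most max_chars characters,
    preferring to cut right after a sentence delimiter."""
    delimiters = "。！？.!?\n"  # assumed sentence boundaries
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Last delimiter inside the current window, or -1 if none.
            cut = max(text.rfind(d, start, end) for d in delimiters)
            if cut != -1:
                end = cut + 1  # keep the delimiter with its sentence
        chunks.append(text[start:end])
        start = end
    return chunks


def translate_long_text(text: str, translate: Callable[[str], str], max_chars: int = 1800) -> str:
    # translate is any str -> str function, for example
    # GoogleTranslator(source='auto', target='en').translate
    return " ".join(translate(chunk) for chunk in split_into_chunks(text, max_chars))


print([len(c) for c in split_into_chunks("学" * 4999)])  # [1800, 1800, 1399]
```

Joining with a space suits English output; for a target language written without spaces, join with an empty string instead.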