sloria / TextBlob

Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
https://textblob.readthedocs.io/
MIT License

HTTPError: HTTP Error 400: Bad Request on translate call in 0.17.1 #401

Open pdeitel opened 2 years ago

pdeitel commented 2 years ago

Translation still failing in 0.17.1:

from textblob import TextBlob
b = TextBlob("The weather is beautiful today. Tomorrow looks like bad weather.")

b.translate(to='es')

yields

------------------------------------------------------------------------
HTTPError                              Traceback (most recent call last)
<ipython-input-7-90a0f454308a> in <module>
----> 1 b.translate(to="es")

~/anaconda3/envs/py38dsftJuly21/lib/python3.8/site-packages/textblob/blob.py in translate(self, from_lang, to)
    566             DeprecationWarning
    567         )
--> 568         return self.__class__(self.translator.translate(self.raw,
    569                               from_lang=from_lang, to_lang=to))
    570 

~/anaconda3/envs/py38dsftJuly21/lib/python3.8/site-packages/textblob/translate.py in translate(self, source, from_lang, to_lang, host, type_)
     52             client="te",
     53         )
---> 54         response = self._request(url, host=host, type_=type_, data=data)
     55         result = json.loads(response)
     56         if isinstance(result, list):

~/anaconda3/envs/py38dsftJuly21/lib/python3.8/site-packages/textblob/translate.py in _request(self, url, host, type_, data)
     94         if host or type_:
     95             req.set_proxy(host=host, type=type_)
---> 96         resp = request.urlopen(req)
     97         content = resp.read()
     98         return content.decode('utf-8')

~/anaconda3/envs/py38dsftJuly21/lib/python3.8/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    220     else:
    221         opener = _opener
--> 222     return opener.open(url, data, timeout)
    223 
    224 def install_opener(opener):

~/anaconda3/envs/py38dsftJuly21/lib/python3.8/urllib/request.py in open(self, fullurl, data, timeout)
    529         for processor in self.process_response.get(protocol, []):
    530             meth = getattr(processor, meth_name)
--> 531             response = meth(req, response)
    532 
    533         return response

~/anaconda3/envs/py38dsftJuly21/lib/python3.8/urllib/request.py in http_response(self, request, response)
    638         # request was successfully received, understood, and accepted.
    639         if not (200 <= code < 300):
--> 640             response = self.parent.error(
    641                 'http', request, response, code, msg, hdrs)
    642 

~/anaconda3/envs/py38dsftJuly21/lib/python3.8/urllib/request.py in error(self, proto, *args)
    567         if http_err:
    568             args = (dict, 'default', 'http_error_default') + orig_args
--> 569             return self._call_chain(*args)
    570 
    571 # XXX probably also want an abstract factory that knows when it makes

~/anaconda3/envs/py38dsftJuly21/lib/python3.8/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
    500         for handler in handlers:
    501             func = getattr(handler, meth_name)
--> 502             result = func(*args)
    503             if result is not None:
    504                 return result

~/anaconda3/envs/py38dsftJuly21/lib/python3.8/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
    647 class HTTPDefaultErrorHandler(BaseHandler):
    648     def http_error_default(self, req, fp, code, msg, hdrs):
--> 649         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    650 
    651 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 400: Bad Request
Zainy1453 commented 2 years ago

Same issue while using .apply()

elgamerjugon commented 2 years ago

I'm having the same problem. From some testing, it seems to be caused by the punctuation: you have to split the text into separate sentences wherever there is a period (.).
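
A minimal sketch of that workaround, assuming the text is split naively on sentence-ending punctuation and each sentence is translated on its own. The names `split_sentences`, `translate_by_sentence` and `translate_one` are hypothetical; `translate_one` stands in for whatever translation call you use, and this sketch does not show that splitting actually avoids the 400.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naively split text after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty pieces (for example when the input is an empty string).
    return [part for part in parts if part]


def translate_by_sentence(text: str, translate_one: Callable[[str], str]) -> str:
    """Translate each sentence separately and rejoin the results with spaces.

    `translate_one` is a placeholder for the real translation call.
    """
    return " ".join(translate_one(sentence) for sentence in split_sentences(text))
```

With TextBlob 0.17.1 the callable might be something like `lambda s: str(TextBlob(s).translate(to='es'))`, though that call is the one failing in this issue.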

behai-nguyen commented 2 years ago

I have the same problem translating from English to Vietnamese. It is because of exclamation marks (!).

from textblob import TextBlob

text = 'Bữa kia không biết cô Em Nhim nói gì! Nay mới hiểu!'

blob = TextBlob( text )
translated_text = blob.translate( from_lang='vi', to='en' )
Doublefire-Chen commented 2 years ago

I don't think so. I deleted every "!", but the problem remains.

behai-nguyen commented 2 years ago

This is the site: https://behai-translate.herokuapp.com/

I removed the two ! and it works.

Doublefire-Chen commented 2 years ago

It seems that, in my case, it was because I had been rate-limited for calling it too many times in succession. Reference: https://stackoverflow.com/questions/56189054/textblob-httperror-http-error-429-too-many-requests
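
If rate limiting is the cause, a retry with exponential backoff is one common mitigation. The sketch below is only illustrative: `call_with_backoff` and its parameters are hypothetical names, and it assumes the failing call raises `urllib.error.HTTPError`, as in the traceback at the top of this issue. It retries only on 429, because a 400 means the request itself was rejected and waiting is unlikely to help.

```python
import time
from typing import Callable, TypeVar
from urllib.error import HTTPError

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    retries: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying on HTTP 429 with exponentially growing delays."""
    for attempt in range(retries + 1):
        try:
            return func()
        except HTTPError as err:
            # Re-raise anything that is not a rate limit, or the final failure.
            if err.code != 429 or attempt == retries:
                raise
            # Wait 1x, 2x, 4x, ... base_delay before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Usage would look like `call_with_backoff(lambda: blob.translate(to='es'))`, where `blob` is an existing `TextBlob`.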

behai-nguyen commented 2 years ago

I am pretty new at this. This is the Git repository for this web page:

https://github.com/behai-nguyen/translation

While writing this, I encountered two types of exceptions:

HTTP Error 400: Bad Request
Translation API returned the input string unchanged.

The Vietnamese sentence I quoted results in a 400. I also found at least one English sentence that won't translate into Vietnamese.

BartAgterbosch commented 2 years ago

Similar issue here when calling detect_language(): HTTP Error 400: Bad Request, even without special characters.

reignerlastimosa09568402070 commented 2 years ago

Same problem, any fix?

BartAgterbosch commented 2 years ago

Yeah, use a specific version of googletrans instead: googletrans==4.0.0rc1

abdutlgz commented 2 years ago

Did someone solve this issue? I can't even do blob.detect_language().

pdeitel commented 2 years ago

I have had no success with this in TextBlob, and it has been an issue for going on a year now. I have since moved my language recognition and translation code to deep-translator. There are lots of other libraries out there, too.
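
For anyone taking the same route, here is a rough sketch of a translation helper built on deep-translator. It assumes the package is installed (`pip install deep-translator`) and uses its `GoogleTranslator` class. The function name `translate_text` and the `backend` parameter are hypothetical, added so the helper can be exercised without network access.

```python
from typing import Callable, Optional


def translate_text(
    text: str,
    target: str = "es",
    source: str = "auto",
    backend: Optional[Callable[[str], str]] = None,
) -> str:
    """Translate text with deep-translator unless a custom backend is given."""
    if backend is None:
        # Imported lazily so this module loads even without the package.
        from deep_translator import GoogleTranslator

        backend = GoogleTranslator(source=source, target=target).translate
    return backend(text)
```

Calling `translate_text("The weather is beautiful today.", target="es")` would go through deep-translator and needs an internet connection. Passing `backend=` swaps in any other callable.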

abdutlgz commented 2 years ago

Alright, got it, thank you Mr. Deitel! I really enjoy your book!

behai-nguyen commented 2 years ago

Thank you Mr. Deitel. I enjoy your book too, and I have learned a lot from it.

derekjreed commented 11 months ago

It's not a good sign when the person whose course you are following raised this issue two years earlier (@pdeitel). I am getting 'HTTPError: HTTP Error 400: Bad Request' from the call 'exblob.detect_language()'.

In [6]: exblob.detect_language()
URL is : http://translate.google.com/translate_a/t?client=webapp&dt=bd&dt=ex&dt=ld&dt=md&dt=qca&dt=rw&dt=rm&dt=ss&dt=t&dt=at&ie=UTF-8&oe=UTF-8&otf=2&ssel=0&tsel=0&kc=1&sl=auto&tk=198388.341386&client=te
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
Cell In[6], line 1
----> 1 exblob.detect_language()

File C:\ProgramData\miniconda3\lib\site-packages\textblob\blob.py:597, in BaseBlob.detect_language(self)
    572 """Detect the blob's language using the Google Translate API.
    573
    574 Requires an internet connection.
   (...)
    590 :rtype: str
    591 """
    592 warnings.warn(
    593     'TextBlob.detext_translate is deprecated and will be removed in a future release. '
    594     'Use the official Google Translate API instead.',
    595     DeprecationWarning
    596 )
--> 597 return self.translator.detect(self.raw)

File C:\ProgramData\miniconda3\lib\site-packages\textblob\translate.py:77, in Translator.detect(self, source, host, type_)
     71 url = u'{url}&sl=auto&tk={tk}&client={client}'.format(
     72     url=self.url,
     73     tk=_calculate_tk(source),
     74     client="te",
     75 )
     76 print(f'URL is : {url}')
---> 77 response = self._request(url, host=host, type_=type_, data=data)
     78 result, language = json.loads(response)
     79 return language

File C:\ProgramData\miniconda3\lib\site-packages\textblob\translate.py:97, in Translator._request(self, url, host, type_, data)
     95 if host or type_:
     96     req.set_proxy(host=host, type=type_)
---> 97 resp = request.urlopen(req)
     98 content = resp.read()
     99 return content.decode('utf-8')

File C:\ProgramData\miniconda3\lib\urllib\request.py:216, in urlopen(url, data, timeout, cafile, capath, cadefault, context)
    214 else:
    215     opener = _opener
--> 216 return opener.open(url, data, timeout)

File C:\ProgramData\miniconda3\lib\urllib\request.py:525, in OpenerDirector.open(self, fullurl, data, timeout)
    523 for processor in self.process_response.get(protocol, []):
    524     meth = getattr(processor, meth_name)
--> 525     response = meth(req, response)
    527 return response

File C:\ProgramData\miniconda3\lib\urllib\request.py:634, in HTTPErrorProcessor.http_response(self, request, response)
    631 # According to RFC 2616, "2xx" code indicates that the client's
    632 # request was successfully received, understood, and accepted.
    633 if not (200 <= code < 300):
--> 634     response = self.parent.error(
    635         'http', request, response, code, msg, hdrs)
    637 return response

File C:\ProgramData\miniconda3\lib\urllib\request.py:563, in OpenerDirector.error(self, proto, *args)
    561 if http_err:
    562     args = (dict, 'default', 'http_error_default') + orig_args
--> 563     return self._call_chain(*args)

File C:\ProgramData\miniconda3\lib\urllib\request.py:496, in OpenerDirector._call_chain(self, chain, kind, meth_name, *args)
    494 for handler in handlers:
    495     func = getattr(handler, meth_name)
--> 496     result = func(*args)
    497     if result is not None:
    498         return result

File C:\ProgramData\miniconda3\lib\urllib\request.py:643, in HTTPDefaultErrorHandler.http_error_default(self, req, fp, code, msg, hdrs)
    642 def http_error_default(self, req, fp, code, msg, hdrs):
--> 643     raise HTTPError(req.full_url, code, msg, hdrs, fp)

HTTPError: HTTP Error 400: Bad Request

I've printed out the URL generated in site-packages\textblob\translate.py (in def detect(self, source, host=None, type_=None), line 64). When you curl the URL you get:

$ *   Trying 172.217.16.238:80...
* TCP_NODELAY set
* Connected to translate.google.com (172.217.16.238) port 80 (#0)
> GET /translate_a/t?client=webapp HTTP/1.1
> Host: translate.google.com
> User-Agent: curl/7.68.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 400 Bad Request
< Content-Type: text/html; charset=utf-8
< Cache-Control: no-cache, no-store, max-age=0, must-revalidate
< Pragma: no-cache
< Expires: Mon, 01 Jan 1990 00:00:00 GMT
< Date: Wed, 25 Oct 2023 19:15:09 GMT
< Cross-Origin-Opener-Policy: same-origin
< Content-Security-Policy: require-trusted-types-for 'script';report-uri /_/TranslateApiHttp/cspreport
< Content-Security-Policy: script-src 'nonce-ky73sNLR8csY652zGBbnPw' 'unsafe-inline';object-src 'none';base-uri 'self';report-uri /_/TranslateApiHttp/cspreport;worker-src 'self'
< Permissions-Policy: ch-ua-arch=*, ch-ua-bitness=*, ch-ua-full-version=*, ch-ua-full-version-list=*, ch-ua-model=*, ch-ua-wow64=*, ch-ua-form-factor=*, ch-ua-platform=*, ch-ua-platform-version=*
< Accept-CH: Sec-CH-UA-Arch, Sec-CH-UA-Bitness, Sec-CH-UA-Full-Version, Sec-CH-UA-Full-Version-List, Sec-CH-UA-Model, Sec-CH-UA-WoW64, Sec-CH-UA-Form-Factor, Sec-CH-UA-Platform, Sec-CH-UA-Platform-Version
< Cross-Origin-Resource-Policy: cross-origin
< Server: ESF
< X-XSS-Protection: 0
< X-Content-Type-Options: nosniff
< Accept-Ranges: none
< Vary: Accept-Encoding
< Transfer-Encoding: chunked
<
<html lang=en><meta charset=utf-8><meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width"><title>Error 400 (Bad Request)!!1</title><style nonce="Igef5SKGfvFHBNV2iMyg-w">*{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{color:#222;text-align:unset;margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px;}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}pre{white-space:pre-wrap;}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}</style><main id="af-error-container" role="main"><a href=//www.google.com><span id=logo aria-label=Google role=img></span></a><p><b>400.</b> <ins>That’s an error.</ins><p>The server cannot process the request because it is malformed. It should not be retried. <ins>That’s all we know.</ins></main>
* Connection #0 to host translate.google.com left intact
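
One caveat about the curl transcript: its request line shows only `?client=webapp`, so everything after the first `&` in the printed URL does not appear to have been sent. A likely cause is that the URL was not quoted on the command line, though that is an assumption. The sketch below compares the two query strings copied from the output above; `query_params` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlsplit

# URL printed by the patched translate.py, copied from the output above.
PRINTED_URL = (
    "http://translate.google.com/translate_a/t?client=webapp&dt=bd&dt=ex"
    "&dt=ld&dt=md&dt=qca&dt=rw&dt=rm&dt=ss&dt=t&dt=at&ie=UTF-8&oe=UTF-8"
    "&otf=2&ssel=0&tsel=0&kc=1&sl=auto&tk=198388.341386&client=te"
)
# Request target curl reported sending, copied from the transcript above.
CURL_REQUEST_TARGET = "/translate_a/t?client=webapp"


def query_params(url: str) -> dict:
    """Return the query string of a URL as a dict of parameter -> values."""
    return parse_qs(urlsplit(url).query)


printed = query_params(PRINTED_URL)
sent = query_params(CURL_REQUEST_TARGET)
# Parameters present in the printed URL but absent from the curl request.
missing = sorted(set(printed) - set(sent))
print(missing)
```

So the curl result may not reproduce the request TextBlob makes. Quoting the URL in the shell would be needed for a closer comparison.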
derekjreed commented 11 months ago

So it looks like the general 'free' translate API that Google provides is not stable and gets moved around a fair bit (https://github.com/ssut/py-googletrans/issues/268), and the only stable API for translation requires getting a Google Cloud account, setting up a project, and generating an API key (the URL would then be specific to your Google Cloud account). I'm not sure how helpful this is, but it looks like this code would need to be refactored to take this into account.