sloria / TextBlob

Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
https://textblob.readthedocs.io/
MIT License

HTTP Error 503: Service Unavailable while using detect_language() and translate() from textblob #215

Closed craigchen1990 closed 4 years ago

craigchen1990 commented 6 years ago

python:3.5 textblob:0.15.1

It seems this happened before and was fixed in #148.

The detailed logs:

  File "/usr/local/lib/python3.5/site-packages/textblob/blob.py", line 562, in detect_language
    return self.translator.detect(self.raw)
  File "/usr/local/lib/python3.5/site-packages/textblob/translate.py", line 72, in detect
    response = self._request(url, host=host, type_=type_, data=data)
  File "/usr/local/lib/python3.5/site-packages/textblob/translate.py", line 92, in _request
    resp = request.urlopen(req)
  File "/usr/local/lib/python3.5/urllib/request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/lib/python3.5/urllib/request.py", line 472, in open
    response = meth(req, response)
  File "/usr/local/lib/python3.5/urllib/request.py", line 582, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/local/lib/python3.5/urllib/request.py", line 504, in error
    result = self._call_chain(*args)
  File "/usr/local/lib/python3.5/urllib/request.py", line 444, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python3.5/urllib/request.py", line 696, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/local/lib/python3.5/urllib/request.py", line 472, in open
    response = meth(req, response)
  File "/usr/local/lib/python3.5/urllib/request.py", line 582, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/local/lib/python3.5/urllib/request.py", line 510, in error
    return self._call_chain(*args)
  File "/usr/local/lib/python3.5/urllib/request.py", line 444, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python3.5/urllib/request.py", line 590, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)

craigchen1990 commented 6 years ago

I ran some tests, and it looks like a request with the parameters sl=auto&tl=en&hl=en&dt=at&dt=bd&dt=ex&dt=ld&dt=md&dt=qca&dt=rw&dt=rm&dt=ss&dt=t&ie=UTF-8&oe=UTF-8&otf=1&ssel=0&tsel=0&kc=4 works in a browser.

iharshulhan commented 6 years ago

I have the same issue. It appears that Google blocks the request and asks you to solve a CAPTCHA.

Jor-G-ete commented 6 years ago

I have the same problem with my code. I believe this error might be happening because Google's service isn't free anymore, so it lets you run some translations before it starts rejecting your requests.

I changed my IP, and it let me try more words, but when I reached the maximum (418 words translated), it denied the connection again. So that's my guess.

Some people in other issues have said the same thing.

RajeshkannanRamakrishnan commented 6 years ago

Yes, I am also facing the same issue.

daviligade commented 6 years ago

I had exactly the same issue. The same kind of error.

While I was making single requests, I had no problems. Then I put the same code into a big loop, and after some iterations I hit the issue. It could be a coincidence, or maybe it is linked to the handling of multiple requests.
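If the failures really are tied to request volume, as the loop behaviour above suggests, one thing to try is spacing the calls out. This is a minimal sketch only: `Throttle` is a hypothetical helper, not part of TextBlob, and whether any particular delay avoids the block is an untested assumption.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive calls (illustrative helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between calls (assumed value)
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Sleep just long enough that calls are at least min_interval apart."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

In a loop, you would create one `Throttle(1.0)` up front and call its `wait()` method before each `detect_language()` or `translate()` call.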

sloria commented 6 years ago

Please do not post "+1" or "me too" comments, as they are not constructive.

I would welcome a PR addressing this issue, or even just an analysis of the problem with a possible solution.

EduardoSebastianRodriguez commented 6 years ago

From what I have read in some forums, the problem lies in how the virtual identity for Google is generated. Every function of the library that uses Google's tools needs an ID to contact Google and request its services. Google then analyzes it and decides whether the message comes from a robot or from a real user. To avoid being flagged, TextBlob uses an algorithm to generate the identity. However, Google seems to have updated its protocols, so the algorithm no longer works. The solution is to update the library to generate the IDs using Python's default request library.

This is the solution I can propose. I cannot confirm that it works, but I think it is better than nothing :) I hope this post helps others, and helps the library's maintainers solve these issues, because otherwise the library is completely unusable.

sergeiGKS commented 6 years ago

Am facing the same issue.

Any update?

ahadmushir commented 6 years ago

Getting the same 503 service error. The library was working fine a week ago.

sloria commented 6 years ago

Again, please do not post any more "+1" comments. I encourage anyone to send a PR resolving this.

Fernandohf commented 6 years ago

A temporary workaround for detect_language could be to use the langdetect port. I could run the code below without the HTTP Error 503.

from langdetect import detect
for i in range(1000):
    print(detect('Hello World'))

Keep in mind that this doesn't solve the error in the translate method.
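One way to wire such a workaround in is to try the online detector first and fall back to an offline one only when the request fails. A minimal sketch follows; `detect_with_fallback` is a hypothetical helper, and both detectors are passed in as plain callables so that no particular library is assumed.

```python
from urllib.error import HTTPError, URLError


def detect_with_fallback(text, primary, fallback):
    """Return primary(text), or fallback(text) if the primary hits a network error."""
    try:
        return primary(text)
    except (HTTPError, URLError):
        # HTTPError covers the 503 and 429 responses reported in this thread.
        return fallback(text)
```

With TextBlob and langdetect installed, the call might look like `detect_with_fallback(text, lambda t: TextBlob(t).detect_language(), detect)`.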

EDIT: Typo

KishoreKonakanti commented 6 years ago

@Fernandohf, langdetect's ability to identify languages seems to be very poor:

langdetect.detect('bonjour')  # supposed to be French, but detected as Croatian
Out[97]: 'hr'

langdetect.detect('hi')  # expected English, got 'sw'
Out[98]: 'sw'

langdetect.detect('hello')  # expected English, detected as Finnish
Out[99]: 'fi'

PS: Language codes interpreted as ISO 639-1 two-letter codes (under which 'sw' is Swahili, not Swedish).

Thank you.

KishoreKonakanti commented 6 years ago

A decent offline language detector that worked for my case: langid

Sample code:

import langid as lid

word = 'bonjour'    # example input (placeholder)
threshold = -100.0  # placeholder cutoff; by default langid's score is an unnormalized log-probability, so tune it

lang, conf = lid.classify(word)
if conf > threshold:
    print('%s word belongs to %s language' % (word, lang))

PS: If you are only trying to classify English words 👍 this helps a lot, but it is not much help for other languages 👎

Thank you.

alihashaam commented 5 years ago

@sloria is this issue resolved and fixed in v0.15.3?

sloria commented 5 years ago

No, this is still an issue.

amritsh commented 5 years ago

TextBlob's translate works on Mac but fails on Linux, where even the first request throws a 503 error. I tried specifying a Linux environment in the request's user agent and still got a 503. I am probably missing something.

Ashish-Bansal commented 5 years ago

Since this issue is still open, I'm adding to it.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mrphantom/leetcode/venv/lib/python3.6/site-packages/textblob/blob.py", line 568, in detect_language
    return self.translator.detect(self.raw)
  File "/home/mrphantom/leetcode/venv/lib/python3.6/site-packages/textblob/translate.py", line 72, in detect
    response = self._request(url, host=host, type_=type_, data=data)
  File "/home/mrphantom/leetcode/venv/lib/python3.6/site-packages/textblob/translate.py", line 92, in _request
    resp = request.urlopen(req)
  File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.6/urllib/request.py", line 564, in error
    result = self._call_chain(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 756, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 429: Too Many Requests

According to the comments I read in this thread (works fine for a few requests, then 503, and so on), it seems that Google was previously sending the wrong response code. I'm now getting 429 instead of 503, which means my requests are being blocked intentionally by Google. 429 is normally used by rate limiters to tell the client to slow down. Right now it's happening for me even on the first request from my server's static IP, which may mean they are blocking it because of the "human" check.
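Since 429 asks the client to slow down, a common response is to retry with exponentially growing delays. The sketch below is illustrative only: `call_with_backoff` and its defaults are hypothetical, and if the IP is blocked outright (as with the first-request failures described above), retrying is unlikely to help.

```python
import time
from urllib.error import HTTPError

RETRYABLE_CODES = {429, 503}  # the two status codes reported in this thread


def call_with_backoff(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying with exponentially growing delays on 429/503."""
    for attempt in range(max_attempts):
        try:
            return func()
        except HTTPError as err:
            last_attempt = attempt == max_attempts - 1
            if err.code not in RETRYABLE_CODES or last_attempt:
                raise
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * 2 ** attempt)
```

The function to retry is passed as a zero-argument callable, for example `call_with_backoff(lambda: blob.detect_language())`.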

Anyway, I think the implementation of detect_language is wrong. If it really depends on Google Translate, then it should use a Google API token, as described at https://cloud.google.com/translate/docs/translating-text, instead of calling the API endpoint meant for their frontend code.
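As a rough idea of what a key-based call could look like, here is a sketch using only the standard library. The endpoint URL, the `key` query parameter, and the response shape are assumptions based on the Cloud Translation v2 (Basic) REST API and should be checked against Google's current documentation. The function names are hypothetical, and this is not how TextBlob is implemented.

```python
import json
from urllib import parse, request

# Assumed endpoint of the Cloud Translation v2 (Basic) REST API; verify it
# against the current Google documentation before relying on it.
DETECT_URL = "https://translation.googleapis.com/language/translate/v2/detect"


def build_detect_request(text, api_key):
    """Build a POST request asking the API to detect the language of text."""
    url = DETECT_URL + "?" + parse.urlencode({"key": api_key})
    body = json.dumps({"q": text}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return request.Request(url, data=body, headers=headers, method="POST")


def parse_detect_response(payload):
    """Extract the top language code from a v2-style JSON response body."""
    data = json.loads(payload)
    return data["data"]["detections"][0][0]["language"]


def detect_language(text, api_key):
    """Send the request (needs network access and a valid API key)."""
    with request.urlopen(build_detect_request(text, api_key)) as resp:
        return parse_detect_response(resp.read().decode("utf-8"))
```

Building and parsing are kept separate from the network call so that each part can be checked without credentials.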

RuikunLi commented 5 years ago

I have the same issue!

sloria commented 5 years ago

I think it's time we remove the language detection and translate features. They use an undocumented/unsupported Google API, which is why they no longer work.

bharaniid commented 4 years ago

Are there any updates? I am still facing the same issue.

sloria commented 4 years ago

For anyone finding this issue, I recommend taking a look at https://github.com/ssut/py-googletrans or the official Google Translate API as a substitute for TextBlob's translation functionality.