Closed hobbyhack closed 8 years ago
Actually, after I posted this I realized that this code breaks translation going from English to Arabic. So maybe the best thing to do is to put that code in a completely separate function.
you should fork me and make a pull request :)
Wow. I don't understand the lingo but this sounds like a great offer. I will see if I can figure out how to do this. However, if anyone else wants to take this and use it please do.
Shane
On Tue, Sep 6, 2016 at 11:36 AM, Arnaud Aliès wrote:
> you should fork me and make a pull request :)
I actually don't have this problem on either the python2 or the python3 version:
Hola como estas? >> Hello how are you?
Hola como estas? >> Привет, как ты?
Hola como estas? >> مرحبا كيف حالك؟
identity >> identité
hobbyhack, GitHub lets you copy my code (a "fork") and edit it. Once you have done that, you can submit your changes here with a "pull request".
The problem is translating from Unicode (non-ASCII) text. It is related to the way python3 handles web URLs. However, the request would work on the Google side if python3 would send it.
Here is some code that would allow you to reproduce it (if the problem is not my terminal):
print(Utilities.translate('system', "ar", "en"))
print(Utilities.translate('نظام', "ar", "en"))
On my machine, going from English to Arabic works fine. However, going from Arabic to English raises an error. Here is the result:
نظام
Traceback (most recent call last):
  File "/Users/shanegary/Library/Mobile Documents/com~apple~CloudDocs/Data/AppDev/fadal/Main.py", line 23, in <module>
    print(Utilities.translate('نظام', "ar", "en"))
  File "/Users/shanegary/Library/Mobile Documents/com~apple~CloudDocs/Data/AppDev/fadal/Utilities.py", line 57, in translate
    page = urllib.request.urlopen(request).read().decode("utf-8")
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 466, in open
    response = self._open(req, data)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 484, in _open
    '_open', req)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 444, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 1282, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 1254, in do_open
    h.request(req.get_method(), req.selector, req.data, headers)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 1106, in request
    self._send_request(method, url, body, headers)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 1141, in _send_request
    self.putrequest(method, url, **skips)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 983, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 21-24: ordinal not in range(128)
Process finished with exit code 1
I used the same word to test in both directions and the English to Arabic worked.
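The last frame of the traceback is the request line being encoded as ASCII, so the failure can probably be reproduced without any network access. Here is a minimal sketch, assuming the request line has the usual "GET <path> HTTP/1.1" shape and the query layout used in this thread; `TEMPLATE` and `ascii_failure` are made-up names for illustration, not part of the library:

```python
# Request line shaped like the one http.client builds for the thread's URL.
# hl=ar and sl=en follow the call translate('نظام', "ar", "en") (assumption).
TEMPLATE = "GET /m?hl=ar&sl=en&q=%s HTTP/1.1"


def ascii_failure(word):
    """Return the UnicodeEncodeError raised by ASCII-encoding the request line, or None."""
    try:
        (TEMPLATE % word).encode("ascii")
    except UnicodeEncodeError as err:
        return err
    return None


# An ASCII word encodes cleanly, which fits English input working fine.
print(ascii_failure("system"))

# The Arabic word fails; the four characters start right after "q=".
err = ascii_failure("نظام")
print(err.start, err.end - 1)
```

Under these assumptions the reported positions line up with the four Arabic characters following `q=`, which suggests the raw query text, not the response from Google, is what python3 rejects.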
I have fixed this in my code by adding the functions below. I am not sure if this would work for other languages. However, I tested it with a bunch of Arabic words and it worked great:
def convertUnicodeToHTMLEsc(text):
    htmlEsc = str(text.encode()).replace("b\'\\x", "%").replace("\\x", "%").replace("\'", '')
    return htmlEsc

def translateFromUnicode(to_translate, to_language="auto", language="auto"):
    htmlEsc = convertUnicodeToHTMLEsc(to_translate)
    translation = translate(htmlEsc, to_language, language)
    return translation
I just call translateFromUnicode() when I am translating from Arabic and call your function directly when I am translating from English. I should have some time next week to fork your code and post these new functions.
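For anyone comparing approaches: the helper above effectively percent-encodes the UTF-8 bytes of the text, which the standard library's `urllib.parse.quote` also does. The sketch below puts the two side by side; `percent_encode` is a hypothetical name, and this is not necessarily what the library itself ended up using:

```python
from urllib.parse import quote


def convertUnicodeToHTMLEsc(text):
    # Helper from this thread, reproduced unchanged for comparison.
    return str(text.encode()).replace("b\'\\x", "%").replace("\\x", "%").replace("\'", '')


def percent_encode(text):
    # Hypothetical alternative: percent-encode the UTF-8 bytes of the text.
    # safe="" also escapes "/" so the result can sit inside a query value.
    return quote(text, safe="")


arabic = "نظام"
print(convertUnicodeToHTMLEsc(arabic))  # %d9%86%d8%b8%d8%a7%d9%85
print(percent_encode(arabic))           # %D9%86%D8%B8%D8%A7%D9%85

# ASCII input is where the two differ: the bytes repr has no \x escapes,
# so the thread's helper leaves a stray leading "b".
print(convertUnicodeToHTMLEsc("system"))  # bsystem
print(percent_encode("system"))           # system
```

The stray `b` on ASCII input may be why the helper broke the English to Arabic direction, and it is one reason a single `quote`-based path for both directions could be simpler than choosing a function per language.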
It looks to me like there is a bug open with Python for this. However, they seem to have a good reason not to fix it (URLs are supposed to be ASCII according to the standard Python is quoting). See Python issue #3991.
Try adding "&ie=UTF-8" to the link, like this:
link = "http://translate.google.com/m?hl=%s&sl=%s&q=%s&ie=UTF-8" % (to_langage, langage, to_translate.replace(" ", "+"))
(btw I should recode this part using url encoder and regex ...)
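As a rough idea of what the "url encoder" version of that line could look like, here is a sketch using `urllib.parse.urlencode`. The function name `build_link` is hypothetical, the parameter names mirror `translateFromUnicode` from earlier in the thread, and whether the endpoint accepts the result is not something this sketch checks:

```python
from urllib.parse import urlencode


def build_link(to_translate, to_language="auto", language="auto"):
    # Hypothetical helper: build the query string with urlencode, which
    # percent-encodes non-ASCII text as UTF-8 and turns spaces into "+".
    params = [
        ("hl", to_language),
        ("sl", language),
        ("q", to_translate),
        ("ie", "UTF-8"),
    ]
    return "http://translate.google.com/m?" + urlencode(params)


link = build_link("نظام", "en", "ar")
print(link)

# The link is plain ASCII, so encoding it as ASCII no longer raises.
link.encode("ascii")
```

Because `urlencode` already converts spaces to `+`, the manual `replace(" ", "+")` would no longer be needed with this approach.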
I tried adding &ie=UTF-8 and it still doesn't work on my machine. Just like the original code, if I output the url being tried and paste it into a browser it gives me the same page as when I convert to html escape code. So I think the code should work as is.
It seems like python3 is just refusing to try the URL which would actually work. The python2 code is working fine. It might just be my machine. However, I am pretty sure this is python issue #3991.
It might be worth waiting on Python dev to add the "enhancement" back to python3 instead of accepting my pull request. My addition to the code makes things more complicated because you would use a different function based on if someone has the issue or not. And even then, you would translate one direction with your original function and the other with mine.
I haven't coded in a decade, and even a decade ago I never wrote much more than automation of daily tasks. I have shared some code but have never tried sharing how I modified other people's code. I wasn't sure how much I should actually be updating the function you wrote versus just providing a function others could use if they had the same issue.
I pushed a new version. This should work; tell me if you still have the issue.
It works great, thanks! This is much cleaner than my "fix".
The python3 version wasn't working for me with Arabic as the input language. It seems like it might be broken for all non-ASCII writing systems. However, there is a way to first convert the text to an HTML escape code and then send it to your function, with the following code:
def convertUnicodeToHTMLEsc(text):
    htmlEsc = str(text.encode()).replace("b\'\\x", "%").replace("\\x", "%").replace("\'", '')
    return htmlEsc
Then, after the comment in your function, put this line:
to_translate = convertUnicodeToHTMLEsc(to_translate)
I am more of a code hacker than a coder, so there might be an easier way, but this works.