hradyesh / recaptcha

Automatically exported from code.google.com/p/recaptcha

Bug in reCAPTCHA Python library + patch #7

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Enter a character outside the ASCII range (code point 128 or above) using the Python library
2. Submit
3. See the error
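
The failure can be approximated in isolation, without the library. This is only a sketch in Python 3 syntax (the report is against Python 2.5), and the sample text is an assumption borrowed from a later comment in this thread. It shows a strict ASCII encode rejecting a non-ASCII character, which is roughly what happens when Python 2 calls str() on a unicode value.

```python
# Sample input containing characters outside the ASCII range (assumed example).
text = "páginas de España"

try:
    # Strict ASCII encoding, comparable to str(u"...") in Python 2.
    text.encode("ascii")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print(exc)
```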

What is the expected output? What do you see instead?

I expect it to fail and set an error code that signals a byte with a value
greater than \x7f is not possible to encode.

What version of the product are you using? On what operating system?

Ubuntu Linux 7.10, Python 2.5 (this will happen everywhere I think)

I've added a patch (generated from the root of the working copy) to fix this
error. Let me know if this is how things should be handled.

Original issue reported on code.google.com by ryankas...@gmail.com on 24 Dec 2007 at 7:28

GoogleCodeExporter commented 9 years ago
Fixes encoding bug. 

What will happen:

response = páginas de España 
....response=p%C3%A1ginas+de+Espa%C3%B1a....  -> What is sent to server.

Just add doseq=1 to the urllib.urlencode call. I've submitted a trivial patch 
anyway. 

Original comment by ryankas...@gmail.com on 24 Dec 2007 at 10:58

GoogleCodeExporter commented 9 years ago
This assumes the server can make sense of p%C3%A1ginas+de+Espa%C3%B1a (the UTF-8
bytes of each non-ASCII character, percent-encoded in hex).
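
As a rough illustration of what "making sense of it" would involve on the receiving side, here is a sketch in Python 3 using `urllib.parse.parse_qs`. Whether the reCAPTCHA server decodes the field this way is an assumption; the snippet only shows that the percent-encoded form round-trips back to the original text when it is read as UTF-8.

```python
from urllib.parse import parse_qs

# Query fragment as it would appear on the wire (copied from the comment above).
wire = "response=p%C3%A1ginas+de+Espa%C3%B1a"

# parse_qs turns '+' into spaces, percent-decodes each value,
# and interprets the resulting bytes as UTF-8 by default.
decoded = parse_qs(wire)["response"][0]
print(decoded)
```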

Original comment by ryankas...@gmail.com on 24 Dec 2007 at 11:06

GoogleCodeExporter commented 9 years ago
Related thread at
http://groups.google.com/group/recaptcha/browse_thread/thread/1729f7edc1de0d4b/5cb1894af65fe327

Original comment by jabron...@gmail.com on 30 Oct 2008 at 3:15

GoogleCodeExporter commented 9 years ago
relevant python bug: http://bugs.python.org/issue1349732

Original comment by jabron...@gmail.com on 30 Oct 2008 at 6:13

GoogleCodeExporter commented 9 years ago
"doseq=1" is a red herring. Observe:

>>> urllib.urlencode({'response': 'páginas de España'})
'response=p%C3%A1ginas+de+Espa%C3%B1a'
>>> # this looks fine

>>> urllib.urlencode({'response': 'páginas de España'}, doseq=1)
'response=p%C3%A1ginas+de+Espa%C3%B1a'
>>> # doseq=1 produces the same (correct) result, so far there's no reason to use it

>>> # note that we've been passing a string...
>>> # if instead we pass a unicode, characters will be converted to '?' (%3F)
>>> # since urlencode calls s.encode('ASCII', 'replace'):
>>> urllib.urlencode({'response': u'páginas de España'}, doseq=1)
'response=p%3Fginas+de+Espa%3Fa'
>>> # this is not what we want!

>>> # here the "doseq=1" is having a mitigating effect (albeit not the one we want),
>>> # since we're passing a unicode. Without it we get an error:
>>> urllib.urlencode({'response': u'páginas de España'})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib.py", line 1250, in urlencode
    v = quote_plus(str(v))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 1: ordinal not in range(128)

From your original patch, it's clear you're passing a unicode, as you were
catching the UnicodeEncodeError generated by something like the above. In that
case, doseq=1 will result in characters being converted to '?' as shown, which
is not what we want.
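
One general way around both outcomes is to encode the text to UTF-8 bytes before it reaches urlencode. The sketch below is written in Python 3 (`urllib.parse.urlencode`) rather than the Python 2.5 used in this thread, `encode_if_text` is a hypothetical helper, and it is not a claim about what the library's eventual fix does.

```python
from urllib.parse import urlencode


def encode_if_text(value, encoding="utf-8"):
    """Hypothetical helper: encode text to bytes, pass bytes through unchanged."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value


# The value is UTF-8 bytes by the time urlencode sees it, so each byte is
# percent-encoded instead of being replaced with '?' or raising an error.
params = {"response": encode_if_text("páginas de España")}
query = urlencode(params)
print(query)
```

In Python 2 terms, the equivalent step would be calling .encode('utf-8') on the unicode value before building the parameter dict.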

Original comment by jabron...@gmail.com on 30 Oct 2008 at 5:32

GoogleCodeExporter commented 9 years ago
Fixed in r106.

Original comment by jabron...@gmail.com on 30 Oct 2008 at 6:26

GoogleCodeExporter commented 9 years ago

Original comment by adrian.g...@gmail.com on 30 Mar 2012 at 6:19