UnicodeEncodeError in urlencode (called by oauth2) when calling Api.PostUpdate() with a unicode string

GoogleCodeExporter commented 8 years ago

What steps will reproduce the problem?
1. tweet = u'Test: \u2605'
2. api.PostUpdate(tweet)

What is the expected output? What do you see instead?

The tweet should be posted.  Instead, I see:

  File "/home/dspitzer/Netflix/hackday/2010/streamingbot/twitter.py", line 2157, in PostUpdate
    json = self._FetchUrl(url, post_data=data)
  File "/home/dspitzer/Netflix/hackday/2010/streamingbot/twitter.py", line 3011, in _FetchUrl
    req.sign_request(self._signature_method_hmac_sha1, self._oauth_consumer, self._oauth_token)
  File "/usr/lib/pymodules/python2.6/oauth2/__init__.py", line 381, in sign_request
    self['oauth_signature'] = signature_method.sign(self, consumer, token)
  File "/usr/lib/pymodules/python2.6/oauth2/__init__.py", line 704, in sign
    key, raw = self.signing_base(request, consumer, token)
  File "/usr/lib/pymodules/python2.6/oauth2/__init__.py", line 693, in signing_base
    escape(request.get_normalized_parameters()),
  File "/usr/lib/pymodules/python2.6/oauth2/__init__.py", line 364, in get_normalized_parameters
    encoded_str = urllib.urlencode(sorted(items))
  File "/usr/lib/python2.6/urllib.py", line 1267, in urlencode
    v = quote_plus(str(v))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2605' in position 43: ordinal not in range(128)
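
The last frame of the traceback can be reproduced in isolation. The following is a minimal sketch, written for Python 3 although the thread itself is on Python 2.6. In Python 2, calling str() on a unicode object performs an implicit ASCII encode; the sketch makes that step explicit with .encode('ascii'). The variable names are illustrative only.

```python
# Python 3 sketch of the failing step in urlencode's quote_plus(str(v)).
# In Python 2, str(u'...') implicitly encoded with the ASCII codec,
# which has no representation for U+2605.
tweet = 'Test: \u2605'

try:
    tweet.encode('ascii')  # explicit form of Python 2's implicit conversion
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

print(ascii_error)
```

The same exception type is raised as in the traceback above, which suggests the failure lies in the ASCII conversion rather than in the OAuth signing code that happens to call it.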

What version of the product are you using? On what operating system?

python-twitter __version__ = '0.8-devel'

Please provide any additional information below.

I thought the problem might have been caused by line 2153:

    data = {'status': status}

I changed it to:

    data = {'status': u_status}

But that didn't solve the problem.

That may still be a bug, but I can see why it doesn't solve my problem, since 
I'm already passing in a unicode string.

Original issue reported on code.google.com by daryl.sp...@gmail.com on 1 Sep 2010 at 3:59

GoogleCodeExporter commented 8 years ago

Please consider raising the priority of this.

Original comment by daryl.sp...@gmail.com on 1 Sep 2010 at 4:02

GoogleCodeExporter commented 8 years ago

Are you running the very latest code from the repo?  I know I tested unicode 
recently because I was working on another unicode-related bug.

But yes, if you are using the default tip, I will raise the priority and also 
make time tonight to find out why.

Original comment by bear42 on 1 Sep 2010 at 4:04

GoogleCodeExporter commented 8 years ago

Yes, I believe I am using the very latest code.  I don't have a lot of 
experience with Mercurial (so I might be doing something wrong), but I just 
tried to pull any new changes and got:

$ hg pull
pulling from https://python-twitter.googlecode.com/hg/
searching for changes
no changes found

Original comment by daryl.sp...@gmail.com on 1 Sep 2010 at 4:11

GoogleCodeExporter commented 8 years ago

hrmm, ok

I'll take a look at this tonight when I get home from work.

Original comment by bear42 on 1 Sep 2010 at 4:14

Changed state: Accepted
Added labels: Priority-Critical
Removed labels: Priority-Medium

GoogleCodeExporter commented 8 years ago

Original comment by bear42 on 1 Sep 2010 at 4:14

GoogleCodeExporter commented 8 years ago

Maybe http://www.gossamer-threads.com/lists/python/dev/645002 helps a bit.

A workaround for me was to do
>>> status = api.PostUpdate(text.encode('utf8'))
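
Why the workaround helps can be sketched without python-twitter at all. The example below is for Python 3 and uses urllib.parse.urlencode as a stand-in for Python 2's urllib.urlencode. It assumes the status ends up as a value in the form-encoded parameters, as the traceback indicates; the 'status' key is taken from the report above.

```python
from urllib.parse import urlencode

tweet = 'Test: \u2605'

# Encode to UTF-8 bytes first, as in the workaround above. urlencode then
# percent-escapes each byte instead of attempting an ASCII conversion.
post_body = urlencode({'status': tweet.encode('utf-8')})
print(post_body)
```

The star character becomes three percent-escaped bytes (%E2%98%85), which is plain ASCII and therefore safe to sign and send.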

Original comment by christop...@gmail.com on 4 Sep 2010 at 10:20

GoogleCodeExporter commented 8 years ago

I'm seeing this too while working on getting the tests to run. I will use 
text.encode for now (in tests).

Original comment by colinthe...@googlemail.com on 7 Sep 2010 at 5:17

GoogleCodeExporter commented 8 years ago

Hello, I have the same problem. Yes, one can use .encode('utf8') and this will 
work, unless encoding makes the string longer than 140 (yes, the len() of the 
text increases when it is encoded, if it contains non-ASCII characters).

This means that some tweets, although under 140 characters long, cannot be sent 
with api.PostUpdate().

I am not sure that this is really python-twitter's fault, but I would like to 
hear how to do this properly.

best regards
/thomas

Original comment by thomas.m...@gmail.com on 6 Oct 2010 at 1:41

GoogleCodeExporter commented 8 years ago

The API allows you to specify an input encoding. By default this is set to None 
if you pass nothing in, which causes python-twitter to process the string 
directly without doing any encoding/decoding.

The workaround people have been using, api.PostUpdate(text.encode('utf8')), 
causes python-twitter to compute len(text.encode('utf8')). For a Python 2 byte 
string, len() counts bytes rather than characters, so the result is larger than 
the actual character count whenever the text contains non-ASCII characters.
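
The gap between the two lengths is easy to see in isolation. This is a small Python 3 sketch that uses a 140-character string made up entirely of U+2605; the variable names are illustrative.

```python
text = '\u2605' * 140            # 140 characters, within the tweet limit
encoded = text.encode('utf-8')   # U+2605 takes 3 bytes in UTF-8

char_count = len(text)       # counts characters
byte_count = len(encoded)    # counts bytes
print(char_count, byte_count)
```

A length check applied to the encoded bytes would reject this text even though it is exactly 140 characters long.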

I believe the correct thing to do here is pass a character encoding in when 
creating the Api instance and then pass the raw strings in. E.g.

api = Api(..., input_encoding='utf8')
api.PostUpdate('ب')

Will work :)

Original comment by colinthe...@googlemail.com on 15 Nov 2010 at 10:13

GoogleCodeExporter commented 8 years ago

Note: you'll still have to call .encode('utf8') on any unicode text passed in. 
It seems that urllib's urlencode calls str() on each value and so expects 
ASCII... erk.
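
One way to reconcile the two concerns raised in this thread is to check the length in characters first and only then encode to bytes. The sketch below is for Python 3. prepare_status is a hypothetical helper, not part of python-twitter, and the 140 limit is taken from the comments above. Whether PostUpdate then repeats its own length check on the bytes depends on the library version and the input_encoding setting, so treat this as an illustration only.

```python
def prepare_status(text, limit=140):
    """Hypothetical helper (not a python-twitter API).

    Checks the length in characters, then returns UTF-8 bytes that are
    safe to hand to URL-encoding code that expects ASCII-compatible input.
    """
    if len(text) > limit:
        raise ValueError('status is %d characters; limit is %d'
                         % (len(text), limit))
    return text.encode('utf-8')


payload = prepare_status('Test: \u2605')
print(payload)
```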

Original comment by colinthe...@googlemail.com on 15 Nov 2010 at 12:30

GoogleCodeExporter commented 8 years ago

Proposed patch.

Original comment by edward.h...@gmail.com on 4 Feb 2011 at 4:43

Attachments:

0003-UTF-8-fix.patch
