Basic http auth + unicode = error

sztomi commented 8 years ago

Description

It is not possible to send a basic http authentication using a username or password that contains Unicode data.

What happens

UnicodeEncodeError is thrown. Traceback:

  File "(my code)", line 163, in _get
    auth=(self.user, self.password))
  File "/usr/local/lib/python3.5/site-packages/requests/api.py", line 67, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/requests/api.py", line 53, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/requests/sessions.py", line 454, in request
    prep = self.prepare_request(req)
  File "/usr/local/lib/python3.5/site-packages/requests/sessions.py", line 388, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/usr/local/lib/python3.5/site-packages/requests/models.py", line 297, in prepare
    self.prepare_auth(auth, url)
  File "/usr/local/lib/python3.5/site-packages/requests/models.py", line 490, in prepare_auth
    r = auth(self)
  File "/usr/local/lib/python3.5/site-packages/requests/auth.py", line 51, in __call__
    r.headers['Authorization'] = _basic_auth_str(self.username, self.password)
  File "/usr/local/lib/python3.5/site-packages/requests/auth.py", line 31, in _basic_auth_str
    b64encode(('%s:%s' % (username, password)).encode('latin1')).strip()
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0171' in position 7: ordinal not in range(256)

Expected behavior

The authentication is encoded as utf-8 (at least if charset=utf-8 is provided in the header).

How to reproduce

Consider the following request:

        user = 'űser'
        password = 'páéáésswőrd'
        response = requests.get(
            url='http://example.com'
            headers={'Content-Type': 'application/xml; charset=utf-8'},
            auth=(user, password))

I think that the culprit is this line (https://github.com/kennethreitz/requests/blob/master/requests/auth.py#L32), which assumes latin-1 encoding regardless of the charset header:

def _basic_auth_str(username, password):
    """Returns a Basic Auth string."""

    authstr = 'Basic ' + to_native_string(
        b64encode(('%s:%s' % (username, password)).encode('latin1')).strip()
    )

    return authstr

Workaround

This seems to work:

    from requests.auth import to_native_string

    # ... snip ...

        auth = 'Basic ' + to_native_string(b64encode('{}:{}'.format(self.user, self.password).encode('utf-8')).strip())
        response = requests.get(
            url=self.base_url + uri,
            headers={
                'Content-Type': 'application/xml; charset=utf-8',
                'Authorization': auth
            })

Version info

$ pip show requests

---
Metadata-Version: 2.0
Name: requests
Version: 2.9.1
Summary: Python HTTP for Humans.
Home-page: http://python-requests.org
Author: Kenneth Reitz
Author-email: me@kennethreitz.com
Installer: pip
License: Apache 2.0
Location: /usr/local/lib/python3.5/site-packages
Requires: 
Classifiers:
  Development Status :: 5 - Production/Stable
  Intended Audience :: Developers
  Natural Language :: English
  License :: OSI Approved :: Apache Software License
  Programming Language :: Python
  Programming Language :: Python :: 2.7
  Programming Language :: Python :: 3
  Programming Language :: Python :: 3.3
  Programming Language :: Python :: 3.4
  Programming Language :: Python :: 3.5

Lukasa commented 8 years ago

Yeah, that looks like a bug.

I think in this case the best fix is to allow the user to provide bytestrings for the username and password, and if they do that to simply use the bytestring directly rather than to try to encode.

Are you interested in providing a test and patch for this?

sztomi commented 8 years ago

@Lukasa Gladly, but I'm a bit overburdened at the moment. I'll have spare time in 2-3 weeks, if it's still open, I'll take a peek.

Lukasa commented 8 years ago

Ok cool, I'll mark this as contributor friendly and if no-one else picks it up by the time you have time you should take a swing at it.

ghost commented 8 years ago

Hello, @lukasa!

Your idea about byte strings looks very good and fully matches the white spaces in spec.

But.

There are two ways to release your idea: 1) Save user/pass in bytes. Looks not good, because in fact we always need to check type of variable, before use it. 2) Convert user/pass to strings in init(). Looks not good, because we lose the original values.

And last, I think 95% peoples will be write code like this:

u = 'Дмитрий' # my name in Russian
p = 'password'
r = request.get(url, auth=(u.encode('utf-8'), p))

To my mind, it looks not 'for humans'. Without this patch we can write this for same result:

r = request.get(url, auth=(u.encode('utf-8').decode('latin1'), p))

But we can change only one line of code:

- b64encode(('%s:%s' % (username, password)).encode('latin1')).strip()
+ b64encode(('%s:%s' % (username, password)).encode('utf-8')).strip()

After that the same code will look as:

r = request.get(url, auth=(u, p))

It looks for Humans :)

What do you think about all this?

Sorry for my grammar.

Lukasa commented 8 years ago

@klimenko It does look better that way, but it's unfortunately just moving the problem. Now anyone whose server is expecting a non-UTF-8 encoded username is going to get tripped up, and so we'll have to re-open this issue when someone says "my server wanted Latin1 and now doesn't get it".

It's better to use bytestrings because that way we avoid making a guess that is wrong. If the users still want the helpful automatic choice, they can pass a unicode string, but if they want to do something more specific we have an escape hatch for them.

rmhasan commented 8 years ago

Hi guys, I would like to take a crack at this.

nateprewitt commented 8 years ago

@rmhasan thanks for the interest in contributing! It may be important to note that PR #3673 is already open to address this. You may want to keep an eye on the outcome of that before spending time working on a solution.

rmhasan commented 8 years ago

@nateprewitt I will keep an eye on it, thanks.

nateprewitt commented 8 years ago

Resolved by #3673.

papparotzi commented 6 years ago

Thanks, I got past it!

psf / requests