psf / requests

A simple, yet elegant, HTTP library.
https://requests.readthedocs.io/en/latest/
Apache License 2.0
51.91k stars 9.29k forks source link

HTTPDigestAuth fails on non-latin credentials #6102

Open ondratu opened 2 years ago

ondratu commented 2 years ago

There was issue reported, which is closed with bad results.

https://github.com/psf/requests/blob/4f6c0187150af09d085c03096504934eb91c7a9e/requests/auth.py#L59-L63

Don't pass unicode strings in the arguments, but use UTF8 bytes instead.

self.session.get(main_url, auth=requests.auth.HTTPDigestAuth("Сергей_Ласточкин".encode('UTF-8'), '1234'))

Originally posted by @D-stefaang in https://github.com/psf/requests/issues/5089#issuecomment-763569911

But this is wrong! When i try to set user 'Ondřej' with this advice, requests send bad string:

HTTPDigestAuth('Ondřej'.encode('utf-8'), 'heslíčko')

creates header starts with wrong username!

Digest username="b'Ond\xc5\x99ej'"
nateprewitt commented 2 years ago

Hi @ondratu,

Could you please clarify what you believe is wrong in this case? ř is the byte-sequence \xc5\x99 in UTF-8, so we'd expect the bytes object to be Ond\xc5\x99ej. We can quickly verify this by checking:

'Ondřej'.encode('utf-8') == b'Ond\xc5\x99ej'
>>> True

It's not clear what other value you'd be expecting.

nateprewitt commented 2 years ago

Hmm, on closer inspection this does appear to be a bug. We're using the bytes username as an argument to format our string for the header. This causes the full literal "b'Ond\xc5\x99ej'" to be used which I agree doesn't look correct. This header should be encoded as bytes during creation but we currently defer that to be urllib3's problem.

When not encoding the auth/password, we get this:

Traceback (most recent call last):
  File "/Users/nateprewitt/Work/OpenSource/requests/test.py", line 3, in <module>
    r = requests.get('https://httpbin.org/digest-auth/auth/Ondřej/heslíčko', auth=h)
  File "/Users/nateprewitt/Work/OpenSource/requests/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  [...]
  File "/Users/nateprewitt/.pyenv/versions/3.10.3/lib/python3.10/http/client.py", line 1323, in _send_request
    self.putheader(hdr, value)
  File "/Users/nateprewitt/.pyenv/versions/3.10.3/lib/python3.10/site-packages/urllib3/connection.py", line 224, in putheader
    _HTTPConnection.putheader(self, header, *values)
  File "/Users/nateprewitt/.pyenv/versions/3.10.3/lib/python3.10/http/client.py", line 1255, in putheader
    values[i] = one_value.encode('latin-1')
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0159' in position 20: ordinal not in range(256)

Fixing this is unfortunately somewhat complicated for a couple reasons:

1.) Users expect the output of HTTPDigestAuth to be a str in Python 3, changing that is likely breaking. 2.) If we were to encode the string, the libraries current convention would be latin-1 not utf-8. That wouldn't solve this issue.

I don't believe we can ever format this correctly in Python 3 with the current behavior though. I'll need to look more tomorrow, but we may consider a behavior change if self.username/self.password is passed in as bytes.

mubaldino commented 4 months ago

Hi all, we encountered this week as we had a password policy change, too. Unicode is now optional requirement for passwords. But some may use it (I did...) and it broke twine, pip, and other things that would use requests.auth module for authentication.

I saw the exception above, and knew immediately a quick test to see if charset encoding would fix it:

    # requests/auth.py,   _basic_auth_str()
....
    if isinstance(password, str):
        # password = password.encode("latin1")
        password = password.encode("utf-8")            # Hey! IT works,... now just make it an optional kwarg

I realize exposing encoding in the function signatures will have a ripple effect, but I don't think the latin1 assumption is all that great either.

Requested change: add encoding kwarg to _basic_auth_str() and its consumers

def _basic_auth_str(username, passwd, encoding="latin1"):
   ...
   password = password.encode(encoding)
   ..

One line fix and I'm good to go.... hopefully you can include in an upcoming release. Password policy changes are now more frequently including Unicode.