Open ondratu opened 2 years ago
Hi @ondratu,
Could you please clarify what you believe is wrong in this case? ř
is the byte-sequence \xc5\x99
in UTF-8, so we'd expect the bytes object to be Ond\xc5\x99ej
. We can quickly verify this by checking:
'Ondřej'.encode('utf-8') == b'Ond\xc5\x99ej'
>>> True
It's not clear what other value you'd be expecting.
Hmm, on closer inspection this does appear to be a bug. We're using the bytes username as an argument to format our string for the header. This causes the full literal "b'Ond\xc5\x99ej'"
to be used which I agree doesn't look correct. This header should be encoded as bytes during creation but we currently defer that to be urllib3's problem.
When not encoding the auth/password, we get this:
Traceback (most recent call last):
File "/Users/nateprewitt/Work/OpenSource/requests/test.py", line 3, in <module>
r = requests.get('https://httpbin.org/digest-auth/auth/Ondřej/heslíčko', auth=h)
File "/Users/nateprewitt/Work/OpenSource/requests/requests/api.py", line 73, in get
return request("get", url, params=params, **kwargs)
[...]
File "/Users/nateprewitt/.pyenv/versions/3.10.3/lib/python3.10/http/client.py", line 1323, in _send_request
self.putheader(hdr, value)
File "/Users/nateprewitt/.pyenv/versions/3.10.3/lib/python3.10/site-packages/urllib3/connection.py", line 224, in putheader
_HTTPConnection.putheader(self, header, *values)
File "/Users/nateprewitt/.pyenv/versions/3.10.3/lib/python3.10/http/client.py", line 1255, in putheader
values[i] = one_value.encode('latin-1')
UnicodeEncodeError: 'latin-1' codec can't encode character '\u0159' in position 20: ordinal not in range(256)
Fixing this is unfortunately somewhat complicated for a couple reasons:
1.) Users expect the output of HTTPDigestAuth
to be a str
in Python 3, changing that is likely breaking.
2.) If we were to encode the string, the libraries current convention would be latin-1
not utf-8
. That wouldn't solve this issue.
I don't believe we can ever format this correctly in Python 3 with the current behavior though. I'll need to look more tomorrow, but we may consider a behavior change if self.username
/self.password
is passed in as bytes.
Hi all,
we encountered this week as we had a password policy change, too. Unicode is now optional requirement for passwords. But some may use it (I did...) and it broke twine, pip, and other things that would use requests.auth
module for authentication.
I saw the exception above, and knew immediately a quick test to see if charset encoding would fix it:
# requests/auth.py, _basic_auth_str()
....
if isinstance(password, str):
# password = password.encode("latin1")
password = password.encode("utf-8") # Hey! IT works,... now just make it an optional kwarg
I realize exposing encoding in the function signatures will have a ripple effect, but I don't think the latin1
assumption is all that great either.
Requested change: add encoding kwarg to _basic_auth_str()
and its consumers
def _basic_auth_str(username, passwd, encoding="latin1"):
...
password = password.encode(encoding)
..
One line fix and I'm good to go.... hopefully you can include in an upcoming release. Password policy changes are now more frequently including Unicode.
There was issue reported, which is closed with bad results.
https://github.com/psf/requests/blob/4f6c0187150af09d085c03096504934eb91c7a9e/requests/auth.py#L59-L63
Don't pass unicode strings in the arguments, but use UTF8 bytes instead.
Originally posted by @D-stefaang in https://github.com/psf/requests/issues/5089#issuecomment-763569911
But this is wrong! When i try to set user 'Ondřej' with this advice, requests send bad string:
creates header starts with wrong username!