sztomi closed 8 years ago
Yeah, that looks like a bug.
I think in this case the best fix is to allow the user to provide bytestrings for the username and password, and if they do that to simply use the bytestring directly rather than to try to encode.
Are you interested in providing a test and patch for this?
@Lukasa Gladly, but I'm a bit overburdened at the moment. I'll have spare time in 2-3 weeks; if it's still open then, I'll take a peek.
Ok cool, I'll mark this as contributor friendly and if no-one else picks it up by the time you have time you should take a swing at it.
Hello, @lukasa!
Your idea about bytestrings looks very good and fits the gaps in the spec.
But.
There are two ways to implement your idea: 1) Store user/pass as bytes. That doesn't look good, because we would always need to check the type of the variable before using it. 2) Convert user/pass to strings in __init__(). That doesn't look good either, because we lose the original values.
And lastly, I think 95% of people will write code like this:
u = 'Дмитрий' # my name in Russian
p = 'password'
r = requests.get(url, auth=(u.encode('utf-8'), p))
To my mind, that doesn't look 'for humans'. Without this patch, we can already write this for the same result:
r = requests.get(url, auth=(u.encode('utf-8').decode('latin1'), p))
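A quick way to see why the `.encode('utf-8').decode('latin1')` trick above gives the same result: every byte value maps to a Latin-1 code point, so the later Latin-1 encoding step reproduces the original UTF-8 bytes. This is a minimal sketch assuming Python 3; the variable names `smuggled` and `wire_bytes` are illustrative only.

```python
# Hypothetical username, mirroring the example in this comment.
u = 'Дмитрий'

# Re-interpret the UTF-8 bytes as Latin-1 text. Every byte value 0-255
# corresponds to a Latin-1 code point, so this never fails and loses nothing.
smuggled = u.encode('utf-8').decode('latin1')

# Encoding that text with Latin-1 (as the line under discussion does)
# returns exactly the original UTF-8 bytes.
wire_bytes = smuggled.encode('latin1')
assert wire_bytes == u.encode('utf-8')
```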
But we can change only one line of code:
- b64encode(('%s:%s' % (username, password)).encode('latin1')).strip()
+ b64encode(('%s:%s' % (username, password)).encode('utf-8')).strip()
After that, the same code becomes:
r = requests.get(url, auth=(u, p))
That looks 'for Humans' :)
What do you think about all this?
Sorry for my grammar.
@klimenko It does look better that way, but it's unfortunately just moving the problem. Now anyone whose server is expecting a non-UTF-8 encoded username is going to get tripped up, and so we'll have to re-open this issue when someone says "my server wanted Latin1 and now doesn't get it".
It's better to use bytestrings because that way we avoid making a guess that is wrong. If the users still want the helpful automatic choice, they can pass a unicode string, but if they want to do something more specific we have an escape hatch for them.
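The bytestring escape hatch described above could look roughly like the following. This is only a sketch under stated assumptions (Python 3, and a hypothetical helper name `basic_auth_header`); it is not the library's actual code and not necessarily what the eventual patch does.

```python
from base64 import b64decode, b64encode


def basic_auth_header(username, password):
    """Hypothetical helper: build a Basic auth header value."""
    # Text input keeps the existing automatic choice (Latin-1).
    if isinstance(username, str):
        username = username.encode('latin1')
    if isinstance(password, str):
        password = password.encode('latin1')
    # Bytes input is used exactly as supplied; no encoding is guessed.
    token = b64encode(b':'.join((username, password))).strip()
    return 'Basic ' + token.decode('ascii')


# The caller picks the encoding explicitly by passing bytes.
header = basic_auth_header('Дмитрий'.encode('utf-8'), 'password')
decoded = b64decode(header[len('Basic '):])
assert decoded == 'Дмитрий'.encode('utf-8') + b':password'
```

With this shape, a caller whose server expects some other encoding can pass bytes in that encoding, while plain text input behaves as before.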
Hi guys, I would like to take a crack at this.
@rmhasan thanks for the interest in contributing! It may be important to note that PR #3673 is already open to address this. You may want to keep an eye on the outcome of that before spending time working on a solution.
@nateprewitt I will keep an eye on it, thanks.
Resolved by #3673.
Thanks, I got past it!
Description
It is not possible to use HTTP basic authentication with a username or password that contains Unicode characters outside Latin-1.
What happens
A UnicodeEncodeError is thrown. Traceback:
Expected behavior
The credentials are encoded as UTF-8 (at least if charset=utf-8 is provided in the header).
How to reproduce
Consider the following request:
I think that the culprit is this line (https://github.com/kennethreitz/requests/blob/master/requests/auth.py#L32), which assumes latin-1 encoding regardless of the charset header:
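A minimal sketch of that failure mode, with no network access needed. It assumes Python 3 and that the line builds the token with the expression quoted in the diff elsewhere in this thread; the credentials are hypothetical.

```python
from base64 import b64encode

# Hypothetical credentials containing non-Latin-1 characters.
username, password = 'Дмитрий', 'password'
credentials = '%s:%s' % (username, password)

try:
    b64encode(credentials.encode('latin1')).strip()
    latin1_failed = False
except UnicodeEncodeError:
    # Cyrillic characters have no Latin-1 code points, so encoding raises.
    latin1_failed = True

# The same credentials encode without error as UTF-8.
utf8_token = b64encode(credentials.encode('utf-8')).strip()
```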
Workaround
This seems to work:
Version info