Closed jab closed 1 year ago
I'm definitely ok with revisiting this. Fairly sure I've left some comments about not having Bearer support in the past.
Thanks, good to know! I should have said, this is not actually something I need at this time, since I'm handling the unsupported authentication schemes in some lightweight middleware that doesn't require Werkzeug. So I probably won't submit a PR for this anytime soon (unless this year's PyCon sprint is still happening virtually and I can make it). But I'd be happy to help review any proposed new designs or implementations if someone else wants to work on it sooner.
@jab do you have any ideas for what the API should look like? If we can figure that out, someone else might be able to pick this up and implement it.
One thing I noticed about the current API is that we have only a single type that's trying to represent all supported Authorization headers – and likewise for WWW-Authenticate – that just unions together all the properties for all the supported authentication schemes. The rest of this comment probably writes itself, but just to be explicit, we currently have something like:
class Authorization:
def username(...): # used for basic auth and digest auth
def password(...): # only used for basic auth
def nonce(...): # only used for digest auth
def nc(...): # only used for digest auth
def cnonce(...): # only used for digest auth
def response(...): # only used for digest auth
def opaque(...): # only used for digest auth
def qop(...): # only used for digest auth
def to_header(self):
if self.type == "basic":
return ...
elif self.type == "digest":
return ...
I wonder if using subclasses would improve this design, something more like:
class Authorization(ABC):
@abstractmethod
def to_header(self):
...
class BasicAuth(Authorization):
def username(...): ...
def password(...): ...
def to_header(self):
# basic auth serialization logic goes here
class DigestAuth(Authorization):
def username(...): ...
def nonce(...): ...
...
def to_header(self):
# digest auth serialization logic goes here
Especially if we're now considering adding support for more schemes.
In case it's helpful, https://github.com/deshaw/wsgi-kerberos/blob/master/wsgi_kerberos.py is some open source middleware I've worked on that implements Kerberos authentication in WSGI middleware. In that case, WWW-Authenticate headers look like Negotiate
, and Authorization headers just look like negotiate <token>
. If WSGI-Kerberos ever wanted to use Werkzeug to parse/serialize these, it seems better if Werkzeug provided a datastructure that wasn't bloated with the APIs of all the other supported authentication schemes combined.
What do you think?
Splitting it up makes sense. I would only expect the type
attribute to be common between them all, and for users to look at that (currently) for whether to use the other attributes.
We also need to parse the headers, and I'm trying to move towards having parse
in classes instead of separate http
functions. Maybe Authorization.parse
still has to do if value.startswith("Basic "): ... elif startswith(...): ...
.
To support user subclasses so we don't have to implement every possible scheme, we'd need Authorization.parse
to do something like the following:
class Authorization:
@classmethod
def parse(cls, value):
for sub in cls.__subclasses__():
if value.startswith(sub.prefix): # every class needs to define what their value starts with
return sub.parse(value)
If there's a lot of subclasses it will get a bit slower, and it requires each having a unique prefix, which I'm not 100% sure about. But I can't think of a better way to make this extensible, and I don't anticipate there being so many subclasses that it becomes a problem. And users could always override the authorization
property to use their own parse logic, for example if they know they will only ever use Kerberos.
Wonder if it's worth using __init_subclass__()
here instead. A super simplified version might look like:
from abc import ABC, abstractmethod
from base64 import b64encode, b64decode
class Authorization(ABC):
__subclass_by_scheme = {}
def __init_subclass__(cls):
super().__init_subclass__()
Authorization.__subclass_by_scheme[cls.scheme] = cls
@classmethod
@abstractmethod
def from_header(cls, header):
scheme = header.split(None, 1)[0]
return Authorization.__subclass_by_scheme[scheme].from_header(header)
@abstractmethod
def to_header(self):
raise NotImplementedError
class BasicAuth(Authorization):
scheme = "Basic"
def __init__(self, username, password):
self.username = username
self.password = password
@classmethod
def from_header(cls, header):
auth_info = header.split(None, 1)[1]
username, password = b64decode(auth_info).split(b":", 1)
return cls(username.decode("utf8"), password.decode("utf8"))
def to_header(self):
encoded = b64encode(f"{self.username}:{self.password}".encode("utf8")).decode("utf8")
return f"Basic {encoded}"
>>> basic = Authorization.from_header("Basic dXNlcm5hbWU6cGFzc3dvcmQ=")
>>> basic
<auth.BasicAuth object at 0x1096bc0d0>
>>> basic.to_header()
'Basic dXNlcm5hbWU6cGFzc3dvcmQ='
Oh nice, I forgot they made it possible to do registration without metaclasses, and it's 3.6+.
+1 on this improvement. I tried to extend WWWAuthenticate to support the bearer challenge but it was not straightforward to plug in restrictions on character ranges as defined in https://datatracker.ietf.org/doc/html/rfc6750#section-3.
Values for the "scope" attribute (specified in Appendix A.4 of [RFC6749]) MUST NOT include characters outside the set %x21 / %x23-5B / %x5D-7E for representing scope values and %x20 for delimiters between scope values. Values for the "error" and "error_description" attributes (specified in Appendixes A.7 and A.8 of [RFC6749]) MUST NOT include characters outside the set %x20-21 / %x23-5B / %x5D-7E. Values for the "error_uri" attribute (specified in Appendix A.9 of [RFC6749]) MUST conform to the URI-reference syntax and thus MUST NOT include characters outside the set %x21 / %x23-5B / %x5D-7E.
For example, in response to a protected resource request without authentication:
HTTP/1.1 401 Unauthorized WWW-Authenticate: Bearer realm="example"
And in response to a protected resource request with an authentication attempt using an expired access token:
HTTP/1.1 401 Unauthorized WWW-Authenticate: Bearer realm="example", error="invalid_token", error_description="The access token expired"
I'm working on this and trying to figure out how to make it nice for type checking. Haven't come up with any good ideas yet. The problem is that we need to parse a string and return a specific subclass, but the string itself is not typing information. So MyPy can't know that Authorization.parse_header("basic asdlkfjadsa")
will return a BasicAuth
instance. And there's no good way to make request.authorization
more specific either. I think users will just have to check the type, this should satisfy MyPy:
auth = request.authorization
if isinstance(auth, BasicAuth):
return get_user(auth.username, auth.password)
else:
raise Unauthorized(www_authenticate=WWWAuthenticate("basic"))
I've been working on this all day, and had a pretty good interface for different Authorization
and WWWAuthenticate
subclasses, but now I'm wondering if that's actually a good idea. I wonder if we should go the opposite direction, and make the whole thing less specialized.
In the new system I have, plain Authorization
instances don't exist anymore. You (or Werkzeug) call the Authorization.from_header
class method, and get back a specific subclass like BasicAuth
based on the scheme in the header value. There are various compatibilities in place to deprecate old behavior like checking auth.type
or accessing digest params if it's actually basic, etc.
However, this requires defining a different subclass for each auth scheme you might encounter. A client might send a scheme you know nothing about. Should you ignore it? Raise a 400 error? Or maybe you actually do know what it is, but didn't write a whole class for it, because the current system still lets you access auth.type
and all the parameters using auth[key]
.
I think what we should actually do is deprecate all the attributes, and only use the dict interface. We can add an extra attribute called token
for schemes like Bearer
which only send a token and not parameters; basic auth also does this but we can special case that since we know what the token represents. Given that Werkzeug doesn't actually do anything with auth, I think we should just expose the parsed header and let the user or an extension take care of doing anything with that data. The same goes for the WWWAuthenticate
class.
Just for reference, here's what I was working on to split into multiple specialized classes, but I didn't end up going this direction.
Thanks for reviewing #1755, @davidism. Now that that's merged, I'm following up as promised to pose whether we should make any changes to
werkzeug.auth.WWWAuthenticateMixin
/werkzeug.datastructures.WWWAuthenticate
, to improve Werkzeug's support for authentication.It looks like the current APIs were designed for a simpler time:
WWWAuthenticate.type
even says "HTTP currently specifiesBasic
andDigest
", which is ~10 years out of date (the spec now points to the IANA Authentication Scheme Registry, which lists several more schemes, and in particularBearer
andNegotiate
have been quite common for a while now).On the other hand, it may be telling that no one has asked about this until now. Please feel free to close this issue without further discussion if there is too little evidence of demand for these changes at this time.