Closed rsyring closed 2 weeks ago
We should expose Werkzeug's trusted host checking, with a new ALLOWED_HOSTS = []
config to match Django's. This is similar to what we recently did for the dev server debugger.
I don't think we need to deprecate SERVER_NAME
. If it's set, it's added to ALLOWED_HOSTS
, otherwise behavior stays the same.
it could eliminate the need for
url_for()
's_scheme
arg.
This is already covered by PREFERRED_URL_SCHEME
. If we were to deprecate SERVER_NAME
, I'd rename it to PREFERRED_URL_HOST
or something, but I think it's fine to leave it.
David, thanks for chiming in.
I think SERVER_NAME
still remains problematic:
SERVER_NAME
still affects two disparate parts of the system. If all I want to do is configure the external URL used by url_for()
, I can't do that without affecting routing or host checking. I should be able to set them separately. This becomes a practical problem in the case of...Django doesn't seem to handle the unkown IP case natively: SO discussion and https://pypi.org/project/django-allow-cidr/
I don't know much about how Flask handles subodmains but noting it here in case it should impact the discussion.
Thanks.
I'm hesitant to deprecate/rename SERVER_NAME
. Adjusting what I suggested above, if SERVER_NAME
is set and ALLOWED_HOSTS
is empty, then behavior remains the same by setting ALLOWED_HOSTS = [SERVER_NAME]
automatically. But if ALLOWED_HOSTS
is set, SERVER_NAME
is not automatically added. So you can get the new behavior by setting both config keys, but nothing changes if you keep your config the same. Does that sound reasonable?
That would at least facilitate my objective of setting the external URL without affecting ALLOWED_HOSTS
as, presumably, I can use ALLOWED_HOSTS = ['*']
.
But I'd point out that someone who just wants to set SERVER_NAME
for the external URL is also getting a completely separate functionality applied, by default. In a way that is implicit rather than explicit. They then have to go explicitly "undo" that implicit application of allowed hosts. At the very least, the documentation should be really clear about how SERVER_URL
impacts ALLOWED_HOSTS
and how to undo that if desired.
PREFERRED_URL_SCHEME
also seems problematic TBH. Unless I'm misunderstanding, it's a workaround for SERVER_NAME
doing double-duty. If we had two separate settings, then one should just be EXTERNAL_HOST_URL
(or whatever you want to name it) and then you don't need PREFERRED_URL_SCHEME
?
Creating two separate configs, one for each purpose, still makes the most sense to me. There is already consensus in #998 that the one setting is problematic. Why not fix the root cause?
Regardless, I'd be happy for any way to fix this in the Flask config as the workaround I use is ugly.
Thanks. :)
Regarding a combined single config key:
We have three configs that affect routing and URL generation.
PREFERRED_URL_SCHEME
defaults to http
.SERVER_NAME
is unset.APPLICATION_ROOT
defaults to /
.All three are needed to generate URLs outside a request. APPLICATION_ROOT
is needed during requests as well, to know what part of the path is the prefix from the server and what part is handled by the app.
We could have a DEFAULT_BASE_URL
like https://example.com:56374/root
. The problem is APPLICATION_ROOT
, for WSGI there's no way to know when app setup is finished and when requests handling will start, so there's no way to say "now we know all config has been loaded, set APPLICATION_ROOT
from DEFAULT_BASE_URL
if needed, then pass the app on to start serving". We would want to split it during config though, otherwise we'd need to call urlsplit(DEFAULT_BASE_URL).path
for every request, which would be expensive. The other parts only need to be split when generating URLs outside a request, so that expense isn't important then.
To recap, with the ALLOWED_HOSTS
proposal, here are the configuration items that are present in this discussion:
PREFERRED_URL_SCHEME
SERVER_NAME
ALLOWED_HOSTS
APPLICATION_ROOT
ALLOWED_HOSTS
['*']
SERVER_NAME
As a thought experiment, if starting from scratch, it seems more intuitive to me if this was broken out as:
HOST_URL
: it's purpose is to impact request routing, although reasonable defaults can be inferred for non-routing functionality
/
which would keep the current Flask behavior for a non-configured application/
or /foo
) is used for what is now APPLICATION_ROOT
https://www.foo.com/bar
and impacts:www.foo.name
is the base if a subdomain is given by a route or blueprint. In most typical cases, the app would likely use https://foo.name/
if using subdomains.HOST_URL_EXTERNAL
ALLOWED_HOSTS
? IMO, this is determined by whether or not we want to encourage it being used for security reasons.HOST_URL_EXTERNAL
(or maybe APPLICATION_URL
): it's only purpose is external URL generation
None
, must be a full URL, https://www.baz.com/bar
HOST_URL
isn't to configure only how external URLs are generated and not impact request routingHOST_URL
is not accurate. As in the case that started this issue for me where the internal host name is different from the external URL.HOSTS_ALLOWED
: same as ALLOWED_HOSTS
but, by naming, keeps these three configs named similar since they are related(ish)
HOST_URL
impacts the default of this config item.We obviously are not starting from scratch, but that is still, IMO, the way the functionality most obviously groups together when thinking through the configuration.
From an implementation perspective, I don't know if it's better to keep all the old settings as individual config items or deprecate them. I'd likely lean in favor of the latter since I think the current ones grew over time into a bit of an ugly knot. But, if not wanting to deprecate as you've seemed to indicate, then you could arguably split HOST_URL
into the various components and set the existing config values from it. Although, in that case, I'd adjust external URL generation to solely pull from HOST_URL_EXTERNAL
instead of being pieced together from the configuration items that should only affect request routing, IMO.
Could also not set any defaults from HOST_URL
and make everything more explicit. I think that just depends on how Flask leans in terms of "user friendly" vs "explicit is better than implicit." Given that it's very easy to do:
HOST_URL = 'https://foo.bar/baz'
app = Flask()
app.config['HOST_URL'] = HOST_URL
app.config['HOST_URL_EXTERNAL'] = HOST_URL
app.config['HOSTS_ALLOWED'] = ['foo.bar']
And that doesn't require any parsing of the URL to get the bare host for HOSTS_ALLOWED
, I'd lean towards everything being explicit.
We could have a DEFAULT_BASE_URL like https://example.com:56374/root. The problem is APPLICATION_ROOT, for WSGI there's no way to know when app setup is finished and when requests handling will start...
Two options I see:
HOST_URL
any time it's set and distribute the values to other configuration items immediately. It would be a special handling of that one value which may or may not be that big of a deal.Flask
instead of a configuration value, just like some of the other "core" params that get set that way. Parse it out during init and apply to the configuration as defaults.Thanks for the analysis and writeup, I was trying to do something similar and ran out of steam. I'll think about the exact path forward, but I'm marking it for 3.1.
I'm struggling to figure out why Werkzeug's Map.bind_to_environ
handles server_name
the way it does.
Note that because of limitations in the protocol there is no way to get the current subdomain and real
server_name
from the environment. If you don’t provide it, Werkzeug will useSERVER_NAME
andSERVER_PORT
(orHTTP_HOST
if provided) as usedserver_name
with disabled subdomain feature.If
subdomain
isNone
but an environment and a server name is provided it will calculate the current subdomain automatically. Example:server_name
is'example.com'
and theSERVER_NAME
in the wsgienviron
is'staging.dev.example.com'
the calculated subdomain will be'staging.dev'
.
The end of the first paragraph has a confusing grammatical typo ("will use ... as used"), so I can't be sure of the intention. Not sure what "as" was supposed to say.
Looking at the way Flask calls this, subdomain
is always None
, and host_matching
is not used by default. So if SERVER_NAME
is set, it will always be required to match, otherwise a warning is shown and a 404 is returned. The quick answer here is to never pass server_name
, but why is that even a behavior? Why is that not handled by allowed_hosts
later on?
It seems to be saying "if the subdomain wasn't given, try to get it from the host header as a prefix of the name the WSGI server is bound to", which does make sense. But also "if I can't get a prefix, then don't even try to route", but routing would still work fine if it assumed the empty prefix, as if the host exactly matched the server's name.
There's also the subdomain_matching
parameter to Flask
, which defaults to False
. And the app.url_map.default_subdomain
value, which is unset by default. For some reason, Werkzeug's Map
doesn't account for default_subdomain
in bind_to_environ
, only bind
. And Flask sets subdomain = self.url_map.default_subdomain or None
, so only if it's truthy. But ""
as the default subdomain totally works. If I remove the or None
and set url_map.default_subdomain = '"
and SERVER_NAME = "abc.localhost:5000
, URL generation works and any domain or IP works. It's still a problem that there's this many parts, many undocumented, but it can work today.
from flask import Flask, request, url_for
app = Flask(__name__)
app.config["SERVER_NAME"] = "example.com:5000"
app.url_map.default_subdomain = "" # and patch out the `or None`
@app.get("/")
def index() -> str:
return url_for("index", _external=True)
SERVER_NAME
does have a routing purpose in this case, without interfering: it allows determining the subdomain by comparing Host
to SERVER_NAME
. For example, if Host = "a.b.example.com"
and SERVER_NAME = "example.com"
, remove SERVER_NAME
as a suffix and you get subdomain = "a.b"
. If you wanted to route beyond that, you'd either need to enable full host_matching
instead of subdomain matching, and/or redirect alias domains to the configured default name.
Yeah, I'm starting to see how this makes sense now. I think the entire problem is the or None
here: https://github.com/pallets/flask/blob/07c7d5730a2685ef2281cc635e289685e5c3d478/src/flask/app.py#L442-L451
If that's fixed, then:
subdomain_matching
and host_matching
disabled (the default), all hosts are accepted and don't affect routing.SERVER_NAME
doesn't affect routing unless it's a suffix of the host for determining the subdomain.subdomain_matching
is enabled, it makes sense that hosts that didn't have SERVER_NAME
as a suffix wouldn't route, there's no subdomain to determine.
default_subdomain
in that case, although the correctness of that is debatable.host_matching
is enabled, the subdomain doesn't matter and none of this applies anyway.So I think Flask.__init__
needs:
if not subdomain_matching:
url_map.default_subdomain = ""
And Flask.create_url_adapter
removes the or None
. And we also expose a new ALLOWED_HOSTS
config so that we can restrict the hosts still.
Longer term, Werkzeug 3.2 can add Map.subdomain_matching
and adjust the behavior of Map.bind_to_environ
a bit. But 3.1 just came out, so that will have to wait a bit.
The change was slightly different than what I guessed above, but after some more testing, I'm convinced that's the right fix now. It seems to be what was intended in #2635. This whole thing, host_matching
, subdomain_matching
, and SERVER_NAME
is really not easy to keep track of, and I clearly was misinterpreting it over the years.
It's not helped by the number of configurations in Werkzeug's routing:
host_matching
and subdomain_matching
can both be enabled (subdomain_matching
is implemented in Flask, not Werkzeug), but host_matching
takes priority in MapAdapter
.subdomain
to Map.bind
when host_matching
is enabled is an error. So it doesn't matter that host_matching
takes priority in MapAdapter
.Rule
(from @route
or add_url_rule
) can take both a subdomain
and host
pattern. Neither is used unless the corresponding x_matching
is enabled, so they just invisibly don't work if you don't remember to set it in both places, or if you set the wrong one. And again, host_matching
takes priority in case both are set, again invisibly.I'm not clear what of this can be fixed or warned about, or even if it should be. Something to think about when moving subdomain_matching
out of Flask and into Werkzeug 3.2 eventually.
998 was a discussion regarding the impact the
SERVER_NAME
setting has on an application. IMO, there was consensus in that issue, including by @mitsuhiko, thatSERVER_NAME
being used both in routing and in URL generation was problematic.That issue was closed by https://github.com/pallets/flask/pull/2635, which made a change regarding subdomain matching. However, IMO, it did not address the fundamental problem of
SERVER_NAME
impacting two disparate mechanisms in Flask.I found two additional issues since #998 that have attempted to point out the problem with
SERVER_NAME
: #2813, #3465In #2813, a minimal, complete, and verifiable example was requested and I have therefore prepared an example gist that demonstrates the problem.
The gist's third "currently impossible" test demonstrates that it's currently impossible to:
SERVER_NAME
so external URL generation works outside a request context with only an app context; andMy particular use case currently is that I need
SERVER_NAME
set so external URLs will be created accurately in distributed tasks where no web request is present. Additionally, I need the app to respond to multiple host names 1) the canonical URL the public uses; and 2) the IP address based URL our load balancer uses for health checks.IMO,
SERVER_NAME
should be deprecated and two new settings should be created. One which affects external URL generation and a second that affects routing resolution.The setting for the URL generation could be
CANONICAL_URL
and, because its a URL, it could eliminate the need forurl_for()
's_scheme
arg.Thanks for your consideration.