pallets / flask

The Python micro framework for building web applications.
https://flask.palletsprojects.com
BSD 3-Clause "New" or "Revised" License
68.2k stars 16.25k forks source link

Revisit SERVER_NAME's impact on routing and external URL generation #5553

Closed rsyring closed 2 weeks ago

rsyring commented 3 months ago

998 was a discussion regarding the impact the SERVER_NAME setting has on an application. IMO, there was consensus in that issue, including by @mitsuhiko, that SERVER_NAME being used both in routing and in URL generation was problematic.

That issue was closed by https://github.com/pallets/flask/pull/2635, which made a change regarding subdomain matching. However, IMO, it did not address the fundamental problem of SERVER_NAME impacting two disparate mechanisms in Flask.

I found two additional issues since #998 that have attempted to point out the problem with SERVER_NAME: #2813, #3465

In #2813, a minimal, complete, and verifiable example was requested and I have therefore prepared an example gist that demonstrates the problem.

The gist's third "currently impossible" test demonstrates that it's currently impossible to:

My particular use case currently is that I need SERVER_NAME set so external URLs will be created accurately in distributed tasks where no web request is present. Additionally, I need the app to respond to multiple host names 1) the canonical URL the public uses; and 2) the IP address based URL our load balancer uses for health checks.

IMO, SERVER_NAME should be deprecated and two new settings should be created. One which affects external URL generation and a second that affects routing resolution.

The setting for the URL generation could be CANONICAL_URL and, because its a URL, it could eliminate the need for url_for()'s _scheme arg.

Thanks for your consideration.

davidism commented 3 months ago

We should expose Werkzeug's trusted host checking, with a new ALLOWED_HOSTS = [] config to match Django's. This is similar to what we recently did for the dev server debugger.

I don't think we need to deprecate SERVER_NAME. If it's set, it's added to ALLOWED_HOSTS, otherwise behavior stays the same.

it could eliminate the need for url_for()'s _scheme arg.

This is already covered by PREFERRED_URL_SCHEME. If we were to deprecate SERVER_NAME, I'd rename it to PREFERRED_URL_HOST or something, but I think it's fine to leave it.

rsyring commented 3 months ago

David, thanks for chiming in.

I think SERVER_NAME still remains problematic:

  1. SERVER_NAME still affects two disparate parts of the system. If all I want to do is configure the external URL used by url_for(), I can't do that without affecting routing or host checking. I should be able to set them separately. This becomes a practical problem in the case of...
  2. What if I don't know all of the hosts that I want to allow? Maybe I'm serving up marketing content on multiple domains and non-technical people are creating names for SSO reasons. Maybe, as is my case, the load balancer is going to use different IPs every time I deploy the app (aws copilot/cloud formation). Or, if I have multiple processes/containers running, there will be multiple IPs that need to be valid but I don't know what they are.

Django doesn't seem to handle the unkown IP case natively: SO discussion and https://pypi.org/project/django-allow-cidr/

I don't know much about how Flask handles subodmains but noting it here in case it should impact the discussion.

Thanks.

davidism commented 4 weeks ago

I'm hesitant to deprecate/rename SERVER_NAME. Adjusting what I suggested above, if SERVER_NAME is set and ALLOWED_HOSTS is empty, then behavior remains the same by setting ALLOWED_HOSTS = [SERVER_NAME] automatically. But if ALLOWED_HOSTS is set, SERVER_NAME is not automatically added. So you can get the new behavior by setting both config keys, but nothing changes if you keep your config the same. Does that sound reasonable?

rsyring commented 3 weeks ago

That would at least facilitate my objective of setting the external URL without affecting ALLOWED_HOSTS as, presumably, I can use ALLOWED_HOSTS = ['*'].

But I'd point out that someone who just wants to set SERVER_NAME for the external URL is also getting a completely separate functionality applied, by default. In a way that is implicit rather than explicit. They then have to go explicitly "undo" that implicit application of allowed hosts. At the very least, the documentation should be really clear about how SERVER_URL impacts ALLOWED_HOSTS and how to undo that if desired.

PREFERRED_URL_SCHEME also seems problematic TBH. Unless I'm misunderstanding, it's a workaround for SERVER_NAME doing double-duty. If we had two separate settings, then one should just be EXTERNAL_HOST_URL (or whatever you want to name it) and then you don't need PREFERRED_URL_SCHEME?

Creating two separate configs, one for each purpose, still makes the most sense to me. There is already consensus in #998 that the one setting is problematic. Why not fix the root cause?

Regardless, I'd be happy for any way to fix this in the Flask config as the workaround I use is ugly.

Thanks. :)

davidism commented 3 weeks ago

Regarding a combined single config key:

We have three configs that affect routing and URL generation.

All three are needed to generate URLs outside a request. APPLICATION_ROOT is needed during requests as well, to know what part of the path is the prefix from the server and what part is handled by the app.

We could have a DEFAULT_BASE_URL like https://example.com:56374/root. The problem is APPLICATION_ROOT, for WSGI there's no way to know when app setup is finished and when requests handling will start, so there's no way to say "now we know all config has been loaded, set APPLICATION_ROOT from DEFAULT_BASE_URL if needed, then pass the app on to start serving". We would want to split it during config though, otherwise we'd need to call urlsplit(DEFAULT_BASE_URL).path for every request, which would be expensive. The other parts only need to be split when generating URLs outside a request, so that expense isn't important then.

rsyring commented 3 weeks ago

To recap, with the ALLOWED_HOSTS proposal, here are the configuration items that are present in this discussion:

As a thought experiment, if starting from scratch, it seems more intuitive to me if this was broken out as:

We obviously are not starting from scratch, but that is still, IMO, the way the functionality most obviously groups together when thinking through the configuration.

From an implementation perspective, I don't know if it's better to keep all the old settings as individual config items or deprecate them. I'd likely lean in favor of the latter since I think the current ones grew over time into a bit of an ugly knot. But, if not wanting to deprecate as you've seemed to indicate, then you could arguably split HOST_URL into the various components and set the existing config values from it. Although, in that case, I'd adjust external URL generation to solely pull from HOST_URL_EXTERNAL instead of being pieced together from the configuration items that should only affect request routing, IMO.

Could also not set any defaults from HOST_URL and make everything more explicit. I think that just depends on how Flask leans in terms of "user friendly" vs "explicit is better than implicit." Given that it's very easy to do:

HOST_URL = 'https://foo.bar/baz'
app = Flask()
app.config['HOST_URL'] = HOST_URL
app.config['HOST_URL_EXTERNAL'] = HOST_URL
app.config['HOSTS_ALLOWED'] = ['foo.bar']

And that doesn't require any parsing of the URL to get the bare host for HOSTS_ALLOWED, I'd lean towards everything being explicit.

We could have a DEFAULT_BASE_URL like https://example.com:56374/root. The problem is APPLICATION_ROOT, for WSGI there's no way to know when app setup is finished and when requests handling will start...

Two options I see:

davidism commented 3 weeks ago

Thanks for the analysis and writeup, I was trying to do something similar and ran out of steam. I'll think about the exact path forward, but I'm marking it for 3.1.

davidism commented 2 weeks ago

I'm struggling to figure out why Werkzeug's Map.bind_to_environ handles server_name the way it does.

Note that because of limitations in the protocol there is no way to get the current subdomain and real server_name from the environment. If you don’t provide it, Werkzeug will use SERVER_NAME and SERVER_PORT (or HTTP_HOST if provided) as used server_name with disabled subdomain feature.

If subdomain is None but an environment and a server name is provided it will calculate the current subdomain automatically. Example: server_name is 'example.com' and the SERVER_NAME in the wsgi environ is 'staging.dev.example.com' the calculated subdomain will be 'staging.dev'.

The end of the first paragraph has a confusing grammatical typo ("will use ... as used"), so I can't be sure of the intention. Not sure what "as" was supposed to say.

https://github.com/pallets/werkzeug/blob/7868bef5d978093a8baa0784464ebe5d775ae92a/src/werkzeug/routing/map.py#L321-L339

Looking at the way Flask calls this, subdomain is always None, and host_matching is not used by default. So if SERVER_NAME is set, it will always be required to match, otherwise a warning is shown and a 404 is returned. The quick answer here is to never pass server_name, but why is that even a behavior? Why is that not handled by allowed_hosts later on?

It seems to be saying "if the subdomain wasn't given, try to get it from the host header as a prefix of the name the WSGI server is bound to", which does make sense. But also "if I can't get a prefix, then don't even try to route", but routing would still work fine if it assumed the empty prefix, as if the host exactly matched the server's name.

davidism commented 2 weeks ago

There's also the subdomain_matching parameter to Flask, which defaults to False. And the app.url_map.default_subdomain value, which is unset by default. For some reason, Werkzeug's Map doesn't account for default_subdomain in bind_to_environ, only bind. And Flask sets subdomain = self.url_map.default_subdomain or None, so only if it's truthy. But "" as the default subdomain totally works. If I remove the or None and set url_map.default_subdomain = '" and SERVER_NAME = "abc.localhost:5000, URL generation works and any domain or IP works. It's still a problem that there's this many parts, many undocumented, but it can work today.

from flask import Flask, request, url_for

app = Flask(__name__)
app.config["SERVER_NAME"] = "example.com:5000"
app.url_map.default_subdomain = ""  # and patch out the `or None`

@app.get("/")
def index() -> str:
    return url_for("index", _external=True)
davidism commented 2 weeks ago

SERVER_NAME does have a routing purpose in this case, without interfering: it allows determining the subdomain by comparing Host to SERVER_NAME. For example, if Host = "a.b.example.com" and SERVER_NAME = "example.com", remove SERVER_NAME as a suffix and you get subdomain = "a.b". If you wanted to route beyond that, you'd either need to enable full host_matching instead of subdomain matching, and/or redirect alias domains to the configured default name.

davidism commented 2 weeks ago

Yeah, I'm starting to see how this makes sense now. I think the entire problem is the or None here: https://github.com/pallets/flask/blob/07c7d5730a2685ef2281cc635e289685e5c3d478/src/flask/app.py#L442-L451

If that's fixed, then:

So I think Flask.__init__ needs:

if not subdomain_matching:
    url_map.default_subdomain = ""

And Flask.create_url_adapter removes the or None. And we also expose a new ALLOWED_HOSTS config so that we can restrict the hosts still.

Longer term, Werkzeug 3.2 can add Map.subdomain_matching and adjust the behavior of Map.bind_to_environ a bit. But 3.1 just came out, so that will have to wait a bit.

davidism commented 2 weeks ago

The change was slightly different than what I guessed above, but after some more testing, I'm convinced that's the right fix now. It seems to be what was intended in #2635. This whole thing, host_matching, subdomain_matching, and SERVER_NAME is really not easy to keep track of, and I clearly was misinterpreting it over the years.

It's not helped by the number of configurations in Werkzeug's routing:

I'm not clear what of this can be fixed or warned about, or even if it should be. Something to think about when moving subdomain_matching out of Flask and into Werkzeug 3.2 eventually.