Closed tsibley closed 2 years ago
Have you tried this without the ProxyPassReverse
directive? I'm worried that might be confusing Datasette.
This is the test I used to ensure this feature works - it scrapes all of the links on a bunch of different pages. Could it be missing something here?
Hmm, I haven't tried removing ProxyPassReverse
, but it doesn't touch the HTML, which is the issue I'm seeing. You can read the documentation here. ProxyPassReverse
is a standard directive when proxying with Apache. I've used it dozens of times with other applications.
Looking a little more at the code, I think the issue here is that the behaviour of base_url
makes sense when Datasette is mounted at a path within a larger application, but not when HTTP requests are being proxied to it.
In a mount situation, it is perfectly fine to construct URLs reusing the domain and path from the request. In a proxy situation, it never is, as the domain and path in the request are not the domain and path that the non-proxy client actually needs to use. That is, links which include the Apache → Datasette request origin, localhost:8001
, instead of the browser → Apache request origin, example.com
, will be broken.
The tests you pointed to also reflect this in two ways:
They strip a leading http://localhost
, allowing such URLs in the facet links to pass, but inclusion of that in a proxy situation would mean the URL is broken.
The test client emits direct ASGI events instead of actual proxied HTTP requests. The headers of these ASGI events don't reflect the way an HTTP proxy works; instead they pass through the original request path which contains base_url
. This works because Datasette responds to requests equivalently at either /…
or /{base_url}/…
, which makes some sense in a mount situation but is unconventional (albeit workable) for a proxied app.
Apps that support being proxied automatically support being mounted, but apps that only support being mounted don't automatically support being proxied.
I also am seeing the same issue with an Apache setup (same even w/o ProxyPassReverse
, though I typically use it as @tsibley stated).
But also want to say thanks for a great tool (this issue not withstanding)!
But also want to say thanks for a great tool
+1!
Tracking ticket: #1023
OK, I've made a ton of improvements to how the base_url
setting works - see tickets linked from #1023. I've just pushed out an alpha release with those changes in it: https://github.com/simonw/datasette/releases/tag/0.51a0
@tsibley @tballison @ChristopherWilks I'd really appreciate your help testing this alpha!
You can install it with:
pip install datasette==0.51a0
It should work with just ProxyPass
, without needing the ProxyPassReverse
setting.
I'm trying to run something behind a MyBinder proxy, but seem to have something set up incorrectly and not sure what the fix is?
I'm starting datasette with jupyter-server-proxy setup:
# __init__.py
def setup_nbsearch():
return {
"command": [
"datasette",
"serve",
f"{_NBSEARCH_DB_PATH}",
"-p",
"{port}",
"--config",
"base_url:{base_url}nbsearch/"
],
"absolute_url": True,
# The following needs a the labextension installing.
# eg in postBuild: jupyter labextension install jupyterlab-server-proxy
"launcher_entry": {
"enabled": True,
"title": "nbsearch",
},
}
where the base_url
gets automatically populated by the server-proxy. I define the loaders as:
# __init__.py
from datasette import hookimpl
@hookimpl
def extra_css_urls(database, table, columns, view_name, datasette):
return [
"/-/static-plugins/nbsearch/prism.css",
"/-/static-plugins/nbsearch/nbsearch.css",
]
but these seem to also need a base_url prefix set somehow?
Currently, the generated HTML loads properly but internal links are incorrect; eg they take the form <link rel="stylesheet" href="/-/static-plugins/nbsearch/prism.css">
which resolves to eg https://notebooks.gesis.org/hub/-/static-plugins/nbsearch/prism.css
rather than required URL of form https://notebooks.gesis.org/binder/jupyter/user/ouseful-testing-nbsearch-0fx1mx67/nbsearch/-/static-plugins/nbsearch/prism.css
.
The main css is loaded correctly: <link rel="stylesheet" href="/binder/jupyter/user/ouseful-testing-nbsearch-0fx1mx67/nbsearch/-/static/app.css?404439">
OK, this should be working now. You can use the datasette.urls.static_plugins()
method to generate the correct URLs in the extra_css_urls
plugin hook: https://docs.datasette.io/en/latest/internals.html#datasette-urls
Thanks; just a note that the datasette.urls.static(path)
and datasette.urls.static_plugins(plugin_name, path)
items both seem to be repeated and appear in the docs twice?
@simonw Unfortunately this issue as I reported it is not actually solved in version 0.55.
Every link which is returned by the Datasette.absolute_url
method is still wrong, because it uses the request URL as the base. This still includes the suggested facet links and pagination links.
What I wrote originally still stands:
Although many of the URLs in the pages are correct (presumably because they either use absolute paths which include
base_url
or relative paths), the faceting and pagination links still use fully-qualified URLs pointing athttp://localhost:8001
.I looked into this a little in the source code, and it seems to be an issue anywhere
request.url
orrequest.path
is used, as these contain the values for the request between the frontend (Apache) and backend (Datasette) server. Those properties are primarily used via thepath_with_…
family of utility functions and theDatasette.absolute_url
method.
Would you prefer to re-open this issue or have me create a new one?
Let's reopen this.
The biggest challenge here I think is to replicate the exact situation here this happens in a Python unit test. The fix should be easy once we have a test in place.
Nod. The problem with the tests is that they're ignoring the origin (hostname, port) of links. In a reverse proxy situation, the frontend request origin is different than the backend request origin. The problem is Datasette generates links with the backend request origin.
I think this could be solved by one of:
base_url
into account (as they do now, yay!).base_url
to include the expected frontend origin, and then use that information when generating absolute URLs.X-Forwarded-*
family of conventional headers) to pass the frontend origin information to Datasette, and then use that information when generating absolute URLs.Option 1 seems like the easiest to me, if you can get away with never having to generate an absolute URL.
I'm running
datasette serve --config base_url:/foo/ …
, proxying to it with this Apache config:and then accessing it via
https://example.com/foo/
.Although many of the URLs in the pages are correct (presumably because they either use absolute paths which include
base_url
or relative paths), the faceting and pagination links still use fully-qualified URLs pointing athttp://localhost:8001
.I looked into this a little in the source code, and it seems to be an issue anywhere
request.url
orrequest.path
is used, as these contain the values for the request between the frontend (Apache) and backend (Datasette) server. Those properties are primarily used via thepath_with_…
family of utility functions and theDatasette.absolute_url
method.