feat: url_path(hostname: String) query

dopry commented 1 year ago

Headless builders typically represent a single site and they know which site that is. Currently headless sites need to parse url_path to stripe out the 'site path' from a given set of pages. To simplify URL construction for headless sites it would be nice if they could fetch url_paths relative to a given site.

Given hostname is not provided,
- Then the path relative to default site is returned.
Given hostname is provided,
- When a site exists for the given hostname
- When the page is the Root Page
  - Then '/' is returned (this is to distinguish from the empty string returned when site is not matched)
- When the page is not in the given site
  - Then an empty string is returned
- Then the path relative to the site matching the hostname is returned
- When a site does not exist for the given hostname
  - Then an empty string is returned

zerolab commented 1 year ago

Drive-by note: what about something like

{
    site(hostname: "my.domain") {
        pages(urlPath: $path) {
            title
        }
    }
}

https://wagtail-grapple.readthedocs.io/en/latest/general-usage/graphql-types.html#siteobjecttype

This is off the top of my head, while in-between tasks

dopry commented 1 year ago

@zerolab so based on you suggestion the PageInterface.url_path resolver would vary the resolved urlPath based on whether it was a child of Site or not?

dopry commented 1 year ago

@zerolab I figured out why I've been getting None for urls... when you're running headless and don't mount the wagtail urls page.url will always return None, see: https://github.com/wagtail/wagtail/pull/9984 I found a workaround to mount the urls in an unreachable location.

urlpatterns = [
    path("admin/", admin.site.urls),
    # the order of grapple_urls and wagtailadmin_urls is important here. When initially implementing
    # if wagtailadmin_urls were first, then the /graphql and /graphiql endpoints were unreachable.
    path("", include(grapple_urls)),
    path("", include(wagtailadmin_urls)),
    path("documents/", include(wagtaildocs_urls)),
    ## the following lambda is meand to be a catch-all to make the wagtail_urls inaccessible
    path("", lambda: HttpResponse("404 Not Found", status=404)),
    ## load wagtail_urls so that wagtail_serve is mounted so RichTextBlock url generation works
    ## properly. This is a work around for https://github.com/wagtail/wagtail/pull/9984
    path("", include(wagtail_urls)),
]

Once I got this setup I did a basic multisite experiment

{
  site(hostname: "default.com") {
    hostname
    page(id: 66) {
      title
      url
      urlPath
    }
  }
   page(id: 66) {
      title
      url
      urlPath
    }
}

and got the response

{
  "data": {
    "site": {
      "hostname": "default.com",
      "page": {
        "title": "Page Test",
        "url": "http://test.com/page-test/",
        "urlPath": "/default/test/page-test/"
      }
    },
    "page": {
      "title": "Page Test",
      "url": "http://test.com/page-test/",
      "urlPath": "/default/test/page-test/"
    }
  }
}

It looks like we need to update site(hostname: "") to somehow trick Site.find_for_request, and it looks like we would do that by overwriting request._wagtail_site. I'm not sure if that would work in a scenario like

{ 
  site(hostname: "default.com") { pages { url } } 
  site(hostname: "test.com") { pages { url } } 
}

There may need to be some architecture work done at the core level to divorce site resolution from the request, or we're going to need to figure out how to shim in replacements for the core components that depend on that in the grapple resolvers....

zerolab commented 1 year ago

Thank you for furthering this.

Will do a deeper dive during the weekend. The best place to fix this would be core in that to be truly headless we should not necessarily rely on wagtail_urls.

One thing (off the top of my head) is to try and make use of wagtail.models.sites.get_site_for_hostname in our site queries with some monkeypatching. Or... perhaps wagtail-headless-preview as preview is one thing you absolutely want to have working when in a headless context 🤔

dopry commented 1 year ago

I think we could also just override the url resolver in the short term to accept a hostname... but it would be nice if core query sets allowed us to specify an alternate hostname for url generation similar to what you proposed here... As it is, multisite domain resolution doesn't behave according to my expectations. In my example above the default site is default.com. I would expect that if I queried a page using the default site home page as the root of my query it would return as default site. What is actually happening is domain for the first parent to the page is being used... It's probably because I don't know the use cases that nested multisite support was built around...

seb-b commented 1 year ago

+1 for the idea to have some way to get the relative url without the base path, would really reduce a bit of our back and forth when generating static paths and fetching pages.

Could there be a new field on the page interface specifically for this? It could be site agnostic, url_path seems to work ok even when the page urls aren't registered (we don't do this either and have ran into some weird urls problems)

class PageInterface(...):
    relative_url = graphene.String()

    def resolve_relative_url(self, info, **kwargs):
       parts = [part for part in self.url_path.split("/") if part]
       if len(parts) == 1:
          return "/"
       return f{"/".join(parts[1:])}/"

dopry commented 1 year ago

Unfortunately, we can't can't strip off the first folder in a path. With multisite setups any a site/hostname can use any page as a root. We need to know the 'site' in order to figure out the relative path, hence my proposal to pass hostname as an argument.

torchbox / wagtail-grapple

feat: url_path(hostname: String) query #306