plone / volto

React-based frontend for the Plone Content Management System
https://demo.plone.org/
MIT License
426 stars 575 forks source link

p.a.redirector redirections get no redirection for crawlers (via SSR) and volto redirects to incorrect ++api++ url depending on rewrite rule setup #4800

Open djay opened 11 months ago

djay commented 11 months ago

Describe the bug

Really this is two bugs

Relevant tickets in volto https://github.com/plone/volto/issues/1095 and restapi https://github.com/plone/plone.restapi/issues/181 don't make it clear if the ++api++ is meant to be left in or not. but it makes most sense an api call should redirect to an api call.

To Reproduce

Steps to reproduce the behavior:

Out of the box plone with volto

  1. Create a page
  2. change the short name
  3. access old url using curl
  4. you will get 200 not 30x from SSR
  5. if using a browser then volto will make a content api call that will then result in a 30x redirect
  6. If you use the nginx and rewrite rules then you will get api redirect without ++api++ in it. e.g this reproduces this case
      $ curl -H "Accept: application/json" -v http://localhost:8080/VirtualHostBase/http/www.blah.com/++api++/Plone/VirtualHostRoot/test1
      ...
      < HTTP/1.1 302 Found
      ...
      < Location: http://www.blah.com:8080/test2
  7. but if no rewrite rules or internal VHM then redirect is with ++api++ e.g.
    $ curl -H "Accept: application/json" -v http://localhost:8080/++api++/Plone/test1
    ... 
    < HTTP/1.1 302 Found
    ...
    Location: http://localhost:8080/++api++/Plone/test2
  8. This url is directly put into the window.location and if you refresh you get json

It seems that volto relies on the ++api++ being missing in the redirect and if it gets a redirect from an api with ++api++ then it puts ++api++ into the browser url causing the bug

Expected behavior

proposed solution

Screenshots

You can see the redirect without ++api++ in demo.plone.org

image (4)

Software (please complete the following information):

Additional context

Add any other context about the problem here.

djay commented 11 months ago

Managed to reproduce it locally so updated the ticket.

djay commented 11 months ago

@davisagli wondering if you can review the proposed fix before we start working on it.

davisagli commented 11 months ago

@djay Thanks for taking a look at this. I think you're mostly on the right track about the proposed fixes but I'll give some commentary...

fix volto so it can handle ++api++ in the redirect

Makes sense. In general, volto is responsible for adding or removing ++api++ from the URL, depending on whether it is using the URL for something on the client-side (e.g. setting the browser location -- shouldn't have ++api++) or a call to the server (should have ++api++).

All the volto sites I've worked on use rewrites, so I was a bit surprised that not using them causes the backend to return URLs that include ++api++. But, if that is the case, then yes, volto needs to fix it before redirecting the browser.

fix restapi to ensure ++api++ is in the redirect ... An api call should redirect to an api call.

I was thinking the same thing, but then I realized that this might be tricky to implement correctly for all cases. The problem is that the backend returns some URLs that are meant to be used for subsequent calls to the backend (with ++api++) and some URLs that are meant to be used on the client side (e.g. these redirects, or things like action URLs). So we would need to come up with a good way for code in the backend to distinguish what kind of URL it's producing, and I don't think we don't have anything like that yet.

This shouldn't be strictly necessary as long as volto fixes whatever URL it has before using it, so I wouldn't prioritize this part of the fix. Don't take this as stop energy -- I'm certainly interested if you have ideas how to handle this distinction better -- I'm just trying to prioritize where to spend effort.

adjust SSR code to pass on the redirect that it gets from the api redirect

Yep, we want this, it just hasn't been implemented yet.

There's an important nuance: We want volto's server side to serve a permanent redirect (301 or 308) to logged out users, but a temporary redirect (302 or 307) to logged in users. The reason is that we need permanent redirects for SEO, but we need temporary redirects to avoid poisoning the cache of editors who are moving content around.

djay commented 11 months ago

@davisagli there is seperate restapi code to deal with redirects. It already deals with the case of putting ++API++ back in, just not in all cases so it's a bug. The only case it seems to be designed to give a non API result is if the original wasn't an API call or if the redirect is external (not sure how that is possible)

I suspect the place to put the logic around different redirect codes for logged in or not is in the restapi redirect code. Then volto just passes on whatever code it gets from the API call. Will that work?

davisagli commented 11 months ago

I suspect the place to put the logic around different redirect codes for logged in or not is in the restapi redirect code. Then volto just passes on whatever code it gets from the API call. Will that work?

I was thinking of doing it only in volto. (We don't really need this distinction at the API level, because we don't really care about SEO of JSON responses.) But I don't see a problem with doing it in the API like you said, and it does sound a bit more consistent.