plone / plone.base

Plone Interface contracts, plus basic features and utilities
https://pypi.org/project/plone.base
2 stars 0 forks source link

Fix folder_contents being broken with VHM and Multilingual #18

Closed Rudd-O closed 2 years ago

Rudd-O commented 2 years ago

When Multilingual is enabled and VHM is enabled too, serving any folderish's folder_contents under a language folder as a top level domain name is currently broken (if en.site.com/archives is VHMed to PloneSite/en/archives then @@qsOptions URL is /archives/@@qsOptions instead of being /@@qsOptions as it should be.

The fundamental problem is that the function get_top_site_from_url(...) is failing to retrieve the actual top site because it can't find a site in the traversal structure, so it picks the last folderish as the site, which is obvs broken because /@@qsOptions only works on site roots, not on folderish content.

This code makes it so that any site found in the structure is returned as the top site from the URL, but if no top site is found in that traversal, the actual site from getSite() is returned instead, which is the correct behavior based on how the function is named, and actually fixes the /@@qsOptions conundrum.

Reproducer

Odd intersection of circumstances to reproduce bug:

  1. Create classic Plone site.
  2. Enable Multilingual with en-US or some other language.
  3. Verify visiting /Plone redirects to Plone/en-US or your language folder of choice.
  4. VHM your site so that /Plone/en-US is accessible as / (a standard, supported Multilingual rewrite configuration typically used by Ploners for language-code subdomain hosting). Plone/en-US/VirtualHostRoot is the relevant VHM rewrite.
  5. Visit the site. Verify that site is accessible as /.
  6. Create folder xxx on that location.
  7. Visit folder xxx on that location.
  8. Click on Folder contents in the editor toolbar.
  9. Verify that the folder contents screen for xxx loads, but the actual listing remains empty.
  10. Verify that there is an HTTP 404 for @@qsOptions in the network panel of your browser inspector.

Rationale

This code makes the method sometimes return a content object in some cases, rather than an actual site root.

Why?

All callsites (see below) need a "root object" from the perspective of the browser to compute a root and generate various URLs for their own purposes, because all their purposes are XHR browser stuff (fill lists of objects and search results, chiefly). This method I'm fixing determines the "root" used for that tree which is used in related items, image insertion, and folder contents browsing, and if that "root" is incorrect / does not correspond to what the browser sees as "root", nothing works.

Needless to say, this method should never be used to traverse anything server-side.

Quality control

Callsites

My omelette is big. There are not that many uses, and all its uses currently broken are fixed by this patch. I count two uses:

  1. plone.app.content.browser.contents:ContentsBaseAction / get_options / ContextInfo.call
  2. plone.app.widgets:get_relateditems_options

Unit and regression testing

All current unit tests pass now. I have added a test case to cover the "root URL object is not actually a site" case, which was never tested before. In addition to that, I have tested this with a local build (most up to date code) and here is what I can verify:

So pretty much everything that was broken before, is no longer broken thanks to this modest change. And that is the complete manual regression test of all users of this code.

mister-roboto commented 2 years ago

@Rudd-O thanks for creating this Pull Request and helping to improve Plone!

TL;DR: Finish pushing changes, pass all other checks, then paste a comment:

@jenkins-plone-org please run jobs

To ensure that these changes do not break other parts of Plone, the Plone test suite matrix needs to pass, but it takes 30-60 min. Other CI checks are usually much faster and the Plone Jenkins resources are limited, so when done pushing changes and all other checks pass either start all Jenkins PR jobs yourself, or simply add the comment above in this PR to start all the jobs automatically.

Happy hacking!

Rudd-O commented 2 years ago

@jenkins-plone-org please run jobs

Rudd-O commented 2 years ago

Hold the merge for this, even if the tests pass. I have discovered an issue with the fix that needs to be addressed.

EDIT: false alarm. This is good.

Rudd-O commented 2 years ago

@jenkins-plone-org please run jobs

Rudd-O commented 2 years ago

@jenkins-plone-org please run jobs

Rudd-O commented 2 years ago

I have excellent news!

Another thing that was legendarily broken in Multilingual sites (exactly because of this get_top_site_from_url(...) breakage) was the media browser in the TinyMCE editor. You could not browse for files to link, embed or insert as images.

This code fixes it!

OMG I have been five years without being able to insert things on my site unless I use an admin domain without VHM... and now I can! Pure joy.

Rudd-O commented 2 years ago

@jenkins-plone-org please run jobs

flipmcf commented 2 years ago
  1. Take a look at the docstring for get_top_site_from_url() in your final commit. It looks like it's duplicated.

Questions: How did you arrive at ISite as the interface marker? I think there are more, like ISiteRoot and INavigationRoot that are also good candidates.

(already answered: am preserving the behavior of the code, just including the topmost accessible object (rather than the bottommost only) in the list of objects to be considered as "site roots" to return to the caller.)

Docs say use ISiteRoot https://docs.plone.org/develop/plone/serving/traversing.html#traversing-back-to-the-site-root ---But I don't recommend making this change here. I'd rather see another ticket specifically calling that out as a bug, if it even is one.

I don't use VHM and Multilingual, so I don't have direct experience, and I'm not sure if I can deploy this into my env and actually test it.

Will come back and edit this with more.


More: God, I hate this code. That Try/Except block is huge.... I want to re-write this.... calm down....


Rudd-O commented 2 years ago

From @flipmcf :

So, let me get this straight.

The "canonical way" (bigger than 'best practice') to get a site root is to traverse upwards until you find an object that implements ISite.

I honestly do not know :-)

You are proposing that we add a utility to do just this. And use that utility.

We have a get_top_site_from_url function in plone.base which AFAIK is may not be new code but if it isn't, it definitely used to live somewhere else before plone.base was conceived. Do not take my word for it — I have not tried this in Plone 5.

This function does not traverse from leaf to root. It traverses from root to leaf, starting with the rootmost element visible through the Web (VHM) and falling back to the actual Plone site root discovered through getSite(). Unfortunately, as of main right this very instant, and for many years now, the traversal gets to the leaf object and falls back to that, which is wrong, since @@qsOptions is not gonna work there (also, under no circumstance are most leaf objects a "top site").

This code fixes the traversal so that we traverse from root to leaf, collecting any legit ISite()s along the way (and also the rootmost visible object according to VHM), and picks the rootmost object to return (if there are none, we simply pick the actual site root as before). So the change is:

Questions:

@Rudd-O How did you arrive at ISite as the interface marker? I think there are more, like ISiteRoot and INavigationRoute that are also good candidates.

Was already there. I did not change that.

Docs say use ISiteRoot https://docs.plone.org/develop/plone/serving/traversing.html#traversing-back-to-the-site-root

I am wary of changing this.

Does this play well with zope.component.hooks.getSite? Should it? Does it even matter?

Doesn't affect getSite at all. It is only dealing with URL traversal.

I concur, you are on the right track here. Just throwing out some ideas.

I appreciate them!

Rudd-O commented 2 years ago

There really were a bunch of things that were utterly broken by the behavior of get_top_site_from_url before. This function is never used for traversal proper, but it is used all over the place to generate all those helpful JSONlets (@@qsOptions and @@fc-*) that let the client-side JS in Plone do magic like browse folders and embed media. Obviously, if this function is broken, all those helpful JSONlets become helpless.

Rudd-O commented 2 years ago

@jenkins-plone-org please run jobs

Rudd-O commented 2 years ago

The CI is dead folks. I will merge upon approval. I have the fix live running on locally and it works correctly with production VHM enabled.

flipmcf commented 2 years ago

Should do some manual regressions on affected code - especially pre-release. As you say, get_top_site_from_url is used all over the place to do UI magic.

The problem is, who is depending on the bad behavior of get_top_site_from_url and by fixing it, what breaks.

https://github.com/search?q=org%3Aplone+get_top_site_from_url&type=code

I'm feeling that the safest (but not best) change is to repair plone.app.content.browser.contents.__init__.py to use a better method of getting the site url, rather than modifying lower-level code 'get_top_site_from_url' to return something that 'folder listing' likes. This ticket is "Fix folder_contents' not 'fix get_top_site_from_url()'

If there is really a bug in the utility, then all code that depends on this utility needs to be notified, tested, and updated. This is indeed the correct path if @Rudd-O analyis is correct. I'm not able to verify that this analysis is correct (not saying it's incorrect, but that I simply can't verify)

If we change this utility, I'm confident we'll be back here with someone else saying "Hey, this behavior changed that I was relying on". Maybe that's where we want to be, distributing the 'correct' work across the dependent code through finding and fixing failures, but in that case, we must be CERTAIN that get_top_site_from_url() is broken.

I want another opinion on the hypotheses that get_top_site_from_url() is broken. --> @thet

Rudd-O commented 2 years ago

The problem is, who is depending on the bad behavior of get_top_site_from_url and by fixing it, what breaks.

Callsites

My omelette is big. There are not that many uses, and all its uses currently broken are fixed by this patch. I count two uses:

  1. plone.app.content.browser.contents:ContentsBaseAction / get_options / ContextInfo.call
  2. plone.app.widgets:get_relateditems_options

Why this must return a content object in some cases, rather than an actual site root

These two cases all need a "root object" from the perspective of the browser to compute a root for their own purposes, because all their purposes are browser stuff (fill lists of objects, chiefly). This method I'm fixing determines the "root" used for that tree which is used in related items, image insertion, and folder contents browsing, and if that "root" is incorrect / does not correspond to what the browser sees as "root", nothing works.

Unit and regression testing

All current unit tests pass now. I have added a test case to cover the "root URL object is not actually a site" case, which was never tested before. In addition to that, I have tested this with a local build (most up to date code) and here is what I can verify:

So pretty much everything that was broken before, is no longer broken thanks to this modest change. And that is the complete manual regression test of all users of this code.

Rudd-O commented 2 years ago

I've added a new test case covering this exact issue. All tests pass.

MrTango commented 2 years ago

@plone/framework-team should have a look here too i guess. Specially that nothing for restapi/Volto breaks here.

flipmcf commented 2 years ago

So pretty much everything that was broken before, is no longer broken thanks to this modest change

This is so satisfying when a low-level fix happens. Congrats! I'm still going to wait on the complaints from those who may have relied on the incorrect behavior, but the analysis seems spot-on. Who would rely on something being broken in VHM environments? (we will find out!)

@Rudd-O analysis is a good defense to anyone who finds this change offensive to them personally.

Rudd-O commented 2 years ago

A profound thanks to @flipmcf and @jensens for both the encouragement and the review. Here goes.

ale-rt commented 2 years ago

@plone/framework-team should have a look here too i guess. Specially that nothing for restapi/Volto breaks here.

Skimming the code it looks sane. @Rudd-O does this also fix https://github.com/plone/plone.subrequest/issues/17?

mauritsvanrees commented 2 years ago

This causes test failures. See Jenkins

For now, I will remove plone.base from checkouts.cfg, that should fix it. But please have a look at the affected packages.

Can't blame anyone for not seeing the test failures earlier, since PRs on Jenkins are broken currently. :-/ At least the standard jobs still work, so we see this after merge.

Rudd-O commented 2 years ago

@mauritsvanrees I am looking at the test failures:

Let me make sure these failures are because of the code, and fix the tests if they need fixing.

Rudd-O commented 2 years ago

I read the tests and they (edit) do seem to be related to my fix.

How can I run only the offending tests locally? I am using buildout and I want to bisect this.

Rudd-O commented 2 years ago

Figured it out thanks to @thet ! Please stand by for fixes...

This package really should have CI.

Rudd-O commented 2 years ago

@mauritsvanrees I have replicated the test environment locally with my code, but basing my tests on plone.base version 1.0.0b2, and my code change passes tests.

The problem began appearing with 1.0.0b3 and my code.

Will continue the bisect here.

EDIT: somehow when I use mainline my fix does cause issues. So weird.

Rudd-O commented 2 years ago

I will work on this throughout the weekend. Please don't revert this fix just yet.

Rudd-O commented 2 years ago

I think I have a fix. Stand by for patch. Validating with the other test failures now.

Rudd-O commented 2 years ago

Broken tests: https://jenkins.plone.org/job/plone-6.0-python-3.10/763/

Fix: https://github.com/plone/plone.base/pull/19

The tests that were broken were retested locally using my own buildout, and they passed.

thet commented 2 years ago

@Rudd-O great job! that must have been some deep diving. your analysis about how get_top_site_from_url is totally correct.

Rudd-O commented 2 years ago

Thank you but I still broke the tests, and that is because I could not run them locally (I am now setup to run them here, so it's fine). A fix is incoming in #20 .