Closed mfendeksilverstripe closed 6 years ago
The problem doesn't appear to be related to building the ArrayList
. That's relatively quick to build.
I've been looking at our beloved key project. At least for one case I'm looking at, the problem appears to be getting the AbsoluteLink
of content blocks, which involves one query to fetch the Elemental Area and another query to fetch the Parent Page and probably more based on what hooks into updateLink
.
The Page I'm looking at has 23 dependent content blocks. It's adding 12 sec to my page load locally.
make this feature show on demand, not during edit for the page load
This would be nice, but I'm not sure how feasible it would be to implement
@maxime-rainville As discussed, please propose a longer term solution to this which doesn't put this auxiliary feature on the critical path of loading the edit form. I'd guess that less than 1% of edit form loads are done with the user goal of viewing dependant pages, so they shouldn't slow down every one of those loads.
@clarkepaul Do you have ideas on how we could lazy load? The most obvious way would be to load them when opening the tab, right? Note that we'd need to make this work in both React and Entwine sections, since all kinds of records can have "dependant things" now.
A somewhat left field idea could be to move it to it's own dedicated CMSMain
controller, a bit like CMSPageSettingsController
.
This would move it on equal footing with the Content
, Settings
and History
tabs. You could make a straight face argument that this actually better UX: Dependent Pages are not part of the SiteTree content, so why should they be on the Content
tab? It looks more like metadata about the page, that is not directly editable. (He said keeping a straight face)
We could try to update the basic form logic so form fields are only loaded/rendered when tabs are access. This could have across the board benefits for all tabbed forms, but could be moderately difficult to implement.
It would involved updating the form save logic to not assume that all form fields will always be present. Probably not reasonable to expect this to be done in time for the 4.3 release.
We could refuse to preload the GridField data. Then when you access a tab with a GridField, we use the regular GridField AJAX logic to load the first page.
I guess it would involve relatively simple PHP/template tweaks and some JS hackery. But probably more achievable short term than Make all form tabs lazy-loadable.
We could try to update the basic form logic so form fields are only loaded/rendered when tabs are access. This could have across the board benefits for all tabbed forms, but could be moderately difficult to implement.
Yeah, I don't see how we could make this work with the getCMSFields()
and form schema APIs at the moment. Lazy loading within a single field is possible (e.g. we already do that in TreeDropdownField
), but we can't omit those fields from Form->Fields()
to begin with. Which makes it hard to create a generic "lazy tab" solution.
Don't preload GridField's first page
We're likely moving into this direction through React GridField anyway, although this is a more immediate problem to solve - we can't wait for that whole React refactor to go through until we fix performance issues on large sites. Lazy loading of GridFields on tabs other than the first one is actually an interesting idea though. GridField already has XHR logic to load data (and a "JS API" to trigger those). We would just need to indicate this behaviour via GridField->setLazy(true)
or something, and then listen to tab changes. There'll be some edge cases like when GridFields aren't embedded in tabs. @maxime-rainville Could you please sketch out an API RFC in a separate card for this, addressing some of the edge cases you can think of?
Make Dependent Page render on its own controller
That just moves the problem around. A "settings" tab also shouldn't take a dozen seconds to load, as a CMS author you might just want to change permissions there, and are not interested in the dependant pages. It's less impactful than having these load times on the primary interface (edit form), but not by much ;)
I've been digging around the GridField code. Here's what I think would be needed to accomplish this.
getLazyLoad()
and setLazyLoad()
to GridField
.GridField
and/or the GridField
template to not render the body of the GridField when LazyLoad
is true
and add a gridfield-lazy-loadable
class.GridField
to force full rendering when access via its index
action.TabSet.js
to find all LazyLoadable GridField in a Panel when that Panel is shown and trigger the reload
method on it. Each GridField should be marked as loaded to make sure we don't reload them over and over again if the user keeps switching tabs.DependentPages
or do we want to be more aggressive e.g.: turn it on by default on the FormScaffolder
.I've been digging on how SiteTreeLink
and SiteTree::DependentPage
have been implemented.
SiteTreeLink
has a Polymorphic Parent
has-one relation and a Linked
has-one relation to SiteTree.
SiteTree::DependentPageloops through all the
SiteTreeLink.Linked` for the current page, then all the VirtualPage, then all the RedirectorPage ... then it sticks everything in an ArrayList.
There's some somewhat weird logic there that leads to weird behavior.
Whatever we do, there's no way of sorting a list of SiteTreeLink by AbsoluteLink without loading all of the Parent objects and computing their AbsoluteLink in the application layer.
I would ditch the polymorphic relation on SiteTreeLink. Any DataObjects that wants to say that it dependents on SiteTree for something would need to add its owner page to the SiteTreeLink relation (as opposed to itself). That would include RedirectorPage
and VirtualPage
. Might need to include some sort of qualifier column here. e.g: Redirect, Content block, Content Link, Virtual Page.
SiteTree would need some sort of hook or method that can call its children to ask them if they have a dependencies on certain pages (there's already something similar there). When saving a content block, it will probably need to trigger that hook on the parent page. We would also need to cascade delete SiteTreeLink when a SiteTree is archived.
That's similar to the previous solution, however we track each individual object attached to our SiteTreeLink via a polymorphic many-many relation.
e.g.:
SiteTreeLink:
Linked: Page A
Parent: Page B
DirectDependents:
- Page B itself beause it has a link in its WYSIWYG to Page A.
- ContentBlock B that is own by Page B because it has a WYSIWYG link to Page A.
Each possible DirectDependents is responsible for creating the SiteTreeLink if necessary and adding/removing a reference to itself from DirectDependents. We'll know a SiteTreeLink is no longer needed when all its direct dependents are gone.
The two previous solutions imply some API change that will probably break semver. They also imply having some sort of migration task to update the existing data.
I don't think there's any way to keep the current functionality, make it performant and not break API.
The best thing I can think of is remove AbsoluteLink from the GridField or save it on SiteTreeLink (which would involved keeping it up-to-date somehow, which would not be very consistent).
We could add some sort of AJAX action that only fetch the link on click somehow, if we want to allow CMS authors to access the AbsoluteLink.
Just realised most of the dependent content blocks for our key project are actually orphan. Adding some logic to purge orphan content blocks from the SiteTreeLink
table, might be enough to get our key project over its performance hump.
However that doesn't address the underlying issue.
Add getLazyLoad() and setLazyLoad() to GridField.
Works for me, but please point out in the PHPDoc that this method might no longer take effect in future GridField implementations (when lazy loading could be the default), just to set dev expectations a bit.
Update TabSet.js to find all LazyLoadable GridField in a Panel when that Panel is shown and trigger the reload method on it.
Do we have any cases in the CMS where fields aren't in a tabset? Make sure that still works. Oh, and it'll need to work in Entwine and React based tabs, right?
Disable sorting by AbsoluteLink
Agreed, that's not a useful sorting option anyway.
Refactor SiteTreeLink and move the polymorphic relation to a many-many relation
Looking at PolymorphicHasManyList.php
, the fact that it's polymorphic isn't the issue. That just adds a ClassName IN (...)
to the SQL query. The problem is that SiteTree->BacklinkTracking()
iterates on the SiteTree->Backlinks()
relationship to get the Parent()
for each of them. Since Parent()
is of type DataObject
, we can't run a single query for this through the ORM. Which means we have no way of doing pagination, and need to retrieve the whole list. Note that the original commit from @tractorcow referenced "todo: Add PolymorphicManyManyList" (https://github.com/silverstripe/silverstripe-cms/commit/6c616f5f7a4d2611e7a8f2cbb66360079c3edb6b#diff-49ee3b23d5b37c83e4a70cddc70f89adR1723). I don't see how that would resolve the ORM issues, unless we do some magic with UNION
and renaming columns to avoid collisions of querying multiple types in a single query.
We could inline relevant data for displaying the "related objects" GridField into SiteTreeLink
, specifically the title to display in the table row. But then we push more responsibilities into write()
. And we'd still have the ArrayList
problem further up the callstack, because SiteTree->BacklinkTracking()
combines three different data sources (backlinks, virtual pages and redirector pages) into one table - coming back to the same ORM limitation mentioned above.
I'm coming to the conclusion that this is too hard to resolve on the short term, and we need to come up with a more scalable way of handling nested relationships for different use cases: Static publishing dependencies, link tracking, search, etc.. Related tickets: https://github.com/silverstripe/silverstripe-versioned/issues/68 and https://github.com/dnadesign/silverstripe-elemental/issues/189#issuecomment-368710277
Until then, it sounds like the lazy loaded GridField is a good middle ground? It seems fairly quick to implement. Large dependant link tracking structures will still time out, but at least it doesn't prevent you from editing pages. And it has the advantage that it takes the entire computation out of the edit form critical path, which I think is a good UX practice anyway in terms of building a fast CMS.
Version:
4.2.1
(c611c981e45409c8ab318f709a5d94cdd7e0ce84
)SiteTree::DependentPages()
function creates anArrayList
of all dependant pages.If you have a root page with 11 000 dependant pages the performance takes a big hit. It doesn't help that having this feature is mandatory and you can't remove this by configuration.
My recommendation is:
ArrayList
unless there is no other choiceLazyLoading Pull Requests