silverstripe / silverstripe-cms

Silverstripe CMS - this is a module for Silverstripe Framework rather than a standalone app. Use https://github.com/silverstripe/silverstripe-installer/ to set this up.
http://silverstripe.org/
BSD 3-Clause "New" or "Revised" License
515 stars 333 forks source link

DependentPages should use a DataList, not an ArrayList #2327

Open robbieaverill opened 5 years ago

robbieaverill commented 5 years ago

Version: SS 4.3.x-dev

The fact that SiteTree::DependentPages() generates an ArrayList means that it's executed whenever it runs. This is a major problem for high volume websites, particularly because this is called every time you call getCMSFields() on a page.

I really think we need to modify this functionality so it can return a lazily executed DataList instead.

robbieaverill commented 5 years ago

I'm going to try out a POC

robbieaverill commented 5 years ago

Another option is that we could add some config to disable back link tracking and dependent pages entirely. Large data sets make the CMS completely unusable by processing all of this data on every page load (in the CMS).

robbieaverill commented 5 years ago

This seems to be working OK: https://gist.github.com/robbieaverill/9a185078f1883daf30ff039e00c504b4 - we lose the identifier string for the type of relationship, but that could be added back in in the GridField that renders this list if we want to (by checking the class type)

robbieaverill commented 5 years ago

ping @silverstripe/core-team for feedback on whether you think it'd be acceptable to change this in 4.3 or not

ScopeyNZ commented 5 years ago

Isn't this an issue because back links aren't necessarily pages? I think that's why it's using the DependentLinkType attribute right?

sminnee commented 5 years ago

I think that that this would be okay but really, the lesson here is that our public API should have been declared to be SS_List not ArrayList. We need to be much more careful about not being overly restrictive in how we declare our public API.

This also applies to, for example, the choice of private vs protected vs public members.

So the typing hinting should be changed to be SS_List, not DataList.

robbieaverill commented 5 years ago

Isn't this an issue because back links aren't necessarily pages? I think that's why it's using the DependentLinkType attribute right?

If that's the case then I guess we need to follow up on the todo in BackLinkTracking to implement a polymorphic many many list

maxime-rainville commented 5 years ago

I've explored re-factoring DependentPage while looking at https://github.com/silverstripe/silverstripe-cms/issues/2276#issuecomment-426144450

That was my thought at the time. We decided to put that on the back burner because of time pressure, but I think all of my point are still relevant ... except maybe the "put your head in the sand" bit.

Problems with Dependent Pages logic

I've been digging on how SiteTreeLink and SiteTree::DependentPage have been implemented.

SiteTreeLink has a Polymorphic Parent has-one relation and a Linked has-one relation to SiteTree.SiteTree::DependentPageloops through all theSiteTreeLink.Linked` for the current page, then all the VirtualPage, then all the RedirectorPage ... then it sticks everything in an ArrayList.

There's some somewhat weird logic there that leads to weird behavior.

  • The DependentPage GridField edit links is broken because it assume every entry will be a SiteTree.
  • The Same entity can appear in there multiple times when using content blocks.
  • DependentPage doesn't appear to be confident its SiteTreeLinks exist because it manually loop through the list to make sure.
  • Sorting the GridField is painfully slow because it's using an ArrayList ... I came across an example where this would take 2.8 minutes in our key project. There's no way to effectively sort by URL because that value is computed in the application.

Possible solutions

Disable AbsoluteLink

Whatever we do, there's no way of sorting a list of SiteTreeLink by AbsoluteLink without loading all of the Parent objects and computing their AbsoluteLink in the application layer.

Refactor SiteTreeLink

I would ditch the polymorphic relation on SiteTreeLink. Any DataObjects that wants to say that it dependents on SiteTree for something would need to add its owner page to the SiteTreeLink relation (as opposed to itself). That would include RedirectorPage and VirtualPage. Might need to include some sort of qualifier column here. e.g: Redirect, Content block, Content Link, Virtual Page.

SiteTree would need some sort of hook or method that can call its children to ask them if they have a dependencies on certain pages (there's already something similar there). When saving a content block, it will probably need to trigger that hook on the parent page. We would also need to cascade delete SiteTreeLink when a SiteTree is archived.

Refactor SiteTreeLink and move the polymorphic relation to a many-many relation

That's similar to the previous solution, however we track each individual object attached to our SiteTreeLink via a polymorphic many-many relation.

e.g.:

SiteTreeLink:
  Linked: Page A
  Parent: Page B
  DirectDependents:
    - Page B itself beause it has a link in its WYSIWYG to Page A.
    - ContentBlock B that is own by Page B because it has a WYSIWYG link to Page A.

Each possible DirectDependents is responsible for creating the SiteTreeLink if necessary and adding/removing a reference to itself from DirectDependents. We'll know a SiteTreeLink is no longer needed when all its direct dependents are gone.

This "put your head in the sand" non-solution

The two previous solutions imply some API change that will probably break semver. They also imply having some sort of migration task to update the existing data.

I don't think there's any way to keep the current functionality, make it performant and not break API.

The best thing I can think of is remove AbsoluteLink from the GridField or save it on SiteTreeLink (which would involved keeping it up-to-date somehow, which would not be very consistent).

We could add some sort of AJAX action that only fetch the link on click somehow, if we want to allow CMS authors to access the AbsoluteLink.

michalkleiner commented 5 years ago

Hi @chillu, should @maxime-rainville's points, mainly the

The DependentPage GridField edit links is broken because it assume every entry will be a SiteTree.

be extracted into a separate issue? Or changing the type of this issue to a bug?

That specific thing is a bug causing 500 error anytime a CMS user visits the Dependent pages tab in SS4 where a page is linked from a content block (as it's trying to fetch a subsite details for blocks that are not using the subsite extension — default CWP recipe).

maxime-rainville commented 5 years ago

Not sure I came across the subsite issue before. That's going to be a different problem.

Do you want to raise a separate issue.

chillu commented 5 years ago

@michalkleiner Yep separate issue for broken edit links please

intotheweb101 commented 4 years ago

Hello @robbieaverill

Getting this issue on CWP now, was there a solution to this?

Thanks

robbieaverill commented 4 years ago

Not as of yet, as far as I know

michalkleiner commented 4 years ago

This got me again on a CWP project, grrr.

New issue for the subsites problem: https://github.com/silverstripe/silverstripe-cms/issues/2554