<draft> PLIP: move resolveuid generation logic to plone.app.linkintegrity

Rudd-O commented 2 years ago

PLIP (Plone Improvement Proposal)

Esteemed @plone/framework-team - I am submitting this PLIP for your consideration.

Responsible person

Proposer: @Rudd-O

Seconder: still needed

Abstract

This PLIP aims to improve both Plone's rich media story as well as link integrity, by centralizing the generation of resolveuid URLs which aid in the robustness of Plone content. Both Plone Classic and Volto users will benefit, transparently, without any codebase changes needed higher in the stack.

TL;DR: we propose to implement resolveuidification — the counterpart of portal_transforms' resolveuid to URL.

The scope is limited strictly to the intersection between Plone 6 and text/html Rich Text fields in Dexterity content types.

Overview, context and background

Plone has a very robust, best-in-class link integrity feature that couples three disparate aspects to achieve strong robustness with hyperlinks to content:

A mechanism to track objects and references, implemented consistently in plone.app.linkintegrity, which leverages the resolveuid machinery. This mechanism currently tracks a href, img/video/audio src, and srcset.
A mechanism to reformat links to internal content to resolveuid URLs, implemented inconsistently in various parts (TinyMCE, collective.exportimport).
A mechanism to render resolveuid URLs to the correct link to the objects referred to by the resolveuid URLs, in relation to a content context. This mechanism relies on the portal_transforms machinery.

This PLIP tackles the matter number 2 in the list above — the way that resolveuid URLs are generated is not consistent / has "holes". For example, manually-entered hyperlinks or image references are not resolved to resolveuid URLs — only TinyMCE-created ones (via TinyMCE GUI plugins) are. Furthermore, other types of linkable content objects, such as IFRAMEs and VIDEO / AUDIO / srcset= links, are also not rendered as resolveuid URLs.

Aside: there exist products like collective.wildcardmedia which abuse a hrefs to embed audio / video, in order to address this functionality gap. This sort of handling of the issue offers the Plone user considerably less control of the final markup used to render their embedded media content, relies on client-side JS to have any effect at all, and does not cover embeddable content like PDFs.

Given the many ways in which users may add content to their Plone sites (via TinyMCE, or imports, or raw HTML, or Volto, or even Python scripts), it makes sense to centralize the resolveuid generation in a single place that applies to all of them, and then rip out the other code where these links are generated — to reduce our maintenance burden.

It is furthermore — at least as of today — arguably suboptimal layering to have this functionality in a TinyMCE Plone pattern, hidden in various layers of indirection that need to be reimplemented by each content type being inserted (link, image, and the as-of-now-still-unable-to-resolveuid media insertion plugin). It would also be suboptimal if resolveuidification had to be implemented in Volto itself — it would require exposing more of the Plone API, and multiple roundtrips (at least one per hyperlink) to resolveuidify content that will be saved by Volto.

Concrete implementation details proposed

Note: All of these items will be gated to the linkintegrity registry setting, as the preexisting functionality already is.

Add an IResolveUIDer interface (name to be revised) that complements the IRetriever interface, to plone.app.linkintegrity. IRetriever retrieves links for linkintegrity so linkintegrity can track them. IResolveUIDer would transmogrify links to resolveuid whenever possible.
- Some consideration required: TinyMCE currently adds data-val=<UID> attributes with the UID of the object, which is unknown to me what they are used for (perhaps to reflexively help TinyMCE later?). Perhaps IResolveUIDer would add those too, for completeness. It appears that these attributes do survive the roundtrip when the /edit view is invoked?
Implement a IResolveUIDer adapter for Dexterity.
- Good to know: the code to do this already exists in tested form within https://github.com/collective/collective.exportimport view @@fix_html -- I have been using it regularly to ensure that my new content's links is rewritten in resolveuid form, and it works great (but it is annoying to use a view from a non-core add-on manually every time prior to publishing).
Wire the IResolveUIDer so that the transmogrification takes place before the link tracking done by IRetriever, modifying content as it comes into Plone.
- Good to know: plone.app.linkintegrity already hooks into the object modified machinery, and has the right hooks to know if and when this feature should be enabled (tied to the linkintegrity setting).
- Good to know: once this is implemented, any code that saves Rich Text text/html fields benefits from automatic resolveuidification. This includes both Volto and TinyMCE.

Stretch goal:

In time, deprecate and remove the TinyMCE pattern code that currently does resolveuid resolution altogether, as well as knowledge of the linkintegrity setting.

Out of scope (currently) but would be real nice to have:

Implement resolveuid URL rewriting for Markdown or other intelligent quasi-plain text formats.
Change the TinyMCE media insertion plugin to add an Upload tab or other affordances like size profiles, as the image insertion plugin has.
Implement "migration" of existing content to resolveuidify it.
- Users can use collective.exportimport if they do positively want to apply link integrity to existing content, as the tool already has this baked in.
- Of course, nothing prevents future work porting that c.ei feature into Plone core, either as a CP configlet or as a migration step — but any of that would be out of scope for this project.
- For future implementors of this out-of-scope goal: maybe it can be a push button next to the existing "enable link integrity" checkbox.

Risks

This increases the scope of plone.app.linkintegrity — it is currently limited to only tracking link integrity and helping enforce it, rather than policy decisions on content rewriting. We would add an additional responsibility to do inbound content rewrite of internal links. Ideally this would be done as an input filter, but Plone does not appear to have a meaningful place to add this, or to order it with respect to the triggering of plone.app.linkintegrity hooks.
It is possible that a preexisting bad UID map will cause a correct inbound URL to be transmogrified into a resolveuid URL that points to the wrong content object.
I expect that the amount of tests which will be impacted by this change is larger than the usual PLIP change.
If we decide to back this change out in the future, that would be perfectly possible. Plone already has included — for a very long time — the capability to forward-resolve from UID to URL, so any users affected by this change being backed out would not experience any content breakage at all.

Funding

If we can get enough consensus on the implementation details, I am willing to fund the development of this PLIP till completion. The reason I propose to fund core development is that I am likely not capable enough to implement this by myself.

jensens commented 2 years ago

To me the natural place for all high level uuid handling code would be plone.app.uuid - linkintegrity is only one aspect of uuid handling. Can you elaborate why you choose that package?

https://github.com/plone/plone.app.uuid

Rudd-O commented 2 years ago

Can you elaborate why you choose that package?

Because plone.app.linkintegrity already is well-positioned to process the modifications — it already implements the proper object modification inbound hooks, and it deals with HTML markup already — whereas plone.app.uuid does not appear to do so: it only offers the machinery to index and resolve UUIDs forward, and crucially doesn't know anything about content or markup.

We could also implement this proposal straight in another package (hypothetically plone.app.linkuidify) but then we'd have to figure out a way to order the modification hooks before plone.app.linkintegrity so the linkintegrity machinery receives the post-uidified URLs in its content modified hooks — IOW ordering of event processing becomes an issue, which happens to be solved for the output pipeline, but (to my knowledge) not for the input one. linkintegrity would probably depend on this new package then.

We could also possibly implement this as a Plone transform (it is extremely unclear to me how we'd do it). We'd still have to care about the ordering problem, though.

Context for plone.app.uuid:

MrTango commented 2 years ago

In general +1 for the direction, but it is quite some work, not sure what milestone you have in mind, for Plone 6.0 it looks like a bit too much. But if it is non breaking can be done later too. I think plone.app.linkintegrity is the better place, at the end it's about link integrity. Please no addition package ;). Would you work on this, when it's accepted by the FWT?

Rudd-O commented 2 years ago

This would be a non-breaking change. Even if we were to back the change out, Plone already has included — for a very long time — the capability to forward-resolve from UID to URL, so any users affected by this change being backed out would not experience any content breakage at all.

It's not that much work — the meat of the code already exists in c.ei (ask @pbauer — he wrote most of it!), and I estimate wiring the existing code should clock in at less than 500 LOC, probably substantially less. Here is what we have, which I have already made sure is consistent with resolveuid handling throughout the rest of Plone:

Even regarding linkintegrification of existing content, at no point in time will we ask future Plone users to do any sort of migratory step if they dont want to, but we can kindly point them to collective.exportimport if they do positively want to apply link integrity to existing content, as the tool already has this baked in. Of course, future work may decide to port that feature into Plone core — but that would be out of scope for this project.

I would most likely not work full time on this, but I have a substantial sum of money (EUR / CHF / USD / BTC offered) set aside to fund this project, to whomever in the Plone community wants to tackle this. I will also be reviewing and likely making minor changes to the code as it evolves.

And of course the result will be as free software as Plone itself.

tisto commented 2 years ago

@Rudd-O can you elaborate your plans regarding Plone 6 and especially Volto? Both link integrity and resolveuid might be subject to change in the future. We have to make both features fully available via REST API and usable for any consumer system (mainly but not only Volto). We will also have to think about making all TinyMCE transformations optional, since there is no TinyMCE in Plone 6 / Volto.

For Volto we are currently in the process of moving from DraftJS to Slate editor. When this is done we plan to approach the linkintegrity use case (resolveuid works already). It does not make much sense to put effort into linkintegrity for DraftJS when we move to Slate before Plone 6. Those efforts might align nicely with this PLIP. Though, I would be strongly against a large refactoring of code that does not take the Plone 6 / Volto use case into consideration.

I also agree with @MrTango that this might be a big change and it is too late for Plone 6.

mauritsvanrees commented 2 years ago

Implement two IResolveUIDer adapters, one for ATCT and one for Dexterity.

Forget about the ATCT one. That is Archetypes, so only Plone 5.2 on Python 2.7. That is not something I as release manager would want such a big change in: too risky for a Plone version that should remain stable.

Rudd-O commented 2 years ago

Archetypes

Fixed by removing that part from the proposal.

Rudd-O commented 2 years ago

@Rudd-O can you elaborate your plans regarding Plone 6 and especially Volto? Both link integrity and resolveuid might be subject to change in the future. We have to make both features fully available via REST API and usable for any consumer system (mainly but not only Volto). We will also have to think about making all TinyMCE transformations optional, since there is no TinyMCE in Plone 6 / Volto.

In short: Volto will not be impacted — however, Volto developers may be able to save a lot of hard-to-get-right labor in the future.

The proposal involves only embedding the resolveuidification (currently done by TinyMCE) into Plone core backend.

From Volto's perspective, Volto will continue to POST content to be saved to Plone, just as it does today. All that would happen is that Plone would, in the background, change internal references to be resolveuidified.

For Volto we are currently in the process of moving from DraftJS to Slate editor. When this is done we plan to approach the linkintegrity use case (resolveuid works already). It does not make much sense to put effort into linkintegrity for DraftJS when we move to Slate before Plone 6. Those efforts might align nicely with this PLIP. Though, I would be strongly against a large refactoring of code that does not take the Plone 6 / Volto use case into consideration.

That's great news! It means that the Slate editor would also automatically benefit from this work.

If this proposal is implemented, people working on the Volto side will have to do no effort whatsoever to benefit from automatic resolveuidification. Every RichText content saved by Volto will automatically be resolveuidified.

There will be no large refactoring of any part of any codebase as part of this proposal. No API surfaces will change.

sneridagh commented 2 years ago

@Rudd-O My five cents!

Since some time ago, we have the "Blocks Transforms" for serializers/deserializers in plone.restapi.

https://plonerestapi.readthedocs.io/en/latest/blocks.html?highlight=transform#block-serializers

It is a powerful tool that we developed not just for solving resolveuid subject in Volto blocks, but for other purposes (indexing, injecting data, etc). Basically, it's a ZCA-only powered portal_transforms-like tool. I think we should consider to make this idea backend wide, not only for Volto blocks, so it can be reused everywhere we need to transform data back and forth. At some point we started toying with the idea to apply the blocks transforms to content types and fields themselves.

We need to synchronise efforts so we do not end up with two tools/ways to achieve the same thing, or worse, yet another portal_transform tool.

Some pointers on how resolveuid is solved with blocks transforms:

ResolveUID deserializer https://github.com/plone/plone.restapi/blob/master/src/plone/restapi/deserializer/blocks.py#L91

ResolveUID serializer https://github.com/plone/plone.restapi/blob/master/src/plone/restapi/serializer/blocks.py#L79

I can imagine that the way to generate/resolve the resolveuid using unified interfaces and entry points could (and should be improved) so I'm looking forward to this PLIP.

If I may suggest a couple of things, first one being, that instead of spreading core features in satellite packages, we either start concentrating them into the core packages. We talked a bit about this with @jensens during the Plone Conference at Sorrento and it feels to me that everybody agrees on that.

Finally, I think that Plone 6 frontend should be included in the "Implementation Proposal" section so we make sure that we do not implement (and merge) a PLIP that only works in classic and we work on PLIPs that are stack-wide aware and provide vertical solutions rather than horizontal ones.

Rudd-O commented 2 years ago

@sneridagh It looks like the uid_to_url and url_to_uid functions need to be updated to deal with whole tags rather than single URLs (as URL processing in some cases requires "multiple URLs" in a single URL, witness the case of source srcset= and friends). The code in collective.exportimport does this already.

I still think the most vertical solution is to rely on the PortalTransforms machinery rather than writing ad-hoc ser/deser code, which is (to my limited understanding) what the code you linked currently does. If that code used PortalTransforms from the start, it would benefit from this PLIP out of the box.

EDIT: I'm specifically talking about https://github.com/plone/plone.restapi/blob/master/src/plone/restapi/serializer/blocks.py#L89 which is code that does (less of) what https://github.com/collective/collective.exportimport/blob/af403e2c21cee549f40937bb3ce6f277cc7212be/src/collective/exportimport/fix_html.py#L92 does. If the plan above materializes, and those blocks use the PortalTransforms machinery — as they should, if they are rich text HTML fields — then you don't need any of this code and it can all safely be deleted.

sneridagh commented 2 years ago

@sneridagh It looks like the uid_to_url and url_to_uid functions need to be updated to deal with whole tags rather than single URLs (as URL processing in some cases requires "multiple URLs" in a single URL, witness the case of source srcset= and friends). The code in collective.exportimport does this already.

We do not deal with tags... nor with HTML as a whole, we deal with JSON. Slate saves to JSON. We deal with tags only on rendering, since tags are only rendered when they should be, in JSX. Which is the whole point. That code runs on every URL that it finds given a specific block.

I still think the most vertical solution is to rely on the PortalTransforms machinery rather than writing ad-hoc ser/deser code, which is (to my limited understanding) what the code you linked currently does. If that code used PortalTransforms from the start, it would benefit from this PLIP out of the box.

We implemented the blocks transforms because a reason, and they provide functionality beyond what PortalTransforms provide right now. It's not add-hoc, it's the evolution of PortalTransforms.

EDIT: I'm specifically talking about https://github.com/plone/plone.restapi/blob/master/src/plone/restapi/serializer/blocks.py#L89 which is code that does (less of) what https://github.com/collective/collective.exportimport/blob/af403e2c21cee549f40937bb3ce6f277cc7212be/src/collective/exportimport/fix_html.py#L92 does. If the plan above materializes, and those blocks use the PortalTransforms machinery — as they should, if they are rich text HTML fields — then you don't need any of this code and it can all safely be deleted.

No it's not. that one gets HTML (via it's soup representation) and fixes it (as a whole). Where that piece of code is used? and for what purpose? Again, we can't get rid of the Blocks transforms because the PortalTransforms are not that flexible.

As said, I think that the blocks transforms serializers/deserializers idea and features is great (not my making, @tiberiuichim and @cekk to blame) and I really think that it's quite a powerful one, and one that could be bended even more to work with fields (and specifically with rich text fields).

Please, @plone/framework-team due to the verticality of this PLIP, whenever is going to be discussed, it would be great if you please invite the @plone/volto-team to discuss it.

Rudd-O commented 2 years ago

OK @sneridagh, I may have misunderstood — I thought these blocks dealt with HTML rich text fields.

This PLIP is strictly limited to dealing with uniqueid'ifying HTML rich text fields as they are fed into Plone for storage. It does not influence the rendering of any data sources, JSX or HTML — within the scope of this PLIP, that remains untouched and the same.

So it's really orthogonal work which will not influence Volto in any way, and will benefit Classic. Of course, more eyes are welcome.

jensens commented 2 years ago

I still support the basic idea (details need discussion), but I veto against adding IResolveUIDer interface and implementation to p.a.linkinterity - it's place is plone.app.uuid. It is all out of scope of what I consider a linkintegrity checker package to do.

jensens commented 2 years ago

Overall I would like to see an implementation scanning arbitrary (not limited to HTML) text for links, a detection if they are internal and not already based on UUIDs, then replace them by matching UUID links. (and vice versa). This basic/generic and robust/very well tested implementation then can be used in several places, like for JSON/slate, for HTML/TinyMCE, at export/import time and so on.

Rudd-O commented 2 years ago

plone.app.uuid seems to me lower in the dependency stack than plone.app.linkintegrity — why should plone.app.uid gain HTML parsing functionality as part of this project, when plone.app.linkintegrity is where that sort of thing is happening nowadays?

Alternatively, I can create a separate package. Either way, plone.app.linkintegrity is going to use code from that hypothetical new package, and plone.app.uuid won't.

@jensens your view?

Rudd-O commented 2 years ago

(No disagreement on the second comment, albeit I have no plans for doing this to the Markdown interfaces — there is no precedent for Plone to do that kind of transformation.)

jensens commented 2 years ago

My POV:

a) no new packages! b) Place code where people would expect it. I do not expect IResolveUID in something called linkintegrity, getting my point? c) pa.linkintegrity can depend on pa.uuid - why not.

Please sketch

which new code will be created and
where existing code will be (re)used.
Where are relevant dependencies between packages and
if there are new ones created?

(No disagreement on the second comment, albeit I have no plans for doing this to the Markdown interfaces — there is no precedent for Plone to do that kind of transformation.)

Well, if you break it down its about finding UUID-links in text, replacing them with full URL links. It does not matter much what the text is semantically.

Rudd-O commented 2 years ago

Well, if you break it down its about finding UUID-links in text, replacing them with full URL links. It does not matter much what the text is semantically.

Not at all. @jensens I get the vibe that you believe the purpose of this PLIP is the antipode of what it actually is.

This PR does not touch the part of Plone that renders UUID links into proper HTML URLs. That will remain untouched.

The purpose of this PR is the opposite — evaluate HTML URLs and see if they can be transformed to UUID links, then transform them if possible.

My plan so far involves:

Importing code from c.exportimport that already does this, placing it here. This is 100% code reuse, and the code is well-tested now.
Wiring the code up so that, upon save, incoming HTML text is scanned for hyperlinks to Plone content, and transformed to resolveuid links. I am fuzzy right now on exactly where this would be wired up, because I haven't read the code in a few months, but I can spend some time warming this part of my brain up again.

This modification would be preferably happening to p.a.linkintegrity because the purpose of this PR is to help preserve link integrity, and that's also what the purpose of the link integrity package is — to preserve link integrity.

No new packages!

jensens commented 2 years ago

Anyway, reverse lookup would fit in plone.app.uuid as well. However, you need to get the PLIP out of draft mode and the official path over the framework team (I am not part of any longer) applies, who will then review it.

tisto commented 3 months ago

No activity since 2022. Closing this PLIP. Feel free to re-open if anyone disagrees.

jensens commented 3 months ago

No activity since 2022. Closing this PLIP. Feel free to re-open if anyone disagrees.

Still open.

I close this for now. If you plan to work further on this please reopen and go on.

plone / Products.CMFPlone