in2code-de / in2publish_core

in2publish Community Version
https://www.in2code.de/produkte/content-publisher/
GNU General Public License v3.0
40 stars 23 forks source link

v11 BUG: RTE parser too permissive for old file links syntax #111

Closed YKWeyer closed 11 months ago

YKWeyer commented 11 months ago

We discovered than using file:(int) in RTE fields caused the ContentPublisher to add an association to the file with the id, even if it was just within a regular text, or as a part of an external URL.

For example, entering the following text in a RTE field would create two relations to file 123 and 456 (whether they exist or not 🫤) although there is no need for it:

<p>profile:123</p>
<p><a href="https://www.google.com/search?hl=en&q=profile:456">Google search for profile</a></p>

It seems the DefaultRecordFinder::fetchRelatedRecordsByRte is too permissive on detecting old typo3 link syntax:

https://github.com/in2code-de/in2publish_core/blob/b27e5ad7016ab7e3ff0c0cd0caac0fbb61a2165f/Classes/Component/RecordHandling/DefaultRecordFinder.php#L651-L665

Maybe this method should use HtmlParser::splitIntoBlock to only target href attributes of <a> tags? (Or should other tags such as link also be considered?). Happy to open a pull request for it if you see fit.

ℹ️ At least, this doesn't seem to be the case in v12 🎉

vertexvaar commented 11 months ago

I checked in every TYPO3 version back to v10 how TYPO3 stores links. It is always a t3 URN. (t3://<classification>=<identifier>) I think we can drop the support for non-t3-URNs completely in TYPO3 v11. Editing content with legacy file URIs will convert them to the new t3 URNs. @YKWeyer can you please check if there are still links with the legacy syntax used in your system?

YKWeyer commented 11 months ago

I already removed all legacy syntax from our system. In fact, all occurrences were links pointing to an old TYPO3 instance (< 8 LTS) that is now gone.

I think we can drop the support for non-t3-URNs completely in TYPO3 v11. Editing content with legacy file URIs will convert them to the new t3 URNs.

I don't know how many instances might still use the old syntax though. They would have had to upgrade from an old version (< 8.3) to the most recent ones without editing their content in between. It's probably unlikely, but the fact that the TYPO3 core still contains the logic to translate this old syntax more than 4 major releases after might be a hint that they suspect it might affect more instances that we think. (Either that, or they simply forgot about it)

But I would assume it's fair to remove support for legacy in the ContentPublisher, as long as this is announced as a potentially breaking change for older instances that might still have them somewhere…

vertexvaar commented 11 months ago

Thank you for your research, it has shed some light on the old file link syntax for me. It's clear that this kind of link will only ocurr in an anchor tag href attribute. Therefore i can change the regex accordingly. Dropping the support is not possible since we promise to support everything that works in TYPO3 (excluding 3rd party extensions obviously). Bugfix incoming!