Description
This pull request updates our filtering logic to stop using our SanitizeToPlaintext function for reducing status HTML content to plaintext, and to use https://github.com/k3a/html2text instead. Unlike SanitizeToPlaintext, html2text doesn't concatenate lines in weird ways, and it properly extracts links, mentions, and hashtags from the text.
To avoid re-parsing a status from HTML every time we want to filter it, a TTLCache has been added to the converter to store the parsed-to-text version of statuses.
This PR also makes some minor fixes to our filter regexes, so that whole-word matching also treats whitespace and the start/end of a line as word boundaries.
Closes https://github.com/superseriousbusiness/gotosocial/issues/3298
Closes https://github.com/superseriousbusiness/gotosocial/issues/3128
Checklist
Please put an x inside each checkbox to indicate that you've read and followed it: [ ] -> [x]
If this is a documentation change, only the first checkbox must be filled (you can delete the others if you want).
go fmt ./... and golangci-lint run.