VyrCossont closed this issue 1 week ago.
Mmm, maybe instead of BlueMonday we can use some other method of converting HTML to plaintext for the purposes of matching against filters. In the frontend we now use https://www.npmjs.com/package/html-to-text for converting the HTML representation of statuses into text for showing in certain places. If there's a Go equivalent we could look at that, perhaps.
Something like https://pkg.go.dev/github.com/k3a/html2text or https://github.com/jaytaylor/html2text, perhaps? But if these operations are expensive (and I'd imagine they're not as cheap as BlueMonday), we probably also want to store the results in the `text` field of the `*gtsmodel.Status` model, i.e., in the database. Not 100% sure.
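If the conversion does turn out to be expensive, storing the result boils down to "convert once, reuse the stored plaintext for every filter check". Here's a rough sketch of that idea only. `Status` is a hypothetical, cut-down stand-in for `*gtsmodel.Status`, `plaintextFor` is an invented helper, and the converter is a trivial placeholder for one of the libraries linked above. It shows in-memory caching only; writing `Text` back to the database is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// Status is a hypothetical, cut-down stand-in for *gtsmodel.Status.
// Only the fields needed for this sketch are modelled.
type Status struct {
	Content string // sanitized HTML
	Text    string // cached plaintext derived from Content; "" means not yet computed
}

// plaintextFor returns s.Text, filling it from s.Content with convert the
// first time it is needed. In a real server the filled-in value would
// presumably also be persisted so later reads skip the conversion.
// Note: if convert returns "", nothing is cached and the next call converts again.
func plaintextFor(s *Status, convert func(string) string) string {
	if s.Text == "" && s.Content != "" {
		s.Text = convert(s.Content)
	}
	return s.Text
}

func main() {
	calls := 0
	// Placeholder converter that only knows about <p>; it stands in for a
	// real HTML-to-text library.
	stub := strings.NewReplacer("<p>", "", "</p>", "\n")
	convert := func(h string) string {
		calls++
		return stub.Replace(h)
	}

	st := &Status{Content: "<p>as</p><p>df</p>"}
	// Simulate three filters being checked against the same status.
	for i := 0; i < 3; i++ {
		_ = plaintextFor(st, convert)
	}
	fmt.Printf("%q converted %d time(s)\n", st.Text, calls) // "as\ndf\n" converted 1 time(s)
}
```

One thing to watch with this approach: any edit to a status's content would need to clear or recompute the cached text, otherwise filters would match stale text.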
For example, `<p>as</p><p>df</p>` sanitizes to `asdf`. This would cause a false positive for a filter with the keyword `asdf`, and a false negative for a whole-word filter with the keyword `df`. I'd expect output more like `as\ndf\n`.

Likewise, `<br>` tags are dropped, not converted to `\n`. Guessing the same holds for `<wbr>`, and any other tags closely equivalent to characters. (Fun exercise: what would we expect from `<hr>`?)

Not familiar with the BlueMonday sanitizer we use, so not sure how hard this would be to fix.
Discovered while investigating #3128.
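For illustration, here's a rough Go sketch of the kind of output described above: walk already-sanitized HTML, emit `\n` after block elements and for `<br>`/`<hr>`, emit nothing for `<wbr>`, and unescape entities with the standard library's `html.UnescapeString`. The names `htmlToText` and `matchesWholeWord`, the set of block tags, and the choice to render `<hr>` as a plain newline are all assumptions, not what BlueMonday or GoToSocial actually do. The scanner is deliberately naive; a real implementation would more likely use a proper tokenizer such as `golang.org/x/net/html`, or one of the libraries linked earlier.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

// blockTags are elements whose closing tag is treated as the end of a line.
// The exact set is an assumption made for this sketch.
var blockTags = map[string]bool{
	"p": true, "div": true, "li": true, "blockquote": true, "pre": true,
	"h1": true, "h2": true, "h3": true, "h4": true, "h5": true, "h6": true,
}

// breakTags are void elements rendered as a newline. Treating <hr> as a
// plain newline is one possible answer to the "fun exercise", not the answer.
var breakTags = map[string]bool{"br": true, "hr": true}

// tagBreaksLine reports whether a raw tag body (the text between '<' and
// '>') should become a newline. <wbr> and all other tags produce nothing.
func tagBreaksLine(raw string) bool {
	closing := strings.HasPrefix(raw, "/")
	name := strings.TrimPrefix(raw, "/")
	if i := strings.IndexAny(name, " \t\r\n/"); i >= 0 {
		name = name[:i] // drop attributes and any self-closing slash
	}
	name = strings.ToLower(name)
	if breakTags[name] {
		return true
	}
	return closing && blockTags[name]
}

// htmlToText is a hypothetical, deliberately naive converter for HTML that
// has already been sanitized. It does not handle comments, CDATA, or '>'
// inside attribute values; a real tokenizer would.
func htmlToText(s string) string {
	var out strings.Builder
	for {
		lt := strings.IndexByte(s, '<')
		if lt < 0 {
			out.WriteString(html.UnescapeString(s))
			return out.String()
		}
		out.WriteString(html.UnescapeString(s[:lt]))
		gt := strings.IndexByte(s[lt:], '>')
		if gt < 0 {
			// Unterminated tag: keep the remainder as text.
			out.WriteString(html.UnescapeString(s[lt:]))
			return out.String()
		}
		if tagBreaksLine(s[lt+1 : lt+gt]) {
			out.WriteByte('\n')
		}
		s = s[lt+gt+1:]
	}
}

// matchesWholeWord is a hypothetical case-insensitive whole-word check.
// Go's \b is an ASCII word boundary, so non-ASCII words are only approximated.
func matchesWholeWord(text, keyword string) bool {
	re := regexp.MustCompile(`(?i)\b` + regexp.QuoteMeta(keyword) + `\b`)
	return re.MatchString(text)
}

func main() {
	text := htmlToText("<p>as</p><p>df</p>")
	fmt.Printf("%q\n", text) // "as\ndf\n"

	// Substring filter "asdf": matches the concatenated output, not this one.
	fmt.Println(strings.Contains("asdf", "asdf"), strings.Contains(text, "asdf")) // true false

	// Whole-word filter "df": misses the concatenated output, matches this one.
	fmt.Println(matchesWholeWord("asdf", "df"), matchesWholeWord(text, "df")) // false true

	fmt.Printf("%q\n", htmlToText("<p>a<br>b</p><hr><p>super<wbr>long &amp; done</p>"))
}
```

Emitting nothing for `<wbr>` keeps the surrounding word intact, which seems right for keyword matching since it only marks a possible line-break point inside a word.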