mastodon / mastodon

Your self-hosted, globally interconnected microblogging community
https://joinmastodon.org
GNU Affero General Public License v3.0

Support for Hashtag Custom Filters #27990

Open ThisIsMissEm opened 11 months ago

ThisIsMissEm commented 11 months ago

Steps to reproduce the problem

  1. Create a custom filter for #example and apply the Whole Word option
  2. Create a status with the text "Test filter https://example.org/#example test"
  3. Fetch the custom filters with CustomFilter.cached_filters_for("<account id>")
  4. Apply the filters to the created status: CustomFilter.apply_cached_filters(filters, status)

Expected behaviour

The filter should not match, because "https://example.org/#example" does not contain "#example" at a word boundary.

Actual behaviour

The filter matches the hash fragment in the URL.

Detailed description

This bug appears to come from the logic that builds the Whole Word filter pattern:

sb = /\A[[:word:]]/.match?(keyword.keyword) ? '\b' : ''
eb = /[[:word:]]\z/.match?(keyword.keyword) ? '\b' : ''

/(?mix:#{sb}#{Regexp.escape(keyword.keyword)}#{eb})/

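Here, /\A[[:word:]]/ does not match keywords that begin with a symbol (e.g., it returns false for "#example"), so sb is empty and no leading word boundary is added to the pattern.

As a result, the leading word boundary is not enforced for such keywords.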
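
The behaviour can be reproduced in plain Ruby without a Mastodon environment. The following is a minimal sketch that copies the pattern-building logic quoted above; the local variable names (keyword, status_text, pattern) are stand-ins for illustration and not the names used in the CustomFilter model.

```ruby
# Stand-in values for illustration; in Mastodon the keyword comes from
# keyword.keyword and the text from the status being filtered.
keyword = '#example'
status_text = 'Test filter https://example.org/#example test'

# Same checks as the snippet quoted above.
sb = /\A[[:word:]]/.match?(keyword) ? '\b' : ''   # '#' is not a word character, so sb is ''
eb = /[[:word:]]\z/.match?(keyword) ? '\b' : ''   # 'e' is a word character, so eb is '\b'

pattern = /(?mix:#{sb}#{Regexp.escape(keyword)}#{eb})/

# With no leading boundary, the pattern matches the URL fragment.
matches_url_fragment = pattern.match?(status_text) # => true
```

Because sb is empty, nothing constrains what comes before the "#", so the "/" in the URL is accepted.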

Mastodon instance

No response

Mastodon version

main


ThisIsMissEm commented 11 months ago

I was going to write a test case, but it appears that we may have zero test coverage for this area of the code. cc @renchap / @ClearlyClaire

ThisIsMissEm commented 11 months ago

I think the solution is to actually match as follows:

sb = /\A|^[[:word:]]/.match?(keyword.keyword) ? '\b' : ''
eb = /[[:word:]]\z|$/.match?(keyword.keyword) ? '\b' : ''

/(?mix:#{sb}#{Regexp.escape(keyword.keyword)}#{eb})/

That is, match at the start of a word or the start of a line, and at the end of a word or the end of a line.
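
One caveat with the snippet as written: alternation binds more loosely than the anchors, so /\A|^[[:word:]]/ is read as "\A, or ^[[:word:]]", and the \A branch succeeds for any keyword. Both sb and eb then become '\b', and '\b' placed directly before "#" also fails to match after a space. A different way to express the stated intent is sketched below, using lookarounds instead of '\b' when the keyword starts or ends with a symbol. This is only a sketch under assumptions: the helper name whole_word_pattern is hypothetical, and this is not necessarily what the linked pull request does.

```ruby
# Hypothetical helper, not part of Mastodon's codebase.
def whole_word_pattern(keyword)
  # If the keyword starts with a word character, \b is enough.
  # Otherwise, require start of text or whitespace before it.
  sb = /\A[[:word:]]/.match?(keyword) ? '\b' : '(?<!\S)'
  # Same idea at the end: \b, or end of text / whitespace after it.
  eb = /[[:word:]]\z/.match?(keyword) ? '\b' : '(?!\S)'

  /(?mix:#{sb}#{Regexp.escape(keyword)}#{eb})/
end

pattern = whole_word_pattern('#example')

pattern.match?('Test filter https://example.org/#example test') # => false
pattern.match?('Test #example test')                            # => true
pattern.match?('#example at start')                             # => true
```

The trade-off is that a symbol-led keyword preceded by punctuation, such as "(#example)", would no longer match, so whether whitespace is the right boundary is a design decision.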

ThisIsMissEm commented 11 months ago

I have opened https://github.com/mastodon/mastodon/pull/27991 to fix this.

ThisIsMissEm commented 11 months ago

An alternative solution would be the introduction of custom filters specifically for hashtags, as noted in: https://github.com/mastodon/mastodon/discussions/21762