jo3-l / obscenity

Robust, extensible profanity filter for Node.js
MIT License

bug: Kung Fu false positive #67

Closed — krasnoperov closed this issue 1 month ago

krasnoperov commented 1 month ago

Expected behavior

matcher.hasMatch('Kung-Fu') returns false

Actual behavior

matcher.hasMatch('Kung-Fu') returns true

Minimal reproducible example

import assert from 'node:assert'

import {
  englishDataset,
  englishRecommendedTransformers,
  RegExpMatcher,
} from 'obscenity'

const matcher = new RegExpMatcher({
  ...englishDataset.build(),
  ...englishRecommendedTransformers,
})

// These assertions fail: hasMatch() returns true for each input
assert.equal(matcher.hasMatch('Kung-Fu'), false)
assert.equal(matcher.hasMatch('Kung Fu'), false)
assert.equal(matcher.hasMatch('Kung Fu Panda'), false)

// This one actually works: without a separator, 'KungFu' is not flagged
assert.equal(matcher.hasMatch('KungFu'), false)

Steps to reproduce

  1. Run the code above
  2. It fails with an assertion error

Additional context

No response

Node.js version

v20.15.0

Obscenity version

0.2.1


jo3-l commented 1 month ago

The default dataset contains the pattern |fu|, which (correctly, but undesirably) matches on the -Fu in Kung-Fu. There are two potential ways we could fix this issue:

  1. Remove the |fu| pattern entirely, or
  2. Whitelist Kung-Fu and leave the |fu| pattern untouched.

I am leaning toward 2) at the moment: the |fu| pattern seems useful in general, and I cannot think of any other egregious false positives other than the instance you report. What do you think?
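For illustration, option 2 can be sketched outside of obscenity's actual API as a matcher that blanks out whitelisted phrases before applying any patterns. The regexes below are illustrative stand-ins, not the library's internal representation:

```javascript
// Sketch of option 2: keep the broad |fu| pattern, but neutralize
// whitelisted phrases before matching. Illustrative only; obscenity's
// real dataset/whitelist machinery is more involved.
const patterns = [/\bfu\b/i]          // stand-in for the |fu| pattern
const whitelist = [/kung[\s-]?fu/gi]  // whitelisted phrase, separator optional

function hasMatch(input) {
  // Replace whitelisted spans with spaces so patterns cannot fire inside them
  let text = input
  for (const entry of whitelist) {
    text = text.replace(entry, (m) => ' '.repeat(m.length))
  }
  return patterns.some((p) => p.test(text))
}
```

With this approach, `hasMatch('Kung-Fu')` no longer fires, while a standalone `fu` is still flagged, so the broad pattern keeps its general usefulness.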

krasnoperov commented 1 month ago

I think that whitelisting Kung-Fu is a good option here. Also, it is possible to handle any future false positives by adding them to the whitelist as they arise.

jo3-l commented 1 month ago

I released v0.3.1 with the fix (please ignore v0.2.2 and v0.3.0, both of which were problematic due to my botching some release automation—sorry for the noise!). Thanks again for the report.