jo3-l / obscenity

Robust, extensible profanity filter for NodeJS
MIT License

bug: Kung Fu false positive #67

Closed: krasnoperov closed this issue 4 months ago

krasnoperov commented 4 months ago

Expected behavior

matcher.hasMatch('Kung-Fu') returns false

Actual behavior

matcher.hasMatch('Kung-Fu') returns true

Minimal reproducible example

import assert from 'node:assert'

import {
  englishDataset,
  englishRecommendedTransformers,
  RegExpMatcher,
} from 'obscenity'

const matcher = new RegExpMatcher({
  ...englishDataset.build(),
  ...englishRecommendedTransformers,
})

// These three assertions fail: hasMatch() returns true for each input
assert.equal(matcher.hasMatch('Kung-Fu'), false)
assert.equal(matcher.hasMatch('Kung Fu'), false)
assert.equal(matcher.hasMatch('Kung Fu Panda'), false)

// This assertion passes: with no separator, there is no word boundary before "Fu"
assert.equal(matcher.hasMatch('KungFu'), false)

Steps to reproduce

  1. Run the code above
  2. The first assertion fails with an AssertionError

Additional context

No response

Node.js version

v20.15.0

Obscenity version

0.2.1

jo3-l commented 4 months ago

The default dataset contains the pattern |fu|, which (correctly, but undesirably) matches on the -Fu in Kung-Fu. There are two potential ways we could fix this issue:

  1. Remove the |fu| pattern entirely, or
  2. Whitelist Kung-Fu and leave the |fu| pattern untouched.

I am leaning toward 2) at the moment: the |fu| pattern seems useful in general, and I cannot think of any egregious false positives other than the one you report. What do you think?
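To illustrate the boundary behavior described above, here is a minimal standalone sketch using a plain RegExp (not obscenity's actual matcher): a hyphen is a non-word character, so a word-boundary pattern like |fu| sees the Fu in Kung-Fu as a standalone word.

```javascript
// Minimal sketch of why a word-boundary pattern such as |fu|
// matches the "Fu" in "Kung-Fu": \b asserts a transition between
// a word character and a non-word character (or string edge).
const fuPattern = /\bfu\b/i

console.log(fuPattern.test('Kung-Fu')) // true: the hyphen creates a boundary
console.log(fuPattern.test('Kung Fu')) // true: so does a space
console.log(fuPattern.test('KungFu')) // false: no boundary inside a single word
```

This also explains why only the KungFu assertion in the repro passes: with no separator, there is no boundary before Fu.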

krasnoperov commented 4 months ago

I think that whitelisting Kung-Fu is a good option here. Also, it is possible to handle any future false positives by adding them to the whitelist as they arise.
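As a rough illustration of the whitelist approach, here is a self-contained sketch; the WHITELIST array and findFuMatches function are hypothetical helpers for this example, not obscenity's API. The idea is to discard any pattern match that falls entirely inside an occurrence of a whitelisted term:

```javascript
// Hypothetical whitelist of benign phrases containing "fu"
const WHITELIST = ['kung-fu', 'kung fu']

function findFuMatches(text) {
  const lower = text.toLowerCase()

  // Collect the character spans covered by whitelisted terms
  const spans = []
  for (const term of WHITELIST) {
    let i = lower.indexOf(term)
    while (i !== -1) {
      spans.push([i, i + term.length])
      i = lower.indexOf(term, i + 1)
    }
  }

  // Keep only pattern matches that lie outside every whitelisted span
  const matches = []
  for (const m of lower.matchAll(/\bfu\b/g)) {
    const whitelisted = spans.some(
      ([start, end]) => m.index >= start && m.index + m[0].length <= end,
    )
    if (!whitelisted) matches.push(m.index)
  }
  return matches
}

console.log(findFuMatches('Kung-Fu Panda')) // no matches: inside a whitelisted term
console.log(findFuMatches('fu bar')) // one match, at index 0
```

New false positives can then be handled by appending to the whitelist, as suggested, without touching the pattern itself.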

jo3-l commented 4 months ago

I released v0.3.1 with the fix (please ignore v0.2.2 and v0.3.0, both of which were problematic due to my botching some release automation—sorry for the noise!). Thanks again for the report.