Open rion18 opened 3 months ago
Expected behavior
I am using `obscenity` to censor a string that contains repeated characters, such as `pppiiittt`, with a dataset that contains the word `pit`. The blacklist matcher uses this transformer:
    collapseDuplicatesTransformer({ defaultThreshold: 1 })
I would expect the whole word `pppiiittt` to be matched.
Actual behavior
Instead, the match stops after the first `t` and covers only `pppiiit`. The final two `t` characters are treated as not part of the profanity, although they should be.
Minimal reproducible example
    const {
        englishDataset,
        parseRawPattern,
        DataSet,
        RegExpMatcher,
        TextCensor,
        collapseDuplicatesTransformer,
    } = require('obscenity');

    // English preset plus a custom phrase for the word "pit".
    const data = new DataSet()
        .addAll(englishDataset)
        .addPhrase(phrase => phrase
            .setMetadata({ originalWord: 'pit' })
            .addPattern(parseRawPattern('pit'))
        ).build();

    // Collapse every run of repeated characters down to one character.
    const transformers = {
        blacklistMatcherTransformers: [
            collapseDuplicatesTransformer({ defaultThreshold: 1 }),
        ],
        whitelistMatcherTransformers: [],
    };

    const matcher = new RegExpMatcher({ ...data, ...transformers });
    const textCensor = new TextCensor();

    // Returns the censored string, or the input unchanged if nothing matches.
    function censor(input) {
        if (matcher.hasMatch(input)) {
            const matches = matcher.getAllMatches(input, true);
            return textCensor.applyTo(input, matches);
        }
        return input;
    }

    const stringPit = 'ppiitt';
    console.log(censor(stringPit));
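To make the expected span concrete, here is a minimal sketch that is independent of obscenity's internals. The helpers `collapseWithSpans` and `expectedMatch` are hypothetical names used only for illustration, and the sketch assumes a threshold of 1 and a plain-text word with no adjacent repeated letters. It collapses each run of identical characters while recording which original indices the run covers, then maps a hit in the collapsed text back to the full runs.

```javascript
// Hypothetical helper, not part of obscenity: collapse runs of identical
// characters and remember which original indices each kept character covers.
function collapseWithSpans(input) {
    const chars = [];
    const spans = []; // spans[i] = [start, end] (inclusive) in the original string
    for (let i = 0; i < input.length; i++) {
        if (chars.length > 0 && chars[chars.length - 1] === input[i]) {
            spans[spans.length - 1][1] = i; // extend the current run
        } else {
            chars.push(input[i]);
            spans.push([i, i]);
        }
    }
    return { collapsed: chars.join(''), spans };
}

// Find `word` in the collapsed text and map the hit back to the full runs.
function expectedMatch(input, word) {
    const { collapsed, spans } = collapseWithSpans(input);
    const at = collapsed.indexOf(word);
    if (at === -1) return null;
    const startIndex = spans[at][0];
    const endIndex = spans[at + word.length - 1][1]; // end of the LAST run
    return { startIndex, endIndex, text: input.slice(startIndex, endIndex + 1) };
}

console.log(expectedMatch('pppiiittt', 'pit'));
// { startIndex: 0, endIndex: 8, text: 'pppiiittt' }
```

The behavior reported here corresponds to a match that ends at index 6, the first `t`, rather than at index 8, the end of the final run.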
Steps to reproduce
No response
Additional context
No response
Node.js version
18.17.1
Obscenity version
0.4.0