Emoji is not supported when a profanity is found next to that character

rion18 commented 4 months ago

Expected behavior

Censor a string in which an emoji is immediately followed by a profane word, such as 🤣bummer, using obscenity with a dataset that contains the word bummer and this strategy for removing profanities:

const CENSOR_STRATEGY = (censorContext) => ''.repeat(censorContext.matchLength);

Since repeating the empty string always yields '', this strategy deletes each match, so the expected output is 🤣.
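Background on why the emoji matters: 🤣 (U+1F923) lies outside the Basic Multilingual Plane, so a JavaScript string stores it as a surrogate pair, which is two UTF-16 code units but only one code point. The snippet below only illustrates the two ways of counting positions in this input; it is not taken from obscenity's internals, and any code that mixes the two counts can be off by one for every such character before a match.

```javascript
// '🤣' lies outside the Basic Multilingual Plane, so it is stored
// as a surrogate pair: two UTF-16 code units, one code point.
const input = '🤣bummer';

// String#length and String#indexOf count UTF-16 code units.
const utf16Length = input.length; // 8
const utf16IndexOfB = input.indexOf('b'); // 2

// Spreading a string iterates by code point, so the emoji counts once.
const codePoints = [...input];
const codePointLength = codePoints.length; // 7
const codePointIndexOfB = codePoints.indexOf('b'); // 1

console.log({ utf16Length, utf16IndexOfB, codePointLength, codePointIndexOfB });
```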

Actual behavior

Instead, the output is 🤣b. The word bummer is matched, but the index reported for the match appears to be off by one, so the leading b survives censoring.
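The surviving b is consistent with a match whose start index is one code unit too late. As an illustration only: the helper `removeInclusiveSpans` below is hypothetical and not part of obscenity's API, and its `{ startIndex, endIndex }` spans are assumed to be inclusive UTF-16 offsets. Removing the span [2, 7] yields the expected output, while [3, 7] reproduces the reported one.

```javascript
// Hypothetical helper, NOT obscenity's API: deletes each inclusive
// [startIndex, endIndex] span (UTF-16 code unit offsets) from the input.
// Spans must be sorted by startIndex and must not overlap.
function removeInclusiveSpans(input, spans) {
  let result = '';
  let lastEnd = 0; // first index not yet copied to the result
  for (const { startIndex, endIndex } of spans) {
    result += input.slice(lastEnd, startIndex);
    lastEnd = endIndex + 1;
  }
  return result + input.slice(lastEnd);
}

const input = '🤣bummer';

// 'bummer' starts at UTF-16 index 2 (after the two-unit emoji) and ends at 7.
const correct = removeInclusiveSpans(input, [{ startIndex: 2, endIndex: 7 }]);
// A start index one code unit too large leaves the 'b' behind.
const shifted = removeInclusiveSpans(input, [{ startIndex: 3, endIndex: 7 }]);

console.log(correct); // 🤣
console.log(shifted); // 🤣b
```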

Minimal reproducible example

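const {
  englishDataset,
  parseRawPattern,
  DataSet,
  RegExpMatcher,
  TextCensor,
} = require('obscenity');

// Same strategy as above: replace every match with '' (i.e. delete it).
const CENSOR_STRATEGY = (censorContext) => ''.repeat(censorContext.matchLength);

const data = new DataSet()
  .addAll(englishDataset)
  .addPhrase((phrase) =>
    phrase
      .setMetadata({ originalWord: 'bummer' })
      .addPattern(parseRawPattern('bummer')),
  )
  .build();

// No transformers are passed, so the input is matched exactly as written.
const matcher = new RegExpMatcher({ ...data });
const textCensor = new TextCensor().setStrategy(CENSOR_STRATEGY);

function censor(input) {
  if (!matcher.hasMatch(input)) return input;
  // TextCensor#applyTo expects matches sorted by position, hence `true`.
  const matches = matcher.getAllMatches(input, true);
  return textCensor.applyTo(input, matches);
}

console.log(censor('🤣bummer')); // expected '🤣'; on 0.3.1 this prints '🤣b'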

Node.js version

18.17.1

Obscenity version

0.3.1

Terms

[X] I agree to follow the project's Code of Conduct.
[X] I have searched existing issues for similar reports.

jo3-l commented 4 months ago

Thanks for the short repro. I think I know what the issue is and will take a stab at fixing it today.

jo3-l commented 4 months ago

Fix released in v0.4.0.

jo3-l / obscenity