jo3-l / obscenity

Robust, extensible profanity filter for NodeJS
MIT License
79 stars, 3 forks

Request: support accented letters in patterns #80

Open verekia opened 3 days ago

verekia commented 3 days ago

Description

.addPhrase(phrase => phrase.setMetadata({ originalWord: 'coño' }).addPattern(pattern`coño`))
.addPhrase(phrase => phrase.setMetadata({ originalWord: 'pédé' }).addPattern(pattern`pédé`))

This throws:

ParserError: 1:4: Cannot escape character 'u'; the only characters that can be escaped are the following: '\', '[', ']', '?', '|'.

Solution

Add support for accented characters.

For reference, these are the accented variants my Mac keyboard suggests on long-press:

w: ŵ
e: èéêëěẽēėę
r: ř
t: țťþ
y: ýŷÿ
u: ùúûüǔũūűůu
i: ìíîïǐĩīıį
o: òóôöǒœøõō
a: àáâäǎæãåā
s: ßşșśš
d: ďð
g: ğġ
h: ħ
k: ķ
l: łļľ
z: źžż
c: çćčċ
n: ñńņň


jo3-l commented 1 day ago

I cannot reproduce the described error.

import { DataSet, RegExpMatcher, pattern } from 'obscenity';

// Register both accented words as phrases, then test a matcher built from them.
const dataset = new DataSet()
    .addPhrase((phrase) => phrase.setMetadata({ originalWord: 'coño' }).addPattern(pattern`coño`))
    .addPhrase((phrase) => phrase.setMetadata({ originalWord: 'pédé' }).addPattern(pattern`pédé`));
console.log(new RegExpMatcher(dataset.build()).hasMatch('coño'));

prints true for me, as expected. Can you provide an MRE?
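
A possible starting point for a reproduction, offered as a guess rather than a diagnosis: the error complains about escaping 'u' at 1:4, which is what a raw-string template tag would see if some build step rewrote ñ as the escape \u00f1 before the code ran. The sketch below uses a hypothetical rawTag helper (a stand-in, not obscenity's pattern implementation) to show how such a tag receives the backslash and the 'u' as separate characters. Whether pattern reads raw strings this way, and whether the reporter's toolchain performs such a rewrite, are both assumptions.

```typescript
// Hypothetical stand-in for a template tag that reads the raw source text.
// This is NOT obscenity's `pattern` implementation, only an illustration.
function rawTag(strings: TemplateStringsArray): string {
  return strings.raw[0];
}

// Written with the literal character, as in the snippets above.
const literal = rawTag`coño`;

// What the same pattern would look like if a build step (assumed, not
// confirmed) replaced the non-ASCII character with a Unicode escape.
const escaped = rawTag`co\u00f1o`;

// The raw text keeps the escape unprocessed: c, o, \, u, 0, 0, f, 1, o.
console.log(escaped.length);
console.log(escaped.charAt(2), escaped.charAt(3));

// Compare with the literal form on the toolchain that shows the error.
console.log(literal.length);
```

If the last line prints something other than 4 in the failing setup, the source is probably being rewritten before it reaches the parser, and the bundler or transpiler configuration would be worth including in the MRE.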