Open montycheese opened 5 years ago
https://github.com/web-mech/badwords/commit/31126d6004f2d997f5121a3fbbd6992af1a0691b - this has been attempted, and we failed miserably. A pull request is welcome, but at this point I've invested too much time chasing this issue. I welcome any suggestions or input on a solution.
I've solved this by looping through the string's characters and building a new string on each pass. E.g. typing badword is rejected, and so is abcdefgbadword, because my loop removes one leading character at a time and tests the remainder until what's left is badword.
// Break the string into an array of characters.
const stringArray = Array.from(string);
// Test every suffix of the string: each pass drops one more leading character.
for (let i = 0; i < stringArray.length; i++) {
  // Join the remaining characters back into a string.
  const newString = stringArray.slice(i).join("");
  // Apply the word-boundary test to the suffix, skipping excluded words.
  const hit = this.list.some((word) => {
    const wordExp = new RegExp(`\\b${word.replace(/(\W)/g, '\\$1')}\\b`, 'gi');
    return !this.exclude.includes(word.toLowerCase()) && wordExp.test(newString);
  });
  // Stop at the first suffix that matches.
  if (hit) return true;
}
return false;
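For anyone who wants to try the suffix idea outside the library, here is a minimal self-contained sketch. The one-entry `list` and the names `toBoundaryRegex` and `suffixIsProfane` are made up for illustration and are not part of badwords. It shows that the scan catches a listed word at the end of a string but not one followed by more letters.

```javascript
// Hypothetical stand-in for the filter's word list; not the library's data.
const list = ['badword'];

// Build the same kind of word-boundary pattern the snippet above uses.
const toBoundaryRegex = (word) =>
  new RegExp(`\\b${word.replace(/(\W)/g, '\\$1')}\\b`, 'i');

// Test every suffix of the input against each listed word.
function suffixIsProfane(input) {
  const chars = Array.from(input);
  for (let i = 0; i < chars.length; i++) {
    const suffix = chars.slice(i).join('');
    if (list.some((word) => toBoundaryRegex(word).test(suffix))) return true;
  }
  return false;
}

console.log(suffixIsProfane('abcdefgbadword')); // true: the suffix "badword" matches
console.log(suffixIsProfane('badwordabc'));     // false: no word boundary after "badword"
console.log(suffixIsProfane('abcbadwordabc'));  // false: same reason
```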
I was running into this too. If anyone is interested, I have made an adaptation of @jimsideout's code that can find profanity anywhere in a given string (his only finds it at the end). For performance, you can also set a min/max length for the substrings to test:
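this.testProfanity = (str) => {
  // Test substrings of 2 to 8 characters; other lengths are skipped for performance.
  let minLen = 2, maxLen = 8;
  minLen--; // number of characters seeded into the window before the first test
  if (str.length <= minLen) return false;
  const profanityFilter = new ProfanityFilter();
  // Fast path: the whole string already trips the filter.
  if (profanityFilter.isProfane(str)) return true;
  const words = str.split(' ');
  for (const word of words) {
    if (word.length <= minLen) continue;
    const chars = [...word];
    // Slide the window's start index across the word.
    for (let i = 0; i < chars.length - minLen; i++) {
      // Seed the window with the first minLen characters.
      const testChars = [];
      for (let k = i; k < i + minLen; k++) {
        testChars.push(chars[k]);
      }
      // Grow the window one character at a time, up to maxLen, testing at each step.
      for (let j = i + minLen; j < Math.min(chars.length, i + maxLen); j++) {
        testChars.push(chars[j]);
        const test = testChars.join('');
        const isProfane = profanityFilter.isProfane(test);
        console.log(`testProfanity: ${test} isProfane: ${isProfane}`);
        if (isProfane) return true;
      }
    }
  }
  return false;
}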
Results from testing "hi there abcfuckerabc how":
testProfanity: hi isProfane: false
testProfanity: th isProfane: false
testProfanity: the isProfane: false
testProfanity: ther isProfane: false
testProfanity: there isProfane: false
testProfanity: he isProfane: false
testProfanity: her isProfane: false
testProfanity: here isProfane: false
testProfanity: er isProfane: false
testProfanity: ere isProfane: false
testProfanity: re isProfane: false
testProfanity: ab isProfane: false
testProfanity: abc isProfane: false
testProfanity: abcf isProfane: false
testProfanity: abcfu isProfane: false
testProfanity: abcfuc isProfane: false
testProfanity: abcfuck isProfane: false
testProfanity: abcfucke isProfane: false
testProfanity: bc isProfane: false
testProfanity: bcf isProfane: false
testProfanity: bcfu isProfane: false
testProfanity: bcfuc isProfane: false
testProfanity: bcfuck isProfane: false
testProfanity: bcfucke isProfane: false
testProfanity: bcfucker isProfane: false
testProfanity: cf isProfane: false
testProfanity: cfu isProfane: false
testProfanity: cfuc isProfane: false
testProfanity: cfuck isProfane: false
testProfanity: cfucke isProfane: false
testProfanity: cfucker isProfane: false
testProfanity: cfuckera isProfane: false
testProfanity: fu isProfane: false
testProfanity: fuc isProfane: false
testProfanity: fuck isProfane: true
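The log above shows the window scan calling the filter dozens of times for one short input. A possible alternative, sketched here under my own assumptions, is a single regex with no word boundaries built from the list. The `list` contents and the names `escapeRegex` and `containsProfanity` are hypothetical and not part of the library.

```javascript
// Hypothetical word list for illustration only.
const list = ['badword', 'worse'];

// Escape regex metacharacters so each entry is matched literally.
const escapeRegex = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// One alternation pattern, longest entries first, with no \b anchors.
// No 'g' flag, so test() keeps no state between calls.
const pattern = new RegExp(
  [...list].sort((a, b) => b.length - a.length).map(escapeRegex).join('|'),
  'i'
);

// True when any listed word appears anywhere in the text.
const containsProfanity = (text) => pattern.test(text);

console.log(containsProfanity('hi there abcbadwordabc how')); // true
console.log(containsProfanity('badwordworse'));               // true
console.log(containsProfanity('hi there how'));               // false
```

The trade-off is that a plain containment check also flags any harmless word that happens to contain a listed term, so it would probably need an allow-list alongside it.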
What's the status on that? Is it merged? Looking at the source, I didn't see any PR.
Description
Profanity that is concatenated, e.g. word1word1 or word1word2, passes the filter even if word1 and word2 are both in the filter list. If we add the concatenated word to the filter, it is blocked. However, there are countless combinations of concatenated profane words, and it is impossible to add every one of them to the filter. Ideally, the filter would recognize a profane word even when it is concatenated with other text.
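A minimal reproduction of the behavior described above, assuming the filter matches each list entry with a word-boundary regex as the snippets in this thread do. The `list` array and the `boundaryMatch` helper are illustrative stand-ins, not the library's API.

```javascript
// "word1" and "word2" are the placeholder terms from the description.
const list = ['word1', 'word2'];

// Match each entry only when it stands alone between word boundaries.
const boundaryMatch = (text) =>
  list.some((word) => new RegExp(`\\b${word}\\b`, 'i').test(text));

console.log(boundaryMatch('word1 word2')); // true: each word is delimited by a space
console.log(boundaryMatch('word1word2'));  // false: no boundary between the two words
console.log(boundaryMatch('word1word1'));  // false: same reason
```

Letters and digits both count as word characters, so `\b` never matches at the point where two listed words are joined.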