web-mech / badwords

A javascript filter for badwords
MIT License
618 stars 325 forks

Concatenated profanity passes filter #51

Open montycheese opened 5 years ago

montycheese commented 5 years ago

Description

Concatenated profanity, e.g. word1word1 or word1word2, passes the filter even if word1 and word2 are both on the filter list. Adding the concatenated word to the list does block it, but there are too many possible combinations of profane words to list them all. Ideally, the filter would detect a profane word even when it is concatenated with other text.
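
A likely explanation, assuming the filter matches each list entry with a word-boundary regular expression (the `\b...\b` pattern that appears in the code quoted later in this thread): `\b` only matches between a word character and a non-word character, so a listed word glued to other letters never matches. The sketch below is not the library's code; `list` and `matchesWithWordBoundaries` are hypothetical names, and `foo` and `bar` stand in for real list entries.

```javascript
// Hypothetical list entries; 'foo' and 'bar' stand in for real profanity.
const list = ['foo', 'bar'];

// Assumed matching strategy: one word-boundary regex per list entry.
function matchesWithWordBoundaries(text) {
  return list.some((word) => new RegExp(`\\b${word}\\b`, 'i').test(text));
}

console.log(matchesWithWordBoundaries('foo'));     // true
console.log(matchesWithWordBoundaries('say foo')); // true
console.log(matchesWithWordBoundaries('foobar'));  // false: no boundary between 'foo' and 'bar'
console.log(matchesWithWordBoundaries('foofoo'));  // false: same word repeated, still no boundary
```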

web-mech commented 5 years ago

We attempted this in https://github.com/web-mech/badwords/commit/31126d6004f2d997f5121a3fbbd6992af1a0691b and failed miserably. A pull request is welcome, but at this point I've invested too much time chasing this issue. I welcome any suggestions or input for a solution.

jimsideout commented 4 years ago

I've solved this by looping through the string's characters and testing each suffix of the string. E.g. typing badword is caught, and so is typing abcdefgbadword, because my loop drops one leading character at a time and tests the remainder until the remainder is just badword.

//break out chars into array
const stringArray = Array.from(string);
var badWord = false;

//loop through string array
for (var i = 0; i < stringArray.length;i++) {
  var newArray = [];

  //loop again starting at i so we remove each character from the string
  for (var j = i; j < stringArray.length;j++) {
    newArray.push(stringArray[j]);
  }

  //join array into string;
  const newString = newArray.join("");

  //test
  if (this.list
    .filter((word) => {
      const wordExp = new RegExp(`\\b${word.replace(/(\W)/g, '\\$1')}\\b`, 'gi');
      return !this.exclude.includes(word.toLowerCase()) && wordExp.test(newString);
    })
    .length > 0) {
      badWord = true;
    }

}

return badWord;
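
For experimenting outside the library, the loop above can be condensed into a standalone function. This is only a sketch under assumptions: `list` and `exclude` here are hypothetical stand-ins for the filter's properties, `badword` is a placeholder entry, and `slice` is used in place of the character array (equivalent for plain ASCII input).

```javascript
// Hypothetical stand-ins for the filter's `list` and `exclude` properties.
const list = ['badword'];
const exclude = [];

// Same escaping as the snippet above: backslash-escape every non-word character.
function escapeForRegExp(word) {
  return word.replace(/(\W)/g, '\\$1');
}

// Test every suffix of `text` against the word-boundary regex of each list entry.
function hasProfaneSuffix(text) {
  for (let i = 0; i < text.length; i++) {
    const suffix = text.slice(i);
    const hit = list.some((word) => {
      if (exclude.includes(word.toLowerCase())) return false;
      return new RegExp(`\\b${escapeForRegExp(word)}\\b`, 'i').test(suffix);
    });
    if (hit) return true;
  }
  return false;
}

console.log(hasProfaneSuffix('badword'));        // true
console.log(hasProfaneSuffix('abcdefgbadword')); // true
console.log(hasProfaneSuffix('abcbadwordabc'));  // false
```

The last call shows the limit of this approach: dropping leading characters creates a boundary before the listed word, but trailing characters still block the closing `\b`.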

fustaro commented 4 years ago

I was running into this too. If anyone is interested, I have made an adaptation of @jimsideout's code that looks for profanity anywhere in a given string (his only finds it at the end). For performance, you can also set a min/max length for the substrings to test:

this.testProfanity = (str) => {
    let minLen = 2, maxLen = 8;

    minLen--;

    if(str.length <= minLen) return false;

    const profanityFilter = new ProfanityFilter();

    if(profanityFilter.isProfane(str)) return true;

    const words = str.split(' ');

    for(const word of words){
        if(word.length <= minLen) continue;

        const chars = [...word];

        for(let i = 0; i < chars.length - minLen; i++){
            const testChars = [];

            for(let k = i; k < i + minLen; k++){
                testChars.push(chars[k]);
            }

            for(let j = i + minLen; j < Math.min(chars.length, i + maxLen); j++){
                testChars.push(chars[j]);
                const test = testChars.join('');
                const isProfane = profanityFilter.isProfane(test);
                console.log(`testProfanity: ${test} isProfane: ${isProfane}`);
                if(isProfane) return true;
            }
        }
    }

    return false;
}

Results from testing "hi there abcfuckerabc how":

testProfanity: hi isProfane: false
testProfanity: th isProfane: false
testProfanity: the isProfane: false
testProfanity: ther isProfane: false
testProfanity: there isProfane: false
testProfanity: he isProfane: false
testProfanity: her isProfane: false
testProfanity: here isProfane: false
testProfanity: er isProfane: false
testProfanity: ere isProfane: false
testProfanity: re isProfane: false
testProfanity: ab isProfane: false
testProfanity: abc isProfane: false
testProfanity: abcf isProfane: false
testProfanity: abcfu isProfane: false
testProfanity: abcfuc isProfane: false
testProfanity: abcfuck isProfane: false
testProfanity: abcfucke isProfane: false
testProfanity: bc isProfane: false
testProfanity: bcf isProfane: false
testProfanity: bcfu isProfane: false
testProfanity: bcfuc isProfane: false
testProfanity: bcfuck isProfane: false
testProfanity: bcfucke isProfane: false
testProfanity: bcfucker isProfane: false
testProfanity: cf isProfane: false
testProfanity: cfu isProfane: false
testProfanity: cfuc isProfane: false
testProfanity: cfuck isProfane: false
testProfanity: cfucke isProfane: false
testProfanity: cfucker isProfane: false
testProfanity: cfuckera isProfane: false
testProfanity: fu isProfane: false
testProfanity: fuc isProfane: false
testProfanity: fuck isProfane: true
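
The same window scan can be tried without the library by passing the check in as a parameter. This is a minimal sketch rather than the function above: `containsProfaneSubstring`, `demoList` and `demoCheck` are hypothetical names, `foo` is a placeholder list entry, and the initial whole-string check and the logging are left out.

```javascript
// `isProfane` is a hypothetical stand-in for the library's check; pass the real one in practice.
function containsProfaneSubstring(text, isProfane, minLen = 2, maxLen = 8) {
  for (const word of text.split(/\s+/)) {
    // Try every start position that still leaves room for minLen characters.
    for (let start = 0; start + minLen <= word.length; start++) {
      const longestEnd = Math.min(word.length, start + maxLen);
      // Grow the window from minLen up to maxLen characters.
      for (let end = start + minLen; end <= longestEnd; end++) {
        if (isProfane(word.slice(start, end))) return true;
      }
    }
  }
  return false;
}

// Placeholder check: exact, case-insensitive match against a tiny demo list.
const demoList = new Set(['foo']);
const demoCheck = (candidate) => demoList.has(candidate.toLowerCase());

console.log(containsProfaneSubstring('hi there abcfooabc how', demoCheck)); // true
console.log(containsProfaneSubstring('hi there', demoCheck));               // false
console.log(containsProfaneSubstring('food', demoCheck));                   // true
```

The last call shows the trade-off of matching any substring: harmless words that happen to contain a listed word are flagged too.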

evdama commented 4 years ago

What's the status on this, is it merged? Looking at the source I didn't see any PR.