web-mech / badwords

A javascript filter for badwords
MIT License

Combined bad words not being detected as profane. #60

Open KarnEdge opened 5 years ago

KarnEdge commented 5 years ago
filter.isProfane('fuckshit') // returns false 
filter.isProfane('fuck shit') // returns true

lang.json contains *fuck*, and I was hoping that meant anything around the word fuck would be caught, like 'gofuckyourself', etc.

Of course, we only want to do this for certain words like fuck or shit that are not part of any other word in the dictionary: http://www.morewords.com/contains/fuck/ That way we also avoid the Clbuttic mistake.

Any way to get it to work this way?
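
For context, the diff later in this thread shows isProfane building a whole-word pattern, \b<word>\b, for each list entry. A minimal sketch of why that pattern misses concatenations, using a hypothetical standalone helper (matchesWholeWord is an illustration, not part of bad-words):

```javascript
// Hypothetical helper mirroring the whole-word pattern; not the library's code.
function matchesWholeWord(word, text) {
  // Escape non-word characters so the list entry is matched literally.
  const escaped = word.replace(/(\W)/g, '\\$1');
  // \b only matches between a word character and a non-word character,
  // so a listed word glued to letters or digits has no boundary next to it.
  return new RegExp(`\\b${escaped}\\b`, 'i').test(text);
}

console.log(matchesWholeWord('shit', 'fuck shit')); // true
console.log(matchesWholeWord('shit', 'fuckshit'));  // false
console.log(matchesWholeWord('fuck', 'fuck1'));     // false
```

Digits and underscores count as word characters for \b, which is why a trailing '1' hides the match as well.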

TautvydasDerzinskas commented 4 years ago

+1 on this.

Also examples such as: "dick1", "ofuck" are not detected.

Bound3R commented 4 years ago

To add to that, I tried using numbers and other types of combinations, but it didn't detect them either:

filter.isProfane('afuck') // returns false 
filter.isProfane('fuck1') // returns false

jimsideout commented 4 years ago

I've added an answer to https://github.com/web-mech/badwords/issues/51

x4iiiis commented 4 years ago

Your answer in #51 works for back-to-back blacklisted words, but not if you concatenate other characters or clean words with the bad one(s).

Any ideas?

jimsideout commented 4 years ago

Can you give me an example?

x4iiiis commented 4 years ago

Cheers for the quick response!

I may have mis-implemented your solution, but an example would be 'f...face'. I realise that could just be added to the list, but replacing 'face' with 'o', for example, still bypasses the filter for me.

Interestingly though, 'f...head' gets blocked, despite not being on the list.

However, if you chain several F-bombs, or a mix of listed words you'll get the desired result.

Thanks again

jimsideout commented 4 years ago

My solution was to check on every key press, so I would never make it to the 'face' portion of 'f...face'. In my application I just throw a warning, then wipe out the 'f...' so they can never make it past it.

It sounds like you're potentially doing the check when editing ends or focus changes, is that right? To solve that I think we'd need to loop forward and backwards while dropping a character each time and check all character combinations in between. E.g. 'heyf...head' would need to eventually drop the first 3 characters as well as the last 4 to get the triggered word.
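
The "drop characters from both ends" idea amounts to checking every contiguous substring against the word list. A rough sketch, assuming a placeholder list entry 'bad' and a hypothetical helper findListedSubstring (neither comes from the library):

```javascript
// Hypothetical helper: returns the first listed word found as a substring, or null.
function findListedSubstring(text, badWords) {
  const listed = new Set(badWords.map((w) => w.toLowerCase()));
  const lower = text.toLowerCase();
  // Try every start position (dropping leading characters) ...
  for (let start = 0; start < lower.length; start++) {
    // ... and every end position (dropping trailing characters).
    for (let end = start + 1; end <= lower.length; end++) {
      const candidate = lower.slice(start, end);
      if (listed.has(candidate)) return candidate;
    }
  }
  return null;
}

// 'heybadhead': dropping the first 3 and last 4 characters leaves 'bad'.
console.log(findListedSubstring('heybadhead', ['bad'])); // 'bad'
console.log(findListedSubstring('hello', ['bad']));      // null
```

This examines a quadratic number of substrings, which is fine for short inputs such as a form field but would need rethinking for long text.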

x4iiiis commented 4 years ago

Yeah, that makes sense. Thanks for that.

It is checking every keystroke, but it doesn't disallow the user from continuing to type beyond detected profanity. It just displays error text under the input box when the filter has caught something.

bernardbaker commented 4 years ago

Is this use case resolved? For example, the string "testfuck" is not detected: filter.isProfane("testfuck") returns false.

Would it be possible for the filter.clean(argument) implementation to run something like String('testfuck').indexOf('fuck') for each bad word in the list?
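
A minimal sketch of that indexOf idea, written as a standalone function rather than a change to the library; containsListedWord and the badWords parameter are hypothetical names standing in for the filter's own list:

```javascript
// Hypothetical helper; `badWords` stands in for the filter's word list.
function containsListedWord(text, badWords) {
  const lower = text.toLowerCase();
  // indexOf returns -1 when the word does not occur anywhere in the text.
  return badWords.some((word) => lower.indexOf(word.toLowerCase()) !== -1);
}

console.log(containsListedWord('testfuck', ['fuck'])); // true
console.log(containsListedWord('testing', ['fuck']));  // false
```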

keybraker commented 3 years ago

This fixes the problem: change the pattern \\b${word.replace(/(\W)/g, '\\$1')}\\b to \\b(\\w*${word.replace(/(\W)/g, '\\$1')}\\w*)\\b.

Note that some words in the JSON list, such as shit, make the regex fail.

isProfane(string) {
  return this.list
    .filter((word) => {
      const wordExp = new RegExp(`\\b(\\w*${word.replace(/(\W)/g, '\\$1')}\\w*)\\b`, 'gi');
      return !this.exclude.includes(word.toLowerCase()) && wordExp.test(string);
    })
    .length > 0 || false;
}
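
To see what the widened pattern does outside the class, here is a small standalone sketch. matchesWidened is a hypothetical helper, and it omits the g flag because a single test call does not need it:

```javascript
// Hypothetical helper applying the widened pattern to one word; not the library's code.
function matchesWidened(word, text) {
  const escaped = word.replace(/(\W)/g, '\\$1');
  // \w* on both sides lets the listed word sit anywhere inside a longer token.
  return new RegExp(`\\b(\\w*${escaped}\\w*)\\b`, 'i').test(text);
}

console.log(matchesWidened('fuck', 'testfuck')); // true
console.log(matchesWidened('fuck', 'fuck1'));    // true
console.log(matchesWidened('fuck', 'test'));     // false
```

Because any surrounding word characters are accepted, this behaves like a substring search within each token.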

Jameskmonger commented 3 years ago

@bernardbaker it's important to consider the Scunthorpe problem when using includes or indexOf: a plain substring match also flags innocent words that happen to contain a listed word.
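
A small illustration of that false-positive risk, plus one possible mitigation (an allowlist of known-innocent words). The word list, the allowlist, and both function names are hypothetical and chosen only for the example:

```javascript
// Hypothetical word list and allowlist, for illustration only.
const listedWords = ['ass'];
const allowedWords = ['classic', 'passage'];

// Plain substring check: flags any text containing a listed word.
function naiveContains(text) {
  const lower = text.toLowerCase();
  return listedWords.some((w) => lower.includes(w));
}

// Check each whitespace-separated token, skipping allowlisted ones.
function containsWithAllowlist(text) {
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter((token) => !allowedWords.includes(token))
    .some((token) => listedWords.some((w) => token.includes(w)));
}

console.log(naiveContains('a classic passage'));         // true (false positive)
console.log(containsWithAllowlist('a classic passage')); // false
console.log(containsWithAllowlist('dumbass move'));      // true
```

An allowlist only covers the words someone thought to add, and this sketch does not strip punctuation from tokens, so it is a starting point rather than a complete answer.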

lockieluke commented 2 years ago

I'm currently using patch-package to work around the problem:

diff --git a/node_modules/bad-words/lib/badwords.js b/node_modules/bad-words/lib/badwords.js
index 3990c41..15de96e 100644
--- a/node_modules/bad-words/lib/badwords.js
+++ b/node_modules/bad-words/lib/badwords.js
@@ -31,11 +31,11 @@ class Filter {
    */
   isProfane(string) {
     return this.list
-      .filter((word) => {
-        const wordExp = new RegExp(`\\b${word.replace(/(\W)/g, '\\$1')}\\b`, 'gi');
-        return !this.exclude.includes(word.toLowerCase()) && wordExp.test(string);
-      })
-      .length > 0 || false;
+        .filter((word) => {
+          const wordExp = new RegExp(`\\b(\\w*${word.replace(/(\W)/g, '\\$1')}\\w*)\\b`, 'gi');
+          return !this.exclude.includes(word.toLowerCase()) && wordExp.test(string);
+        })
+        .length > 0 || false;
   }

   /**