raymondjavaxx / swearjar-node

Profanity detection and filtering library.
MIT License
73 stars, 33 forks

Bypassed by plural profanities #12

Open 3dfoster opened 7 years ago

3dfoster commented 7 years ago

Simply adding an 's' to the end of a profanity defeats the filter. For example, "No fucks given" is not censored.
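
A minimal sketch of why this happens, assuming the filter looks up each whole token in a word list (as the `scan` method quoted further down in this thread does). The `badWords` object and `isProfane` function are hypothetical stand-ins, not the library's actual API, and "darn" is a placeholder entry:

```javascript
// Hypothetical stand-in for the library's word list; "darn" is a placeholder entry.
var badWords = { darn: ['mild'] };

// Exact whole-token lookup: each \w+ token must equal a list entry to be flagged.
function isProfane(text) {
  var regex = /\w+/g;
  var match;
  while ((match = regex.exec(text)) !== null) {
    var key = match[0].toLowerCase();
    if (Object.prototype.hasOwnProperty.call(badWords, key)) {
      return true;
    }
  }
  return false;
}

isProfane('well darn');       // true: the token "darn" is on the list
isProfane('no darns given');  // false: the token "darns" is not
```

Because the lookup is per token, any suffix produces a different key, so the plural passes unless it is listed separately.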

no-stack-dub-sack commented 7 years ago

@raymondjavaxx This is a totally awesome package and exactly what I was looking for! However, I noticed this issue too, and also that "compound" curses, so to speak, get through. For example, "sh-tfu-k" is not on your default list. Adding each additional word is of course possible with the way you have it set up, but would you be open to a PR that solves both of these issues without adding specific words to the list? I have a working solution locally and would love to show it to you and discuss it.

A couple of examples with my local solution:

swearjar.profane('no f-cks given') // true
swearjar.censor('no f-cks given') // no ***** given
swearjar.censor('sh-ttyf-cks') // ***********

Let me know! Thanks.

loctn commented 7 years ago

@no-stack-dub-sack the compound stuff is tricky without significantly altering (and slowing down) the scanning algorithm.

@fasterthan plurals should be an easy fix, I'll see if I can get to it.

loctn commented 7 years ago

@fasterthan actually this could be covered by #11 so I'll hold off for now.

no-stack-dub-sack commented 7 years ago

@loctn Yeah, on a closer look and after deeper testing of my solution, I did run into some tricky situations, namely that common words containing swears get censored. For instance, "class" was censored because it contains an obvious curse. For my use case this was fine: I just took "ass" off the list because it's less offensive than many others, and keeping the compound coverage seemed more important at the time.
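
To illustrate the false-positive problem, here is a small sketch of substring matching in isolation. The `badWords` object and `findSubstringHits` function are hypothetical and only mirror the idea, not the library's code:

```javascript
// Hypothetical word list containing only the entry mentioned above.
var badWords = { ass: ['mild'] };

// Substring matching: returns every list entry that occurs anywhere inside the token.
function findSubstringHits(token) {
  var key = token.toLowerCase();
  return Object.keys(badWords).filter(function (badWord) {
    return key.indexOf(badWord) !== -1;
  });
}

findSubstringHits('class');  // ['ass'], a false positive
findSubstringHits('glass');  // ['ass'], another one
findSubstringHits('clean');  // []
```

Short list entries are the most likely to appear inside harmless words, which is why removing one entry from the list made the trade-off acceptable here.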

This was the edit I made to the scan method:

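  scan: function (text, callback) {
    var word, key, match;
    var regex = /\w+/g;

    while (match = regex.exec(text)) {
      word = match[0];
      key  = word.toLowerCase();

      if (key in this._badWords && Array.isArray(this._badWords[key])) {
        if (callback(word, match.index, this._badWords[key]) === false) {
          break;
        }
      } else {
        /******* added else statement to catch compound bad words and plurals such as
        "no fu*ks given" and "fu*ksh*t" without having to add each to default list */
        for (let badWord in this._badWords) {
          // Same guard as the exact-match branch: skip anything that is not a category list.
          if (!Array.isArray(this._badWords[badWord])) continue;
          // Plain substring test; String#search would interpret badWord as a regex.
          if (key.indexOf(badWord) > -1) {
            // Pass the original word (not the lowercased key), as the branch above does.
            if (callback(word, match.index, this._badWords[badWord]) === false) {
              // A bare break would only leave this inner loop; return stops the whole scan.
              return;
            }
          }
        }
      }
    }
  },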

Again, may not be ideal, but working well for my use-case.
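
For the plural case alone, a narrower option is to keep the exact lookup and retry with a trailing "s" or "es" removed. This is only a sketch under that assumption, not the fix proposed for the library; `badWords`, `has`, and `lookupWithPlural` are hypothetical names and "darn" is a placeholder entry:

```javascript
// Hypothetical word list with placeholder entries.
var badWords = { darn: ['mild'], ass: ['mild'] };

function has(key) {
  return Object.prototype.hasOwnProperty.call(badWords, key);
}

// Returns the matching list entry for a token, or null.
// Tries the token itself, then the token with a trailing "es" or "s" removed.
function lookupWithPlural(token) {
  var key = token.toLowerCase();
  if (has(key)) return key;
  if (key.slice(-2) === 'es' && has(key.slice(0, -2))) return key.slice(0, -2);
  if (key.slice(-1) === 's' && has(key.slice(0, -1))) return key.slice(0, -1);
  return null;
}

lookupWithPlural('darns');  // 'darn'
lookupWithPlural('asses');  // 'ass'
lookupWithPlural('class');  // null: "clas" is not on the list
```

This stays at a constant number of lookups per token and leaves "class" alone, but it does nothing for compound words.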