vandie / isProfanity

A profanity checker that, unlike the alternatives, uses the Wagner–Fischer algorithm to catch variations you haven't thought of.
MIT License
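For readers unfamiliar with the algorithm named above, here is a minimal JavaScript sketch of the Wagner–Fischer dynamic-programming method for Levenshtein (edit) distance. It only illustrates the technique: the function name `levenshtein` is hypothetical, and this is not isProfanity's own code.

```javascript
// Wagner–Fischer: d[i][j] holds the edit distance between the first i
// characters of a and the first j characters of b.
function levenshtein(a, b) {
  const m = a.length;
  const n = b.length;
  const d = Array.from({ length: m + 1 }, () => new Array(n + 1).fill(0));
  for (let i = 0; i <= m; i++) d[i][0] = i; // delete all i characters
  for (let j = 0; j <= n; j++) d[0][j] = j; // insert all j characters
  for (let i = 1; i <= m; i++) {
    for (let j = 1; j <= n; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1,       // deletion
        d[i][j - 1] + 1,       // insertion
        d[i - 1][j - 1] + cost // substitution (free if characters match)
      );
    }
  }
  return d[m][n];
}

console.log(levenshtein("fvck", "fuck")); // 1 (substitute v -> u)
console.log(levenshtein("fk", "fuck"));   // 2 (insert u and c)
```

Because any spelling within a small edit distance of a listed word is caught, variants like "fvck" do not need their own entries.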

Browser support #3

Closed: rileyjs10 closed this issue 7 years ago

rileyjs10 commented 7 years ago

What would it take to get this supported in a browser environment?

vandie commented 7 years ago

Hard to say. I wouldn't recommend it because, when it runs in the browser, it can very easily be bypassed. The biggest issue right now is the use of fs (the Node.js filesystem module), which is not available in the browser.

One option is to run a REST API on a Node.js server that uses isProfanity, although that isn't really running it in the browser and removes any reason for doing so. You could also copy the contents of the CSVs into variables in the code itself, removing the need for fs, but that would remove the ability to easily define custom lists and, given the size of the CSVs, would probably load very slowly...
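A rough sketch of the "put the CSV contents in the code" option. It assumes, hypothetically, that each list has one word per line with the word in the first comma-separated column; `PROFANITY_CSV`, `parseWordList` and the placeholder words are illustrative names, not part of isProfanity. In a browser, the same text could instead be loaded at runtime with `fetch(url).then((r) => r.text())`, which avoids both `fs` and a large bundle at the cost of a network request.

```javascript
// Hypothetical inlined copy of a word list, so no `fs` call is needed.
// Assumes one word per line, word in the first column.
const PROFANITY_CSV = "badword\nworseword\n";

// Turn the raw CSV text into a Set for fast exact lookups.
function parseWordList(csvText) {
  return new Set(
    csvText
      .split(/\r?\n/)                                         // one row per line
      .map((line) => line.split(",")[0].trim().toLowerCase()) // first column only
      .filter((word) => word.length > 0)                      // skip blank lines
  );
}

const PROFANITY_WORDS = parseWordList(PROFANITY_CSV);
console.log(PROFANITY_WORDS.has("badword")); // true
```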

Again though, I would not recommend running it in the browser at all, but if you still want to, I'll try to help.

May I ask why you want it to run in the browser?

rileyjs10 commented 7 years ago

Our app runs in a closed system and generates a video stream, so I'm not really worried about users bypassing it. One of our data vendors provides user-generated input. They say that they curate it, but we've seen occasional obscene words get through, which is something we want to avoid. I like your approach since it can catch near-miss spellings, such as fvck, without our having to explicitly blacklist every combination.

What I've done so far is take an English dictionary CSV and run it against profanity.csv. Any word with a sureness > 0.6 (I increased the threshold to make it a little less aggressive) now goes into my new exceptions list. No other word would match anyway, so there's no need to whitelist them. That has cut the exception list down to 7,000 words, which I think is a little more manageable. We'll probably end up serving it as JSON if we find we have to modify it often. Thanks for the quick reply!
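The exceptions-list approach described above could look roughly like this. The scoring rule is an assumption (1 minus the edit distance divided by the longer word's length) and may not match isProfanity's actual sureness calculation; `sureness`, `buildExceptions` and the sample word lists are hypothetical.

```javascript
// Wagner–Fischer edit distance, keeping only two rows of the table.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed scoring (hypothetical): 1 for an exact match, dropping by
// 1/length for each edit, relative to the longer of the two words.
function sureness(word, badWord) {
  const longest = Math.max(word.length, badWord.length);
  return longest === 0 ? 1 : 1 - levenshtein(word, badWord) / longest;
}

// Hypothetical helper: collect dictionary words that would be false positives.
function buildExceptions(dictionary, profanity, threshold = 0.6) {
  const badWords = new Set(profanity);
  return dictionary.filter(
    (word) =>
      !badWords.has(word) && // never whitelist a word that is itself on the list
      profanity.some((bad) => sureness(word, bad) > threshold)
  );
}

const dictionary = ["duck", "luck", "fun", "table", "fuck"];
const exceptions = buildExceptions(dictionary, ["fuck"]);
console.log(JSON.stringify(exceptions)); // ["duck","luck"]
```

Skipping exact matches matters: a full English dictionary may itself contain profane words, and without that check they would end up whitelisted. The `JSON.stringify` output is one way to produce the JSON file mentioned above.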

rileyjs10 commented 7 years ago

Oh, one thing you might want to do is use strings.split(/\W+/) rather than split(' '). That way, you're more likely to be comparing whole words, rather than words and any punctuation that might be in the way.
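To see the difference between the two splitting strategies, here is a small comparison on a made-up sentence. In JavaScript regular expressions, `\W` matches any character other than ASCII letters, digits and underscore. The third tokenizer, which only trims punctuation from the ends of each token, is an illustrative alternative, not something either commenter proposed.

```javascript
const text = "What the f**k, fvck!";

// Original approach: split on single spaces, so punctuation stays attached.
const bySpaces = text.split(" ");

// Suggested approach: split on runs of non-word characters, dropping empty strings.
const byNonWord = text.split(/\W+/).filter((token) => token.length > 0);

// Illustrative alternative: split on whitespace, then trim punctuation only at the ends.
const trimmed = text
  .split(/\s+/)
  .map((token) => token.replace(/^\W+|\W+$/g, ""))
  .filter((token) => token.length > 0);

console.log(bySpaces);  // [ 'What', 'the', 'f**k,', 'fvck!' ]
console.log(byNonWord); // [ 'What', 'the', 'f', 'k', 'fvck' ]
console.log(trimmed);   // [ 'What', 'the', 'f**k', 'fvck' ]
```

Note that `*` is a non-word character, so with `/\W+/` a masked word such as `f**k` is broken into the separate tokens `f` and `k`. Accented letters such as `é` are also treated as separators.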

(In reply to Michael Van Der Velden's comment above, sent Thu, Apr 13, 2017 at 12:30 PM.)

-- Jeff Riley | Principal Software Engineer w: 770-226-2803 e: jeff.riley@weathergroup.com

vandie commented 7 years ago

PLEASE DON'T DO THIS.

Removing all punctuation before checking gives f**k a lower rating, because you are actually checking fk. That would be fine had you left the minimum sureness alone but, since a sureness of 0.5 means changing half the word, your change ensures that a 4-letter word will only be stopped with a one-letter difference rather than the original three letters (the original blocks f***). As fk is a two-letter word and two edits are needed to turn it into fuck (insert u and c), it would not be blocked with your change. This is obviously not good, as it means that you have, basically, no filter.
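A worked version of the threshold arithmetic above, under a hypothetical scoring rule: sureness = 1 minus the edit distance divided by the length of the longer word, and a word is blocked when its sureness reaches the threshold. isProfanity's real formula and comparison may differ; `sureness` and `isBlocked` are illustrative names. The point is only how raising the threshold from 0.5 to 0.6 changes the outcome for a two-edit variant of a four-letter word.

```javascript
// Wagner–Fischer edit distance, keeping only two rows of the table.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed scoring (hypothetical): 1 minus edits per character of the longer word.
function sureness(word, badWord) {
  const longest = Math.max(word.length, badWord.length);
  return longest === 0 ? 1 : 1 - levenshtein(word, badWord) / longest;
}

// Assumed blocking rule: block when sureness reaches the threshold.
function isBlocked(word, badWord, threshold) {
  return sureness(word, badWord) >= threshold;
}

for (const threshold of [0.5, 0.6]) {
  console.log(
    threshold,
    isBlocked("fvck", "fuck", threshold), // one edit: sureness 0.75
    isBlocked("fk", "fuck", threshold)    // two insertions: sureness 0.5
  );
}
// 0.5 true true
// 0.6 true false
```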

Hope this helps. Mike.