gurkult / gurkbot

Our community bot, used for running the server.
MIT License

swear word checker #49

Closed plusk-dev closed 3 years ago

gustavwilliam commented 3 years ago

This seems to be two features without any description. Could you split it into separate issues and describe clearly what the features would be about?

plusk-dev commented 3 years ago

> This seems to be two features without any description. Could you split it into separate issues and describe clearly what the features would be about?

The contributors command (coming straight from the GitHub API): [screenshot of the command output]
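For reference, a minimal sketch of what pulling that data from the GitHub REST API could look like in Python (the helper name and error handling are illustrative, not the bot's actual code):

```python
import requests


def get_contributors(owner: str, repo: str) -> list[str]:
    """Fetch contributor logins for a repository via the GitHub REST API."""
    url = f"https://api.github.com/repos/{owner}/{repo}/contributors"
    response = requests.get(url, headers={"Accept": "application/vnd.github+json"})
    response.raise_for_status()
    return [user["login"] for user in response.json()]


# Example: contributors to this repository.
print(get_contributors("gurkult", "gurkbot"))
```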

For the toxicity checker, I thought of two ways:

Contextual detection of toxicity would help more, since we would actually be detecting hate speech rather than just checking for the presence of swear words, which may be used in a lighter sense. A good example of this is https://github.com/conversationai/perspectiveapi/
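For context, a rough sketch of what a Perspective API call could look like (the function name, threshold, and key handling are illustrative, not an actual implementation for the bot):

```python
import requests

PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def toxicity_score(text: str, api_key: str) -> float:
    """Return Perspective's TOXICITY probability (0.0-1.0) for a message."""
    payload = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = requests.post(PERSPECTIVE_URL, params={"key": api_key}, json=payload)
    response.raise_for_status()
    return response.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


# A message scoring above some threshold (say 0.8) could be flagged for mods.
```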

Vyvy-vi commented 3 years ago

with regard to NLTK:

  • How performance-demanding would the program be?
  • Is the execution time fast?

Moreover, could we consider libraries that do the same, such as Python equivalents of alex.js and retext-equality?

Also, in my opinion, keeping issues and PRs focused on similar agendas helps maintainers manage them much more easily.

Shivansh-007 commented 3 years ago

Yeah, just what Vyvy said. And why not just use Google's bad words list, like here?

Inheritanc-e commented 3 years ago

I think swearing would be fine as long as it isn't too excessive. We could blacklist some words so that the bot alerts a role when someone uses a blacklisted word. This would probably be enough.
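A sketch of how simple such a blacklist check could be, assuming a word-boundary regex (the word list here is a placeholder):

```python
import re

# Placeholder list; the real one would come from config or a bad-words file.
BLACKLIST = {"badword1", "badword2"}

# Word boundaries so that, e.g., "class" doesn't trip a filter entry like "ass".
BLACKLIST_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, sorted(BLACKLIST))) + r")\b",
    flags=re.IGNORECASE,
)


def contains_blacklisted(message: str) -> bool:
    """True if the message contains any blacklisted word as a whole word."""
    return BLACKLIST_PATTERN.search(message) is not None
```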

Vyvy-vi commented 3 years ago

Personally, I find NLTK models to be rather dull. This is an example of similar blocking we use in another place (this uses a highly trained library... but it still has a long way to go): [screenshot]

plusk-dev commented 3 years ago

> with regard to NLTK:
>
>   • How performance-demanding would the program be?
>   • Is the execution time fast?
>
> Moreover, could we consider libraries that do the same, such as Python equivalents of alex.js and retext-equality?
>
> Also, in my opinion, keeping issues and PRs focused on similar agendas helps maintainers manage them much more easily.

In that case, we should use the first of the two methods I gave, since that would be comparatively more performance-efficient. And yeah, I edited the issue title because I have made another one.

gustavwilliam commented 3 years ago

We would presumably have a filter on the bot that alerts mods and/or deletes messages containing things we absolutely don't want on the server. This could include Nazi and racist symbols and words that mods should be on the lookout for.

However, I don't want a feature that penalises users for swearing. The toxicity checker also seems like a nightmare to implement, if it can even be done. Let's have an issue about implementing a filter instead.
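A minimal sketch of what that kind of filter could look like in discord.py (the channel ID and word list are placeholders, and the containment check is deliberately crude):

```python
import discord
from discord.ext import commands

MOD_ALERTS_CHANNEL_ID = 0  # Placeholder: the mod-alerts channel ID.
BLACKLIST = {"badword1", "badword2"}  # Placeholder word list.

# Intents.all() keeps this runnable on both discord.py 1.5+ and 2.x.
bot = commands.Bot(command_prefix="!", intents=discord.Intents.all())


@bot.event
async def on_message(message: discord.Message) -> None:
    # Ignore other bots (and ourselves).
    if message.author.bot:
        return
    # Crude containment check; a real filter would use word boundaries.
    if any(word in message.content.lower() for word in BLACKLIST):
        await message.delete()
        alerts = bot.get_channel(MOD_ALERTS_CHANNEL_ID)
        if alerts is not None:
            await alerts.send(
                f"Deleted a filtered message from {message.author.mention} "
                f"in {message.channel.mention}."
            )
        return
    # Let normal commands keep working.
    await bot.process_commands(message)
```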

plusk-dev commented 3 years ago

> We would presumably have a filter on the bot that alerts mods and/or deletes messages containing things we absolutely don't want on the server. This could include Nazi and racist symbols and words that mods should be on the lookout for.
>
> However, I don't want a feature that penalises users for swearing. The toxicity checker also seems like a nightmare to implement, if it can even be done. Let's have an issue about implementing a filter instead.

Yeah, you're absolutely right. I agree that we should have a filter rather than an absolute "toxicity" checker.