string distance / fuzzy matching instead of hard substring keyword searching

bkil commented 2 years ago

It could be worthwhile to also implement some simple edit-distance-based typo tolerance, so that fuzzy keyword matching could be configured as well. Also, if a message contains too many characters that are not part of valid words in the sentence, that would be a red flag.
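As a rough illustration of the edit-distance idea, here is a minimal Python sketch (not code from spam-police). It assumes messages are split into lowercase word tokens, and it compares each keyword against every run of the same number of words. A keyword counts as a hit when the Levenshtein distance is at most 20% of the keyword's length. The function names `levenshtein` and `fuzzy_keyword_hits` and the `max_ratio` default are hypothetical choices for this example.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Edit distance: the minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the dynamic-programming row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_keyword_hits(message: str, keywords: list[str],
                       max_ratio: float = 0.2) -> list[str]:
    """Return the keywords that approximately occur in the message.

    Hypothetical scheme: each keyword is compared against every window of the
    message with the same number of words, allowing up to
    int(max_ratio * len(keyword)) edits."""
    words = re.findall(r"\w+", message.lower())
    hits = []
    for keyword in keywords:
        kw_words = keyword.lower().split()
        target = " ".join(kw_words)
        budget = int(len(target) * max_ratio)
        n = len(kw_words)
        for start in range(len(words) - n + 1):
            window = " ".join(words[start:start + n])
            if levenshtein(window, target) <= budget:
                hits.append(keyword)
                break  # one match per keyword is enough
    return hits


if __name__ == "__main__":
    print(fuzzy_keyword_hits("Get fr3e cryptto now!!", ["free crypto", "nitro"]))
```

Here "fr3e cryptto" is two edits away from "free crypto" and still matches, where a plain substring search would miss it. Because the budget scales with keyword length, very short keywords (under five characters) get a budget of zero and fall back to exact whole-word matching. This is one way to keep common short words from producing false positives.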

In 99% of cases each room is limited to a single language, so spam in a foreign language is already a red flag. This matters in the dozens of local-language rooms that the indiscriminate English spammer sometimes joins as well. Dictionaries also exist (see your package manager, or Wiktionary, Wikipedia, etc.). Alternatively, you could go through the chat log and collect the words and sentences that non-troll members have used in the past (ham), to help distinguish them from unusual content (spam).
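One possible shape for the chat-log idea is sketched below in Python, assuming you have a list of past messages from trusted members. It builds a vocabulary of words used at least twice. It then scores a new message by the share of its non-whitespace characters that do not belong to a known word. Foreign-language text, digits, emoji and punctuation all raise the score, which also covers the "characters not participating in valid words" signal. The names `build_vocabulary` and `unknown_char_ratio` and the `min_count=2` cutoff are hypothetical. A word list from a system dictionary could be loaded into the same `set` instead.

```python
import re
from collections import Counter

# Runs of Unicode letters only (no digits or underscores), so accented words
# in non-English rooms are tokenized correctly.
WORD_RE = re.compile(r"[^\W\d_]+")


def build_vocabulary(ham_messages: list[str], min_count: int = 2) -> set[str]:
    """Collect words that past non-troll members used at least min_count times."""
    counts = Counter(
        word for msg in ham_messages for word in WORD_RE.findall(msg.lower())
    )
    return {word for word, n in counts.items() if n >= min_count}


def unknown_char_ratio(message: str, vocabulary: set[str]) -> float:
    """Share of non-whitespace characters that are not part of a known word.

    0.0 means every character belongs to a familiar word; 1.0 means none do."""
    text = message.lower()
    total = sum(1 for ch in text if not ch.isspace())
    if total == 0:
        return 0.0
    known = sum(len(word) for word in WORD_RE.findall(text) if word in vocabulary)
    return 1 - known / total


if __name__ == "__main__":
    ham = [
        "good morning everyone",
        "good morning, how are you",
        "how are you everyone",
        "fine thanks, how are you",
    ]
    vocab = build_vocabulary(ham)
    for msg in ["good morning everyone", "how are you, friend?",
                "CLICK h3re 4 FREE NITRO!!!"]:
        print(f"{unknown_char_ratio(msg, vocab):.2f}  {msg}")
```

A bot could then flag messages above some cutoff (for example 0.5) for review. The right threshold, and whether to combine it with the keyword score, would have to be tuned on each room's real logs.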

jjj333-p commented 2 years ago

This is an interesting issue, but it is far beyond my skill set. I would love to see something like this come through, though, and if someone else knows how to do it, I would welcome a contribution.

jjj333-p commented 5 months ago

Update: this might be doable in some form, perhaps using string distance. It is still on the back burner, but this might be the solution to something.

jjj333-p / spam-police, issue #9