wetfish / fishy-bot

One fishy IRC bot
GNU General Public License v2.0

Triplicates can be avoided by using special characters #24

Open itsrachelfish opened 9 years ago

itsrachelfish commented 9 years ago

By including special characters (color codes, bold, etc.) or only making small changes like adding a space or exclamation mark, it is possible to bypass fishy's triplicate detection.

Fishy should strip special characters from messages and do a text comparison of the most recent lines to make sure they don't have repeating sections. For example, the following messages should trigger triplicate detection even though they aren't exact matches:

hex: IS IT TRUE THAT YOU LOVE BUTTS?
weazzy: IS IT TRUE THAT YOU LOVE BUTTS?
rachel: IS IT TRUE THAT YOU LOVE BUTTS?
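A minimal sketch of the stripping-and-comparing idea, assuming mIRC-style control codes and ASCII-only text. The names `normalize` and `isTriplicate` are hypothetical and are not taken from fishy's code; the check only catches messages that become identical after normalization, not ones that merely share a repeating section.

```javascript
// Hypothetical helpers for illustration; not part of fishy's codebase.

// mIRC-style formatting: \x03 starts a color (optional fg[,bg] digits);
// \x02 bold, \x1D italic, \x1F underline, \x16 reverse, \x0F reset.
const IRC_FORMATTING = /\x03(?:\d{1,2}(?:,\d{1,2})?)?|[\x02\x0F\x16\x1D\x1F]/g;

function normalize(message) {
  return message
    .replace(IRC_FORMATTING, '')   // drop color and style codes
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, '')   // drop punctuation such as "!" or "?" (ASCII-only assumption)
    .replace(/\s+/g, ' ')          // collapse runs of whitespace
    .trim();
}

// True when the three most recent messages normalize to the same non-empty text.
function isTriplicate(recentMessages) {
  if (recentMessages.length < 3) return false;
  const [a, b, c] = recentMessages.slice(-3).map(normalize);
  return a !== '' && a === b && b === c;
}
```

With this, a colored copy, a copy with a trailing space, and a bold copy with an extra exclamation mark would all normalize to the same string and count as a triplicate.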

tmick0 commented 9 years ago

this sounds like a job for FUZZY HASHING

ghost commented 9 years ago

Nah, easier solved with "stripColorsAndStyle" from https://github.com/fent/irc-colors.js

itsrachelfish commented 9 years ago

@edwin-pers Your "stripColorsAndStyle" solution would not solve the example given.

@le1ca Thank you for the tip, I found a fuzzy hashing lib for node.js: https://github.com/huwenshuo/ctph.js
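The ctph.js API is not shown in this thread, so the sketch below does not use it. As a stand-in for fuzzy hashing (this is plain Levenshtein edit distance, not CTPH), it scores how similar two messages are. The function names and the 0.9 threshold are assumptions for illustration only.

```javascript
// Classic dynamic-programming Levenshtein distance, keeping one row at a time.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1]; 1 means the strings are identical.
function similarity(a, b) {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - editDistance(a, b) / longest;
}

// Hypothetical threshold; it would need tuning against real channel traffic.
function isNearDuplicate(a, b, threshold = 0.9) {
  return similarity(a, b) >= threshold;
}
```

Edit distance is quadratic in message length, which is probably acceptable for short IRC lines; a real fuzzy hash would trade some precision for cheaper comparisons.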

tmick0 commented 9 years ago

:+1: