Closed cedws closed 3 years ago
Thanks for the feedback.
If you look at the history of the heuristics feature, you see that it is pretty old. In my experience, people usually do not use it. This module is typically just used for the honey spam feature.
I doubt that anyone wants to improve the heuristics feature, because it is not really doable, as you also mentioned. To be honest, even the honey spam approach does not work that well any more nowadays.
Anyway, the nice thing about open source is that you are free to solve the issues you have with the module yourself ;-)
People are using it in production and will lose sales because of this code without ever being aware. We'll never know the frequency of course.
If nobody wants to improve it, fair enough, I'm not too keen on it either. Will you accept a PR to describe the heuristics and mark the repo as deprecated in the README?
Heuristics aside, the PII logging is no good.
Anyway, the nice thing about open source is that you are free to solve the issues you have with the module yourself
Sure, this is a great thing about OSS. Should I as somebody who doesn't even run a Magento site have to come in and fix it though? It's a coincidence that I was filtered by this code, work in the space (so I knew where to look), and found the source.
The repo is not deprecated. I would accept a PR, which adds a warning that the heuristics feature is far from perfect and should be used with care.
Short feedback from my side: The way you address the issue could be interpreted as rather rude and impolite. This is not what I like to see in the open source space.
Sorry but the way the spam heuristics work here is just plain bad. I don't even own a Magento site but hit spam detection on a site using this module. This shouldn't be used by anyone operating a site with a double digit user count.
Yes, the trust level is tweakable, but do you think an end user dives into how trust is calculated? I have my doubts.
It's uncommon but not impossible for a person to have a forename the same as their surname. I work with someone who has one. https://github.com/magento-hackathon/HoneySpam/blob/88f1ff9b11f119ec36a26881349635b36c682e2c/app/code/community/Hackathon/HoneySpam/Model/Checker.php#L60
What the f**k? How have you never seen an email address where somebody has their firstname in their email address? https://github.com/magento-hackathon/HoneySpam/blob/88f1ff9b11f119ec36a26881349635b36c682e2c/app/code/community/Hackathon/HoneySpam/Model/Checker.php#L67
Ditto. https://github.com/magento-hackathon/HoneySpam/blob/88f1ff9b11f119ec36a26881349635b36c682e2c/app/code/community/Hackathon/HoneySpam/Model/Checker.php#L73
Ok, no more long email addresses for some arbitrary reason. People with long names in their email addresses shall not pass! https://github.com/magento-hackathon/HoneySpam/blob/88f1ff9b11f119ec36a26881349635b36c682e2c/app/code/community/Hackathon/HoneySpam/Model/Checker.php#L88
Nice of you guys to add an exception for Mr. or Mrs. Herrman but how about the thousands of other English names that might match this rule? How about the thousands on top of that from around the world? https://github.com/magento-hackathon/HoneySpam/blob/88f1ff9b11f119ec36a26881349635b36c682e2c/app/code/community/Hackathon/HoneySpam/Model/Checker.php#L99
I can go on and on. My point is that arbitrary, hand written rules like this don't work. We'll never know how many people have had issues over the years because of some randomly decided calculations here, but I'm one of them.
As you say in the README, sites should use reCAPTCHA instead. Unfortunately, they're going to use this in addition because it provides a feeling of safety and they don't really understand how the code works underneath.
I suggest improving the heuristics. Testing for known spam email providers (as much as I hate email blacklists) is one thing you can check for. You could check the user agent too. Do some Googling and you'll find other ideas.
Also, you're logging PII: https://github.com/magento-hackathon/HoneySpam/blob/88f1ff9b11f119ec36a26881349635b36c682e2c/app/code/community/Hackathon/HoneySpam/Model/Checker.php#L90
Rework it or don't. But I would appreciate it if you would please inform consumers in the README about the heuristics and their usefulness or lack thereof.