pluginkollektiv / antispam-bee

„... another popular solution to fight spam is Antispam Bee“ – Matt Mullenweg, Q&A WordCamp Europe 2014
https://wordpress.org/plugins/antispam-bee/
GNU General Public License v2.0

Collect spam data in a smart way #73

Open schlessera opened 7 years ago

schlessera commented 7 years ago

Anonymously collect non-detected spam comments.

What data to collect:

Comments that were not detected as spam and for which the site user manually clicked the "Spam" button.

When to collect:

When the site user first clicks this "Spam" button, we should ask the permission to anonymously send the comment data to a centralized database, in order to improve Antispam Bee.

How to collect:

At first, send to a HTTPS endpoint that stores everything in a simple database (probably NoSQL). We may need to evaluate a more scalable solution in the future. The collected data must not contain any mention of the sender or information about their user or system. It should contain as much information as possible about the actual spam content and where it originated.
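
As a rough illustration of the "must not contain any mention of the sender" requirement, the report could be built from a whitelist of comment fields before anything is sent. This is only a sketch: the function name `asb_example_build_report` and the choice of whitelisted fields are assumptions, not an agreed design. The field names follow the WordPress comment columns.

```php
<?php
/**
 * Reduce a comment to the fields that describe the spam itself.
 *
 * Hypothetical helper; the whitelist below is an assumption.
 *
 * @param array $comment Comment data keyed by WordPress comment column names.
 * @return array Only the whitelisted fields that were present.
 */
function asb_example_build_report( array $comment ): array {
    $allowed = array(
        'comment_author',
        'comment_author_email',
        'comment_author_url',
        'comment_author_IP',
        'comment_agent',
        'comment_content',
    );

    // Keep only whitelisted keys; anything that identifies the
    // reporting site or its users is dropped.
    return array_intersect_key( $comment, array_flip( $allowed ) );
}

$report = asb_example_build_report(
    array(
        'comment_author'  => 'Cheap Pills',
        'comment_content' => 'Buy now!',
        'comment_post_ID' => 42, // Points at a post on the reporting site: dropped.
        'user_id'         => 1,  // Local user ID: dropped.
    )
);

echo json_encode( $report ), PHP_EOL;
```

A whitelist seems safer than a blacklist here, because fields added to the comment data later would then be excluded by default.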

timse201 commented 7 years ago

Are we or the user allowed to do that? Are there copyright or other legal issues? And if we do it every time, there could be some false positives, because someone dislikes a user and marks them as spam, or because someone posted a comment several times through misclicking, caching issues, etc.

But I agree: if we are allowed to (no legal issues), then we should make it simpler to submit spam.

websupporter commented 7 years ago

I think it's a great idea. We should also include false positives.

I do not really see legal issues. In my understanding, if someone posts a comment, he gives the website owner the right to publish it. But honestly, I do not know how far this right can be stretched.

> there could be some false positives

Yes, but right now we have the same issue with our Google document. I think it's worth a shot.

There should be an option in the settings (e.g. always send, never send), perhaps not instead of but in addition to the question "Do you want to send this specific comment?", to guarantee a quicker workflow.

schlessera commented 7 years ago

An alternative would be to add a separate button besides the Spam & Trash buttons. Something like Send for Analysis or similar. If they just want to get rid of their uninteresting newsletters, they will probably not click on Send for Analysis for these...

schlessera commented 7 years ago

And, yes, the original idea was to ask for permission once on clicking Spam and then have this be the new default.

Zodiac1978 commented 4 years ago

We could use the transformation action hooks comment_unapproved_to_spam and comment_approved_to_spam or we could provide a button / action link for this.

Possible problems: privacy concerns, since personal data from the comments (IP, email, content, etc.) would be submitted to us (or to a third-party service like Google Forms).

This feature needs consent from the user: https://developer.wordpress.org/plugins/wordpress-org/detailed-plugin-guidelines/#7-plugins-may-not-track-users-without-their-consent
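
A minimal sketch of the hook-based variant, assuming a hypothetical handler named `asb_example_queue_spam_report`. It only remembers which comments were marked as spam during the request, so that the UI could ask for consent afterwards; nothing is sent anywhere. The `function_exists()` guard is there only so the snippet also loads outside WordPress.

```php
<?php
/**
 * Hypothetical handler: remember which comment was just marked as spam
 * so the admin UI can later ask for consent to report it.
 *
 * @param object|null $comment Comment object (WP_Comment inside WordPress).
 * @return int[] Comment IDs collected so far during this request.
 */
function asb_example_queue_spam_report( $comment = null ) {
    static $queued = array();

    if ( null !== $comment ) {
        $queued[] = (int) $comment->comment_ID;
    }

    return $queued;
}

// Register on both status transitions mentioned above.
if ( function_exists( 'add_action' ) ) {
    add_action( 'comment_unapproved_to_spam', 'asb_example_queue_spam_report' );
    add_action( 'comment_approved_to_spam', 'asb_example_queue_spam_report' );
}
```

Calling the function without an argument returns the queued IDs, which a consent prompt could then read.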

krafit commented 4 years ago

In my opinion, the best way to collect non-detected spam would be to add a link alongside “Mark as spam”, something like “Report to Antispam Bee”. When a user clicks that link, they'll have to confirm that they are about to disclose the comment and its metadata to the ASB team for further investigation and to improve ASB's filters before it's sent.

[Screenshot, 2020-04-12 at 12:44:32]

Zodiac1978 commented 4 years ago

To get even more data, we could hook into the actions that fire when someone marks a comment as spam and then ask whether the data may be sent (the way PoEdit does):

[Screenshot, 2020-04-12 at 12:49:22]

With an opportunity to opt-in to have this as the default.

krafit commented 4 years ago

I thought about an opt-in, but I didn't like the privacy implications of having this as a default for everyone after someone opted in. But we could handle the opt-in the way PoEdit does, on a per-user basis. This way every user has the opportunity to give informed consent before sharing data (for the first time).

Zodiac1978 commented 4 years ago

If we stay with our workflow (using the Google Form) we could pre-fill the form like this:

https://docs.google.com/forms/d/e/1FAIpQLSeQlKVZZYsF1qkKz7U78B2wy_6s6I7aNSdQc-DGpjeqWx70-A/viewform?c=0&w=1&entry.437446945=name%20of%20the%20commenter&entry.462884433=IP&entry.1346967038=Host&entry.121560485=email%20of%20the%20commenter&entry.1210529682=website%20of%20the%20commenter&entry.1837399577=content%20of%20the%20comment

The field values are URL-encoded.

The user just needs to hit the "Send" button at the end of the page.
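
A pre-filled link like the one above could be built with PHP's `http_build_query()` instead of concatenating the parts by hand. This is a sketch: the helper name `asb_example_prefill_url` is made up for illustration, and the entry IDs and form URL are the ones from the link above.

```php
<?php
/**
 * Build a pre-filled form URL from raw (unencoded) values.
 *
 * Hypothetical helper for illustration.
 *
 * @param string $form_url Base "viewform" URL of the form.
 * @param array  $entries  Map of query keys (e.g. 'entry.437446945') to raw values.
 * @return string URL with an RFC 3986 encoded query string.
 */
function asb_example_prefill_url( string $form_url, array $entries ): string {
    // PHP_QUERY_RFC3986 encodes spaces as %20, like rawurlencode().
    $query = http_build_query( $entries, '', '&', PHP_QUERY_RFC3986 );

    return $form_url . '?' . $query;
}

$url = asb_example_prefill_url(
    'https://docs.google.com/forms/d/e/1FAIpQLSeQlKVZZYsF1qkKz7U78B2wy_6s6I7aNSdQc-DGpjeqWx70-A/viewform',
    array(
        'c'                => 0,
        'w'                => 1,
        'entry.437446945'  => 'name of the commenter',
        'entry.1837399577' => 'content of the comment',
    )
);

echo $url, PHP_EOL;
```

Passing raw values and encoding them in one place avoids encoding a value twice or forgetting it for one field.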

Zodiac1978 commented 4 years ago

If someone wants to test this feature: Here is a working addon plugin:

<?php
/**
 * Plugin Name: Report Spam
 * Description: Addon for Antispam Bee to report spam.
 * Plugin URI:  https://torstenlandsiedel.de
 * Version:     1.0
 * Author:      Torsten Landsiedel
 * Author URI:  http://torstenlandsiedel.de
 * License:     GPL 2
 * License URI: http://opensource.org/licenses/GPL-2.0
 */

if ( ! defined( 'ABSPATH' ) ) {
    exit; // Exit if accessed directly.
}

/**
 * Add comment action link to report spam to ASB
 *
 * @param array   $actions Array of actions.
 * @param comment $comment Comment object.
 */
function add_report_comment_action_link( $actions, $comment ) {

    // URLencode comment data.
    $name    = rawurlencode( $comment->comment_author );
    $email   = rawurlencode( $comment->comment_author_email );
    $ip      = rawurlencode( $comment->comment_author_IP );
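    // Resolve the host from the raw IP; an already URL-encoded IPv6 address would not resolve.
    $host    = rawurlencode( (string) gethostbyaddr( $comment->comment_author_IP ) );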
    $url     = rawurlencode( $comment->comment_author_url );
    $content = rawurlencode( $comment->comment_content );
    $agent   = rawurlencode( $comment->comment_agent );

    // Build action link.
    $target = ' target="_blank" ';
    $rel    = ' rel="noopener noreferrer" ';
    $href   = 'href="https://docs.google.com/forms/d/e/1FAIpQLSeQlKVZZYsF1qkKz7U78B2wy_6s6I7aNSdQc-DGpjeqWx70-A/viewform?c=0&w=1&entry.437446945=' . $name . '&entry.462884433=' . $ip . '&entry.1346967038=' . $host . '&entry.121560485=' . $email . '&entry.1210529682=' . $url . '&entry.1837399577=' . $content . '&entry.372858475=' . $agent . '" ';

    $action  = '';
    $action .= "<a $target $href $rel>";
    $action .= __( 'Report to Antispam Bee', 'antispam-bee' );
    $action .= '</a>';

    $actions['report_spam trash'] = $action;

    return $actions;
}
add_filter( 'comment_row_actions', 'add_report_comment_action_link', 10, 2 );

Zodiac1978 commented 4 years ago

[Screenshot, 2020-07-22 at 23:21:17]

Zodiac1978 commented 4 years ago

The addon includes the comment's user agent as a new item (the form has already been extended for this), and it resolves the host from the IP.

Zodiac1978 commented 4 years ago

> there could be some false positives

We could add a checkbox at the end of the form, "[ ] This is a false positive, not spam", which could be checked before sending the form. Although I don't think many people would use it ...

stkjj commented 3 years ago

With regard to https://torstenlandsiedel.de/2021/01/31/antispam-bee-braucht-eure-juristische-hilfe/:

a) Self-hosted instead of Google, for sure (or at least a SaaS based within the EU, with a proper data processing contract).

b) If consent is given by the submitter, everything is fine. Can the consent be withdrawn? Legally yes, in practice no: once the data has been worked with, we could of course remove it from the list of submissions, yet the findings derived from the case remain. At least as long as the submission is taken care of in a timely manner ;-).

c) Regarding the receiving entity: this is indeed the biggest flaw, as we are acting as a GbR, which means that any random member of the GbR could be sued, fined, … This is the point where a discussion about changing the legal framework for the entity should take place. To stay focused on the matter, I'd suggest separating this from this issue. Happy to start this (internal) discussion on our Slack channel.

To get hands-on: the link "Report to Antispam Bee" should ideally open a modal with all necessary information*, e.g. which data is submitted, where it will be stored and for how long, who will have access to it, and how it will be purged, as well as a note that the data is provided on a consensual basis. Finally, there should be a confirm and a decline button; confirming then submits the data to a GDPR-compliant server for further processing.

*Let me draft something later this week.

stkjj commented 3 years ago

For further discussion, a draft text for the modal:

Thank you for helping us to improve Antispam Bee.

You are about to report the comment by [commenter name] with the content [content of the comment] to us, because you believe it is unrecognized spam. We also found the following data in the comment, which we will use for Antispam Bee's evaluation and heuristics:

We evaluate this data [automatically|manually] to improve the spam detection of Antispam Bee. If we receive multiple identical reports about a spammer, we also use this data to update Blacklist Updater. The data will be processed by us within the next x [hours|days] and then automatically deleted. For the period of processing, the data is stored exclusively on servers located in Germany. Only the Antispam Bee developer team has access to it. To keep the process lean, you will not receive any further feedback from us about the processing, storage or deletion, but you can be sure of our thanks.

If you agree to submit this data, you can send it using the button below. Button: Discard / Button: Submit