silverstripe-archive / silverstripe-mollom

Mollom spam protection module for SilverStripe CMS
http://silverstripe.org/mollom-module

Comments marked as 'spam' are regularly being accepted. #23

Open purplespider opened 10 years ago

purplespider commented 10 years ago

For several months now my blog has been receiving spam comments, despite being protected by the Mollom module.

After some initial debugging I discovered that Mollom was correctly detecting the comments as ‘spam’ but they were still being posted to the site.

I initially believed the problem was due to some incorrectly formatted responses from the Mollom servers that the Mollom PHP script and SilverStripe Mollom module are unable to read correctly.

So I sent this info to Mollom to investigate. (But see their response below)

On line 1186 of Mollom.class.inc ( https://github.com/Mollom/MollomPHP/blob/master/mollom.class.inc#L1186 ) I inserted a mail() statement to obtain the contents of the $result array whenever a comment was submitted.

I put Mollom in testing mode and made a test ‘spam’ comment. The comment was correctly marked as spam, and the $result array looked as expected:

Array
(
    [code] => 200
    [content] => Array
    (
        [id] => TEST1hhwbvgv0zmv61
        [spamScore] => 1.0
        [reason] => some secret reason
        [postBody] => spam
        [authorUrl] =>
        [spamClassification] => spam
    )
)

I then turned off testing mode, and waited for the next spam comment to come in. When it did, the array was empty, and the spam comment was incorrectly accepted:

Array
(
)

With some further mail() statements throughout Mollom.inc.php and the SilverStripe Mollom module, I managed to obtain more details about the response that came back from Mollom.

For my test spam comment (with test mode on) the response looked like this:

Array
(
    [code] => 200
    [message] => HTTP/1.1 200 OK
    [headers] => Array
        (
            [x-powered-by] => Servlet/3.0 JSP/2.2 (GlassFish Server Open Source Edition 3.1.2.2 Java/Oracle Corporation/1.7)
            [server] => GlassFish Server Open Source Edition 3.1.2.2
            [content-type] => application/xml
            [content-length] => 306
            [date] => Thu, 08 May 2014 12:04:17 GMT
        )

    [body] => 200TEST1hhwbvgv0zmv611.0some
    secret
    reasonspamspam
)

But for the real spam comment (test mode off, ID: 140509784898bf6cf8), the response looked like this:

Array
(
    [code] => 200
    [message] => HTTP/1.1 100 Continue
    [headers] => Array
        (
        )

    [body] => HTTP/1.1 200 OK
    X-Powered-By: Servlet/3.0 JSP/2.2 (GlassFish Server Open Source Edition
    3.1.2.2 Java/Oracle Corporation/1.7)
    Server: GlassFish Server Open Source Edition 3.1.2.2
    Content-Type: application/xml
    Content-Length: 324
    Date: Fri, 09 May 2014 11:17:15 GMT

    200

        140509784898bf6cf8
        1.0
        0.7419480019999999
        spam

)

As you can see, in that last response the ‘body’ contains the header information and the ‘headers’ key is empty, whereas with testing mode turned on the ‘body’ contained only information about the spam request and the headers were correctly in the ‘headers’ key.

I believed it was the presence of these headers in the ‘body’ of the response, whenever a comment is detected as spam, that was causing the Mollom modules to be unable to read the response, and therefore they weren’t marking the spam comments as spam.
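
The symptom can be reproduced with a short sketch. It is written in Python purely for illustration (the module itself is PHP), and `naive_split` is a hypothetical stand-in, not the module's actual parsing code. It assumes the parser treats everything before the first blank line as the status line plus headers, and everything after it as the body:

```python
def naive_split(raw: str):
    """Split a raw HTTP response at the first blank line (hypothetical parser)."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# Made-up response shaped like the dump above: an interim "100 Continue"
# block arrives before the real "200 OK" response.
RAW = (
    "HTTP/1.1 100 Continue\r\n"
    "\r\n"
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/xml\r\n"
    "\r\n"
    "<xml-payload/>"
)

message, headers, body = naive_split(RAW)
print(message)               # HTTP/1.1 100 Continue
print(headers)               # {}
print(body.splitlines()[0])  # HTTP/1.1 200 OK
```

Under that assumption the result matches the dump: the message is the 100 line, the headers are empty, and the real status line and headers end up inside the body.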

However Mollom support responded with the following:

I'm not seeing the same type of information in the dump that I make from a non-SilverStripe site. This looks to me like something in the SilverStripe implementation of the "request" method: https://github.com/Mollom/MollomPHP/blob/master/mollom.class.inc#L705 The Mollom library requires each implementation to handle their own processing of the request. During the processing, the headers and response are returned to the Mollom handleRequest function for further processing (which eventually returns the request to checkContent and other specific processing functions via the query method).

See query: https://github.com/Mollom/MollomPHP/blob/master/mollom.class.inc#L332 See checkContent call: https://github.com/Mollom/MollomPHP/blob/master/mollom.class.inc#L437

I would suggest bringing your findings to the Silverstripe maintainers and referencing their function here: https://github.com/silverstripe/silverstripe-mollom/blob/master/code/MollomSpamProtector.php#L93

I'm happy to answer any specific questions that may come up while addressing this, but this is going to be something that needs to be addressed within their implementation. Please feel free to re-open this ticket if you need more help with this issue.

So they are saying it's a problem with the SilverStripe Mollom module.

Any ideas what's causing this?

purplespider commented 10 years ago

Think I finally got to the bottom of this.

The response from the Mollom API for the comments that were incorrectly accepted despite being marked as spam all started with HTTP/1.1 100 Continue followed by the HTTP/1.1 200 OK.

It appears it was this unexpected 100 code that caused the response to be incorrectly parsed by the SilverStripe Mollom module.
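
For anyone hitting the same thing on the parsing side, here is a minimal sketch of a parser that tolerates interim responses. It is Python for illustration only (the module is PHP), `split_final_response` is a hypothetical name, and this is not code from the module or from the Mollom library. The idea is to discard any leading 1xx block before reading the status line and headers:

```python
def split_final_response(raw: str):
    """Return (status_code, headers, body) of the final response.

    Leading interim (1xx) blocks such as "100 Continue" are skipped.
    """
    while True:
        head, sep, rest = raw.partition("\r\n\r\n")
        lines = head.split("\r\n")
        parts = lines[0].split(" ", 2)
        if len(parts) < 2 or not parts[0].startswith("HTTP/"):
            raise ValueError("not an HTTP response")
        code = int(parts[1])
        if 100 <= code < 200 and sep:
            # Interim response: drop it and parse whatever follows.
            raw = rest
            continue
        headers = {}
        for line in lines[1:]:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        return code, headers, rest


# Made-up response with the same shape as the failing one.
raw = (
    "HTTP/1.1 100 Continue\r\n"
    "\r\n"
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/xml\r\n"
    "Content-Length: 324\r\n"
    "\r\n"
    "<payload/>"
)
code, headers, body = split_final_response(raw)
print(code, headers["content-type"], body)
```

With the sample input this yields status 200, the real headers, and only the payload as the body.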

In PR #24 I've added an Expect header to the request that stops Mollom from returning these occasional 100s in the response. I've been running it for a couple of weeks and haven't had a single spam comment accepted!