robinvdvleuten / php-nntp

Client for communicating with servers through the Network News Transfer Protocol (NNTP).
MIT License

Blank lines in the body of messages are missing #31

Closed mnapoli closed 7 years ago

mnapoli commented 7 years ago

Hi there, bumping on #28 -> I'm trying to use NNTP instead of IMAP for https://externals.io

I've got the whole thing working but I'm still stuck with a big issue: whether I get the body of messages using the ARTICLE or BODY command, all blank lines are missing (i.e. there are no blank lines).

Because of that:

I've read bits of the NNTP specs and it seems blank lines are allowed in the BODY, so there shouldn't be any issue.

Would you have any idea why I'm getting this issue? I've debugged with xdebug and it seems in the multiline response there are no empty lines received through the socket… I'm really lost since I'm not familiar with NNTP.
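For reference, the multi-line data block format in RFC 3977 ends each line with CRLF, terminates the block with a line containing a single dot, and "dot-stuffs" any line that starts with a dot. Empty lines are ordinary lines of the block. Below is a minimal PHP sketch of a reader that keeps them; `readMultiLineBlock` is a hypothetical helper written for illustration, not the code used by php-nntp.

```php
<?php
declare(strict_types=1);

/**
 * Read one NNTP multi-line data block from a stream.
 * Hypothetical helper for illustration only, not part of php-nntp.
 *
 * @param resource $stream
 * @return string[] Lines without their CRLF; blank lines are preserved.
 */
function readMultiLineBlock($stream): array
{
    $lines = [];
    while (($line = fgets($stream)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === '.') {
            // A line holding a single dot terminates the block.
            return $lines;
        }
        if ($line !== '' && $line[0] === '.') {
            // Undo dot-stuffing: drop the extra leading dot.
            $line = substr($line, 1);
        }
        // An empty string is a legitimate (blank) line and is kept.
        $lines[] = $line;
    }
    throw new RuntimeException('Stream ended before the terminating dot line.');
}

// Usage with an in-memory stream standing in for the socket.
$stream = fopen('php://memory', 'r+');
fwrite($stream, "Hi\r\n\r\nregards,\r\n.\r\n");
rewind($stream);
$lines = readMultiLineBlock($stream); // ['Hi', '', 'regards,']
```

If a reader like this returns the empty strings, any blank lines that go missing later are being lost after the socket read rather than on the wire.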

@Anahkiasen you've used that library for https://github.com/madewithlove/why-cant-we-have-nice-things (which by the way is a huge help, thanks for that) and it seems you don't have that problem: you also use the https://github.com/php-mime-mail-parser/php-mime-mail-parser library and it seems to parse the messages correctly. Am I missing something?

Here is an example of a response to a BODY command:

```
Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:99668
Return-Path:
Mailing-List: contact internals-help@lists.php.net; run by ezmlm
Delivered-To: mailing list internals@lists.php.net
Received: (qmail 83047 invoked from network); 29 Jun 2017 02:50:28 -0000
Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 29 Jun 2017 02:50:28 -0000
Authentication-Results: pb1.pair.com smtp.mail=kalle.php@gmail.com; spf=pass; sender-id=pass
Authentication-Results: pb1.pair.com header.from=kalle.php@gmail.com; sender-id=pass
Received-SPF: pass (pb1.pair.com: domain gmail.com designates 209.85.214.44 as permitted sender)
X-PHP-List-Original-Sender: kalle.php@gmail.com
X-Host-Fingerprint: 209.85.214.44 mail-it0-f44.google.com
Received: from [209.85.214.44] ([209.85.214.44:36377] helo=mail-it0-f44.google.com) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 9F/C6-07609-1FA64595 for ; Wed, 28 Jun 2017 22:50:26 -0400
Received: by mail-it0-f44.google.com with SMTP id m68so39337669ith.1 for ; Wed, 28 Jun 2017 19:50:25 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to:cc; bh=yolEbnwwc16CZ+8MEm4q8yUIIqtogjDB8+zfsnNDVOM=; b=qc+LgNNak0l4zz97gtCLNOUBF7EwYG960ai7RKbBkdCe5cvR9MIZW0oU6V/p+iJF/Z rP4vFoNhhlCphQ6b57xUPHWpdpRBXDymi7l9mRDKiwMP9kBEjc1Q+jm1enMfUWBEyZ2n CLNWgTHUB5pIusYa4G2iKa3V8G+p4+bsjV+vKxm6W2Mfkw/dZxTvqExZocZSQWA364l9 2cD7y9HweXZ/7kpFom/39MyLb+/GMdug9c+xXjSk7sB8BrAwLClsMmp98bYWoGR3yPhB Y7msD1sC/FkxZzp73G7xaojEkWNN8+wi80ElcEDhq0ascZX6Tf4xyNoYyg0i8twoKNGG o7ug==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to:cc; bh=yolEbnwwc16CZ+8MEm4q8yUIIqtogjDB8+zfsnNDVOM=; b=aDn589ePcohEanNE8GQvOXv93+pqgGPTWl0E7jFB8RXL/IbY3JGgrhmKk20jJnFsQr WepLy62lFxliPLzNQtyRbCAFCwD3QeTX5TbbSi1kZL4Eq2kqBlAL/JYwWFrUqsZJywJG UHpInqZqIvdzVaD0widWxgwnZDJ+PJp7UDR1naFg8/VWbDDr69LX6bej86NXBjh4C7me VIk26qlb6D9ipsJxkRxHx6kQ2EYC2D2ZD0M3GXVknU7S94+S8fsAXvvMAxnCtFV8Sqho IYrj6BStmrnLgyssl5JjAlCNn5c3Vbp+6MpesrQE5NSE3cDuNwKoHciGEJtTochsmmvJ 5qnQ==
X-Gm-Message-State: AKS2vOxle1I4KlPmdWlJjNlNmE+1QHc/jtP9XIa6pbUVxwaTpf9jsfos zDjL3OD5+o6sP86Tougp/Rmgr3ayEg==
X-Received: by 10.36.74.195 with SMTP id k186mr11319335itb.63.1498704622975; Wed, 28 Jun 2017 19:50:22 -0700 (PDT)
MIME-Version: 1.0
Sender: kalle.php@gmail.com
Received: by 10.107.150.196 with HTTP; Wed, 28 Jun 2017 19:50:22 -0700 (PDT)
In-Reply-To:
References:
Date: Thu, 29 Jun 2017 04:50:22 +0200
X-Google-Sender-Auth: -jBVE1Rz7tCTbE6rKVbwm0xtjB4
Message-ID:
To: David Rodrigues
Cc: PHP Internals
Content-Type: text/plain; charset="UTF-8"
Subject: Re: [PHP-DEV] Final variables
From: kalle@php.net (Kalle Sommer Nielsen)
Hi
2017-06-28 20:46 GMT+02:00 David Rodrigues :
> The "final" keyworks make a "local scope" variable value "blocked to
> rewrite" after instantiate it.
> Okay, it sounds like a "const", and it is, but "not as we known it".
I get that, but I still don't understand why you would forcefully need it to be a variable still then if you know the value is gonna be constant, of course besides global visibility or in iterations
--
regards,
Kalle Sommer Nielsen
kalle@php.net
```

This is the same message as displayed on the official UI: http://news.php.net/php.internals/99668

[Screenshot of the message on news.php.net, 2017-07-01 16:21:22]

As you can see, there are blank lines here.

There are also blank lines when fetching the email (that's what I do in the current version of externals.io): https://externals.io/thread/980#email-15703 and here is the source of the email: https://externals.io/email/15703/source

So it seems to be an issue very specific to NNTP…

Anahkiasen commented 7 years ago

@mnapoli Are you sure WCWHNT doesn't have that issue as well? If I recall correctly, there were still a lot of problems in my implementation in the state I left it in.

Could it have been dealt with someplace else in my codebase? I remember I had a good number of places dedicated solely to cleaning up messages, besides third-party packages.

heiglandreas commented 7 years ago

I'd love to see a hex-dump of this message. As CRLF has a special meaning in NNTP and SMTP it'd be interesting to see what characters are actually sent there!
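A hex dump like that can be produced from the raw bytes read off the socket with a few lines of PHP. This is only a sketch: `hexDump` is a hypothetical helper, and capturing the raw response in the first place is left to the caller.

```php
<?php
declare(strict_types=1);

/**
 * Render raw bytes as offset / hex / printable-ASCII rows.
 * Hypothetical helper for illustration only.
 */
function hexDump(string $raw, int $width = 16): string
{
    if ($raw === '') {
        return '';
    }

    $rows = [];
    foreach (str_split($raw, $width) as $i => $chunk) {
        // Two hex digits per byte, separated by spaces.
        $hex = implode(' ', str_split(bin2hex($chunk), 2));
        // Replace non-printable bytes (CR, LF, ...) with a dot.
        $ascii = preg_replace('/[^\x20-\x7e]/', '.', $chunk);
        $rows[] = sprintf('%08x  %-' . ($width * 3 - 1) . 's  |%s|', $i * $width, $hex, $ascii);
    }

    return implode("\n", $rows);
}

// A blank line shows up as two consecutive CRLF pairs: 0d 0a 0d 0a.
echo hexDump("Hi\r\n\r\nregards,\r\n"), "\n";
```

In the output, a blank line in the message appears as the byte sequence `0d 0a 0d 0a`; if that sequence is present in the raw response, the server is sending the blank lines.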

And does that problem exist with all messages or only with some?

mnapoli commented 7 years ago

Thanks for taking the time to answer!

I think I've identified the source of the problem: https://github.com/robinvdvleuten/php-nntp/blob/98553a170804c1a7cd0d9bc4a8f3e7bb34016470/src/Connection/Connection.php#L162 When debugging the code that reads from the socket I can see the empty lines are correctly read (and added to an array with the other lines), but they are dropped by the array_filter() that is done afterwards.

I cannot understand why empty lines would need to be excluded from the response. That array_filter() has been there since v0.1.0, so it doesn't seem like a random choice…
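To illustrate the effect in isolation: when PHP's `array_filter()` is called without a callback, it removes every entry that is "empty" in the PHP sense, which includes the empty string and the string `'0'`. The snippet below uses made-up sample lines and assumes the call has no callback; it is not the code from Connection.php.

```php
<?php
declare(strict_types=1);

// Sample lines as they might be collected from a multi-line response.
$lines = ['Hi', '', '> quoted text', '', '0', 'regards,'];

// Without a callback, array_filter() drops '' and also the string '0'.
// Keys are preserved, so re-index the result with array_values().
$filtered = array_values(array_filter($lines));
// $filtered is ['Hi', '> quoted text', 'regards,']

// Joining the unfiltered lines keeps the paragraph breaks intact.
$body = implode("\r\n", $lines);
```

Besides losing the blank lines, a body line consisting only of `0` would disappear as well.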

@robinvdvleuten maybe you know the logic behind that?

mnapoli commented 7 years ago

By the way, I can confirm that after removing the array_filter() everything seems to be working perfectly. I'm starting to import more emails (I was testing with the last 5) and I'll see how it goes.

heiglandreas commented 7 years ago

Does the test suite still run without issues after removing the array_filter?

mnapoli commented 7 years ago

I've opened #32, let's see.

robinvdvleuten commented 7 years ago

Sorry guys, I was offline for a couple of days. Most of the core functionality is borrowed from the original Net_NNTP PEAR package, which I've optimised / modernised since then. Some of this "modernisation" is based on using the client with NZB newsgroups and a couple of other NZB-related open-source projects.

I made some effort to go through the official specs and match the test cases where possible, but that definitely could get some more attention.

The array_filter is probably an artifact of one of the above. Of course this can be removed if it makes the library more spec compliant 👍 Thanks @mnapoli for opening a PR for this, let's move the discussion of the blank lines to that as well!