zbateson / mail-mime-parser

An email parser written in PHP
https://mail-mime-parser.org/
BSD 2-Clause "Simplified" License

Request to improve the comment retrieval feature from Received Header #157

Closed mariuszkrzaczkowski closed 1 year ago

mariuszkrzaczkowski commented 3 years ago

Referring to issue https://github.com/zbateson/mail-mime-parser/issues/152: it would be very helpful to be able to tell which part of the Received header each comment comes from.

An example of how it is now:

Array
(
    [0] => [104.206.174.27] helo=mail.integralstock.com
    [1] => Exim 4.94
    [2] => envelope-from
)

An example of what would be useful:

Array
(
    'from' => [104.206.174.27] helo=mail.integralstock.com
    'by' => Array
        (
            [1] => Exim 4.94
            [2] => envelope-from
        )
)

https://github.com/zbateson/mail-mime-parser/blob/4ebaf86ac247571d85797c5b7caf670c6a359142/src/Header/ReceivedHeader.php#L112-L113

mariuszkrzaczkowski commented 3 years ago

I think I found the place where the comments are created: https://github.com/zbateson/mail-mime-parser/blob/4ebaf86ac247571d85797c5b7caf670c6a359142/src/Header/Consumer/CommentConsumer.php#L106-L121

zbateson commented 3 years ago

Hi @mariuszkrzaczkowski --

In your case it would make more sense to look at the parts using 'getParts'.

The comment parts aren't necessarily 'part' of a received part (I don't think that's guaranteed), but you could look at which 'received part' came last before each comment, for example:

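$parts = $receivedHeader->getParts();
$last = null;
foreach ($parts as $p) {
  // instanceof takes a class name, not a quoted string
  if ($p instanceof \ZBateson\MailMimeParser\Header\Part\ReceivedPart) {
    // remember the most recent received part (e.g. from, by, with)
    $last = $p;
  } elseif ($p instanceof \ZBateson\MailMimeParser\Header\Part\CommentPart) {
    // use $last to figure out what the last ReceivedPart was
  }
}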
mariuszkrzaczkowski commented 3 years ago

This example seems to be incorrect:

Fatal error: Uncaught Error: Call to undefined method ZBateson\MailMimeParser\Header\ReceivedHeader::getAllParts() in

zbateson commented 3 years ago

My bad, it's just getParts. I've updated the previous comment/example as well.

mariuszkrzaczkowski commented 3 years ago

The example works; I'm just missing the information about which part (e.g. from, by) each comment is related to.

mariuszkrzaczkowski commented 3 years ago

You are right. Thanks for your example.

mariuszkrzaczkowski commented 3 years ago

Maybe someone will need the full code. Do you think such a function would be useful in the library?

$parts = $received->getParts();
$comment = [];
$lastReceivedPart = null;
foreach ($parts as $p) {
    if ($p instanceof \ZBateson\MailMimeParser\Header\Part\ReceivedPart) {
        $lastReceivedPart = $p->getName();
    } elseif ($p instanceof \ZBateson\MailMimeParser\Header\Part\CommentPart) {
        $comment[$lastReceivedPart][] = $p->getComment();
    }
}
print_r($comment);

Result:

Array
(
    [id] => Array
        (
            [0] => envelope-from 
        )

)
Array
(
    [from] => Array
        (
            [0] => [104.206.174.27] helo=mail.integralstock.com
        )

    [with] => Array
        (
            [0] => Exim 4.94
            [1] => envelope-from 
        )

)
mariuszkrzaczkowski commented 3 years ago

Maybe it's worth adding a method like this to ZBateson\MailMimeParser\Header\ReceivedHeader:

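/**
 * Proposed method: groups comments under the name of the received part
 * (e.g. from, by, with) that most recently preceded them.
 *
 * @return array<string, string[]>
 */
public function getCommentsByType(): array
{
    $comment = [];
    $last = null;
    foreach ($this->getParts() as $p) {
        if ($p instanceof \ZBateson\MailMimeParser\Header\Part\ReceivedPart) {
            // remember the name of the most recent received part
            $last = $p->getName();
        } elseif ($p instanceof \ZBateson\MailMimeParser\Header\Part\CommentPart) {
            // file the comment under that part's name
            $comment[$last][] = $p->getComment();
        }
    }
    return $comment;
}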
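
For anyone who wants to try the grouping logic without installing the library, here is a minimal self-contained sketch. `StubReceivedPart`, `StubCommentPart` and `groupCommentsByPart` are hypothetical names made up for this illustration; they only mimic the `getName()` and `getComment()` getters used above and are not part of mail-mime-parser. The sketch also assumes that comments appearing before any received part should be filed under an empty-string key.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the library's ReceivedPart (illustration only).
final class StubReceivedPart
{
    private string $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    public function getName(): string
    {
        return $this->name;
    }
}

// Hypothetical stand-in for the library's CommentPart (illustration only).
final class StubCommentPart
{
    private string $comment;

    public function __construct(string $comment)
    {
        $this->comment = $comment;
    }

    public function getComment(): string
    {
        return $this->comment;
    }
}

/**
 * Groups each comment under the name of the received part that most
 * recently preceded it. Comments seen before any received part are
 * filed under the '' key (an assumption of this sketch).
 */
function groupCommentsByPart(iterable $parts): array
{
    $grouped = [];
    $last = '';
    foreach ($parts as $p) {
        if ($p instanceof StubReceivedPart) {
            $last = $p->getName();
        } elseif ($p instanceof StubCommentPart) {
            $grouped[$last][] = $p->getComment();
        }
    }
    return $grouped;
}

// Part sequence loosely modelled on the Exim header from this issue.
$parts = [
    new StubReceivedPart('from'),
    new StubCommentPart('[104.206.174.27] helo=mail.integralstock.com'),
    new StubReceivedPart('by'),
    new StubReceivedPart('with'),
    new StubCommentPart('Exim 4.94'),
    new StubCommentPart('envelope-from'),
];
$grouped = groupCommentsByPart($parts);
print_r($grouped);
```

Parts that have no comment after them (`by` in this sample) do not appear in the result, because keys are only created when a comment is filed.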
zbateson commented 3 years ago

Actually, the comments are already more or less separated and parsed in the various 'Consumer' classes. Currently, the comment portion is returned as a separate part if it doesn't match what the consumer expects to parse into its various parts. You can see that here:

https://github.com/zbateson/mail-mime-parser/blob/674795deb8c8043746a69885b75475ce94f20925/src/Header/Consumer/Received/DomainConsumer.php#L110-L123

It returns separate parts for the comment and domain part (if the comment wasn't matched to what's expected).

zbateson commented 3 years ago

Maybe that could be changed so the comment is both a separate part and part of the 'ReceivedDomainPart', so that if it's not matched, at least the raw value is available. I'm not sure; I'll have to think about it a little and make sure that doesn't break something else.

mariuszkrzaczkowski commented 3 years ago

It would be nice if you could get to it from the ReceivedHeader object, i.e. to the values from DomainConsumer that did not comply with the rule.

zbateson commented 1 year ago

I'm not sure what the status of this one is. With #153 merged and #152 fixed, is this still an issue? Feel free to reopen.