zbateson / mail-mime-parser

An email parser written in PHP
https://mail-mime-parser.org/
BSD 2-Clause "Simplified" License

ReceivedHeader: incorrectly parsed #183

Status: closed (closed by mariuszkrzaczkowski 2 years ago)

mariuszkrzaczkowski commented 2 years ago

I found a strange bug; the problem is probably the word 'id'.

My code:

<?php
// Assumes the library was installed with Composer.
require __DIR__ . '/vendor/autoload.php';

use ZBateson\MailMimeParser\Message;

// Define $contents (the raw message text) first, as in the examples below.
$message = Message::from($contents, false);

// Print every parsed part of each Received header.
foreach ($message->getAllHeadersByName('Received') as $received) {
    echo 'FromName: ' . $received->getFromName() . PHP_EOL;
    echo 'FromHostname: ' . $received->getFromHostname() . PHP_EOL;
    echo 'FromAddress: ' . $received->getFromAddress() . PHP_EOL;
    echo 'ByName: ' . $received->getByName() . PHP_EOL;
    echo 'ByHostname: ' . $received->getByHostname() . PHP_EOL;
    echo 'ByAddress: ' . $received->getByAddress() . PHP_EOL;
    echo 'with: ' . $received->getValueFor('with') . PHP_EOL;
    echo 'getComments: ' . implode(' | ', $received->getComments()) . '<hr>';
}

Example 1 (working properly):

$contents = 'Received: from mail.uiii.ac.idd ([111.222.333.444])
by mail.yetiforce.com with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
(Exim 4.94.2)
(envelope-from <noreply@mail.uiii.ac.idd>)
id 1mrMVK-00079p-W2
for m.krzaczkowski@yetiforce.com; Sun, 28 Nov 2021 16:54:15 +0100';

Response:

FromName: mail.uiii.ac.idd
FromHostname: 
FromAddress: 111.222.333.444
ByName: 
ByHostname: 
ByAddress: 
with: 
getComments: 

Example 2 (bug):

$contents = 'Received: from mail.uiii.ac.id ([111.222.333.444])
by mail.yetiforce.com with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
(Exim 4.94.2)
(envelope-from <noreply@mail.uiii.ac.id>)
id 1mrMVK-00079p-W2
for m.krzaczkowski@yetiforce.com; Sun, 28 Nov 2021 16:54:15 +0100';

Response:

FromName: mail.uiii.ac.
FromHostname: 
FromAddress: 
ByName: 
ByHostname: 
ByAddress: 
with: 
getComments: [111.222.333.444]

The only difference: mail.uiii.ac.idd vs. mail.uiii.ac.id (in both the from hostname and the envelope-from address).

zbateson commented 2 years ago

Aah, it's because there's a section called 'id', so a sub-consumer is started at that point. That should be an easy enough fix to the regex here:
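To make the diagnosis concrete, here is a minimal PHP sketch. It assumes, without quoting the linked line, that a section separator has optional leading whitespace and required trailing whitespace, i.e. \s*(id)\s+; that pattern is a hypothetical reconstruction chosen only because it reproduces both outputs in the report. The helper names looseSeparator() and textBeforeId() are made up for illustration and are not part of mail-mime-parser.

```php
<?php
// Hypothetical reconstruction, NOT the library's actual regex: leading
// whitespace before the token is optional (\s*), trailing whitespace is
// required (\s+). This reproduces both outputs in the report.
function looseSeparator(string $token): string
{
    return '/\s*(' . preg_quote($token, '/') . ')\s+/i';
}

// Returns the text in front of the first 'id' separator, i.e. what would be
// left for the preceding section once an 'id' sub-consumer takes over.
function textBeforeId(string $value): string
{
    if (preg_match(looseSeparator('id'), $value, $m, PREG_OFFSET_CAPTURE) !== 1) {
        return $value; // no 'id' separator found
    }
    return substr($value, 0, $m[0][1]); // $m[0][1] = byte offset of the match
}

$rest = ' ([111.222.333.444]) by mail.yetiforce.com id 1mrMVK-00079p-W2';
echo textBeforeId('mail.uiii.ac.id' . $rest), PHP_EOL;  // mail.uiii.ac.
echo textBeforeId('mail.uiii.ac.idd' . $rest), PHP_EOL; // mail.uiii.ac.idd ([111.222.333.444]) by mail.yetiforce.com
```

With the .id hostname, the separator matches at the end of mail.uiii.ac.id, leaving exactly the truncated FromName from the report. With .idd, the "id" is followed by "d" rather than whitespace, so the first match is the real " id " clause.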

https://github.com/zbateson/mail-mime-parser/blob/fc0e05de6cc3a7fb449c9a5a1922a1b8b96372c7/src/Header/Consumer/Received/GenericReceivedConsumer.php#L91

In most cases the whitespace before and after a token isn't optional. The exception is 'from', which I think is required and always comes at the start, but I'd have to refresh my memory on that. So it might make more sense to have a 'FromReceivedConsumer' that uses \s*(from)\s+, with the rest using \s+(token)\s+.
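The proposed separator scheme can be sketched in plain PHP, under several stated assumptions: the 'from' pattern is additionally anchored with \A (not part of the comment above) so that text such as "envelope-from" cannot match it; the token list from/by/via/with/id/for is taken from the Received clauses in RFC 5321; only the first occurrence of each token is used; and parenthesised comments are not treated specially. separatorPattern() and splitReceived() are hypothetical names, and this is not the fix that actually shipped in the library.

```php
<?php
// Hypothetical sketch of the proposed separators, NOT the library's code:
// 'from' allows optional leading whitespace (anchored to the start here, an
// added assumption); every other token needs whitespace on both sides.
function separatorPattern(string $token): string
{
    $quoted = preg_quote($token, '/');
    return $token === 'from'
        ? '/\A\s*(' . $quoted . ')\s+/i'
        : '/\s+(' . $quoted . ')\s+/i';
}

// Splits the clauses before ';' into token => text sections, using only the
// first occurrence of each token (a simplification for this sketch).
function splitReceived(string $value): array
{
    $clauses = explode(';', $value, 2)[0];
    $starts = [];
    foreach (['from', 'by', 'via', 'with', 'id', 'for'] as $token) {
        if (preg_match(separatorPattern($token), $clauses, $m, PREG_OFFSET_CAPTURE) === 1) {
            // [start of separator, offset just past separator, token]
            $starts[] = [$m[0][1], $m[0][1] + strlen($m[0][0]), $token];
        }
    }
    usort($starts, fn($a, $b) => $a[0] <=> $b[0]);

    $sections = [];
    foreach ($starts as $i => [$start, $bodyStart, $token]) {
        // A section ends where the next separator starts, or at the ';'.
        $end = $starts[$i + 1][0] ?? strlen($clauses);
        $sections[$token] = trim(substr($clauses, $bodyStart, $end - $bodyStart));
    }
    return $sections;
}

$header = 'from mail.uiii.ac.id ([111.222.333.444]) by mail.yetiforce.com with esmtps'
    . ' id 1mrMVK-00079p-W2 for m.krzaczkowski@yetiforce.com; Sun, 28 Nov 2021 16:54:15 +0100';
echo splitReceived($header)['from'], PHP_EOL; // mail.uiii.ac.id ([111.222.333.444])
```

Because " id " now needs whitespace on both sides, the ".id" at the end of the hostname no longer starts a new section, while the real id clause is still found.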

Lastly, better tests need to be added.

Thanks for reporting, @mariuszkrzaczkowski

mariuszkrzaczkowski commented 2 years ago

Is there any solution? This is blocking my work.

zbateson commented 2 years ago

> Is there any solution? This is blocking my work.

Feel free to submit a PR with a fix.

mariuszkrzaczkowski commented 2 years ago

I have no idea how or where to fix it.

mariuszkrzaczkowski commented 2 years ago

If I knew how, I would have created a PR a long time ago.

mariuszkrzaczkowski commented 2 years ago

> So it might make more sense to have a 'FromReceivedConsumer' that uses \s*(from)\s+ and the rest have \s+(token)\s+.

I tried what you wrote, but it didn't work.

zbateson commented 2 years ago

Fixed in 2.2.0.