zbateson / mail-mime-parser

An email parser written in PHP
https://mail-mime-parser.org/
BSD 2-Clause "Simplified" License

IPv6 is not parsing through getFromAddress() or getFromName() #180

Closed: sirkokoenig closed this issue 2 months ago

sirkokoenig commented 2 years ago

Hello,

When I have an email header with an IPv6 address like

Received: from AM5EUR03FT013.eop-EUR03.prod.protection.outlook.com (2603:10a6:208:15:cafe::d2) by AM0PR10CA0076.outlook.office365.com (2603:10a6:208:15::29) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4566.16 via Frontend Transport; Thu, 30 Sep 2021 11:40:15 +0000

I can't get the IPv6 address from getFromAddress() or getFromName().

Is there a bug or something missing in the source code, or am I making a mistake somewhere?

zbateson commented 2 years ago

Off the top of my head, I thought I was parsing both IPv6 and IPv4, but it could very well be a bug.

The code basically fails if anything doesn't match the format it expects; it's pretty rigid about following the standard in that way, I think.

I'll have to investigate more closely when I have a chance, if you or someone else seeing this doesn't get to it first.

sirkokoenig commented 2 years ago

It would be nice if someone could figure out the problem.

zbateson commented 2 years ago

Hi @sirkokoenig --

After looking into this, the reason is that your example doesn't follow RFC 5321's 'address-literal' spec:

address-literal  = "[" ( IPv4-address-literal /
                    IPv6-address-literal /
                    General-address-literal ) "]"

Namely, the IPv6 address literal isn't surrounded by square brackets. Unfortunately, my implementation of the Received header only follows the RFC specs, the reason being that there are just too many variations in the wild for a more lenient Received parser to be stable enough to be part of mail-mime-parser.
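
For illustration, RFC 5321 defines IPv6-address-literal as "IPv6:" followed by the address, so a compliant form of the first address in the example would be [IPv6:2603:10a6:208:15:cafe::d2]. The sketch below checks a token against that shape. The function name addressLiteralToIp is hypothetical and not part of mail-mime-parser, General-address-literal is deliberately ignored, and this is not how the library's own Received parser is implemented.

```php
<?php
declare(strict_types=1);

/**
 * Returns the IP inside an RFC 5321 address-literal, or null when the
 * token is not a bracketed IPv4 or "IPv6:"-tagged literal.
 * Hypothetical helper for illustration; not part of mail-mime-parser.
 */
function addressLiteralToIp(string $token): ?string
{
    // An address-literal must be wrapped in square brackets.
    if (!preg_match('/^\[(.+)\]$/', $token, $m)) {
        return null;
    }
    $inner = $m[1];

    // IPv6-address-literal = "IPv6:" IPv6-addr
    if (stripos($inner, 'IPv6:') === 0) {
        $ip = substr($inner, 5);
        return filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) !== false
            ? $ip
            : null;
    }

    // IPv4-address-literal is a plain dotted quad.
    return filter_var($inner, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false
        ? $inner
        : null;
}
```

With this helper, the bracketed and tagged form yields the address, while the bare or parenthesised form from the reported header yields null.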

I've proposed elsewhere that a plugin could be made to extend MMP's Received parser to try and chase better parsing for this header. The ability to extend MMP classes is now built into 2.0, so this could be done. I'd be happy to be part of it, but I'm not sure I want to head that project myself (it's not really interesting work to me, and I imagine there would be lots of issues and pull requests trying to add support for a bunch of random Received headers found in the wild).

For your current issue, if you have specific requirements, I recommend looking at the returned parts and parsing them yourself, or taking the raw value and splitting it.
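
A minimal sketch of the raw-value approach, assuming the raw Received value is already available as a plain string. The function name extractIpsFromReceived is hypothetical, and the pattern is a lenient heuristic rather than an RFC-compliant parser: it collects anything in parentheses or square brackets that validates as an IP address.

```php
<?php
declare(strict_types=1);

/**
 * Collects every parenthesised or bracketed token in a raw Received
 * value that validates as an IPv4 or IPv6 address.
 * Hypothetical helper for illustration; not part of mail-mime-parser.
 *
 * @return string[] IP addresses in the order they appear
 */
function extractIpsFromReceived(string $rawValue): array
{
    $ips = [];

    // Capture the contents of (...) comments and [...] literals
    // that contain no nested brackets or parentheses.
    preg_match_all('/[(\[]([^()\[\]]+)[)\]]/', $rawValue, $matches);

    foreach ($matches[1] as $candidate) {
        // Drop an optional RFC 5321 "IPv6:" tag before validating.
        $candidate = preg_replace('/^IPv6:/i', '', trim($candidate));
        if (filter_var($candidate, FILTER_VALIDATE_IP) !== false) {
            $ips[] = $candidate;
        }
    }

    return $ips;
}
```

Applied to the header from the original report, this picks up 2603:10a6:208:15:cafe::d2 and 2603:10a6:208:15::29 and skips the (version=..., cipher=...) comment because it does not validate as an IP address.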

sirkokoenig commented 2 years ago

Thanks for the help and the tip - got it.