papnkukn / eml-format

RFC 822 EML file format parser and builder
MIT License

multi-part MIME not correctly handled #4

Closed noktilux closed 4 years ago

noktilux commented 6 years ago

here is a message containing a small attached jpeg and a little message ("please see attached"). your parser is only picking up the attached jpeg and not getting the little message:

http://qstatistic.com/debug/sample_email.txt

i do not get the text with either the "read" or "parse" function.

noktilux commented 6 years ago

i have had a look at the code and the issue is in this line:

if (lines[i - 1] == "" && line.indexOf("--" + findBoundary) == 0 && !/\-\-(\r?\n)?$/g.test(line)) {

the first part of the condition, which checks that the previous line is an empty string, is not satisfied by the test message i linked to above.

lines[i - 1] is "This is a multi-part message in MIME format."

i don't understand the logic of looking for the empty string -- why is finding the boundary string not enough here?

noktilux commented 6 years ago

can somebody please say if this issue report has been seen?

hi2u commented 6 years ago

I was also having problems parsing over 50% of my emails due to some mime header content I think. Not sure if your issue is related, but here's what worked for me...

Didn't work:

emlformat.read(eml, { headersOnly: true }, function(error, data) {...});

The error I was getting was:

TypeError: Cannot read property 'length' of undefined
    at _read (node_modules/eml-format/lib/eml-format.js:466:39)
    at node_modules/eml-format/lib/eml-format.js:518:7
    at Object.emlformat.parse (node_modules/eml-format/lib/eml-format.js:554:5)
    at Object.emlformat.read (node_modules/eml-format/lib/eml-format.js:516:15)

Did work:

I just changed the headersOnly option to false and 100% of my emails were parsed...

emlformat.read(eml, { headersOnly: false }, function(error, data) {...});

papnkukn commented 4 years ago

@noktilux thanks for providing the example.

That is correct, the issue is in the condition that strictly requires an empty line before the multi-part boundary marker:

if (lines[i - 1] == "" && line.indexOf("--" + findBoundary) == 0 && !/\-\-(\r?\n)?$/g.test(line)) {

Solved by removing the lines[i - 1] == "" ("previous line should be blank") condition from the if statement.

The issue has been fixed in version 0.6.0.
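To illustrate the relaxed check, here is a minimal sketch and not the actual eml-format source. The helper names `isOpeningBoundary` and `findPartStarts` are hypothetical. They only mirror the condition quoted above with the "previous line should be blank" requirement dropped, and the sample lines are taken from the message discussed in this thread.

```javascript
// Hypothetical helper, not part of eml-format: mirrors the quoted condition
// with the `lines[i - 1] == ""` requirement removed.
function isOpeningBoundary(line, boundary) {
  // The line must start with "--" followed by the boundary string...
  const startsWithBoundary = line.indexOf("--" + boundary) === 0;
  // ...and must not end with "--", which marks the closing delimiter.
  const isClosing = /--(\r?\n)?$/.test(line);
  return startsWithBoundary && !isClosing;
}

// Hypothetical helper: returns the indexes of all lines that open a MIME part.
function findPartStarts(lines, boundary) {
  const starts = [];
  for (let i = 0; i < lines.length; i++) {
    if (isOpeningBoundary(lines[i], boundary)) {
      starts.push(i);
    }
  }
  return starts;
}

const boundary = "------------194F0B6C07FF2414138ED9B2";
const lines = [
  "This is a multi-part message in MIME format.", // no blank line follows
  "--" + boundary,
  "Content-Type: text/plain; charset=utf-8; format=flowed",
  "",
  "please see attached",
  "",
  "--" + boundary,
  "Content-Type: image/jpeg;",
  "--" + boundary + "--", // closing delimiter, must not count as a part start
];

console.log(findPartStarts(lines, boundary)); // [ 1, 6 ]
```

With the blank-line requirement in place, index 1 would be skipped because the preceding line is the preamble text, which is why the text part was lost.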

papnkukn commented 4 years ago

Just to provide an example.

So if the EML looks like this, i.e. with no blank line after "This is a multi-part message in MIME format.":

....
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="------------194F0B6C07FF2414138ED9B2"
Content-Language: en-US

This is a multi-part message in MIME format.
--------------194F0B6C07FF2414138ED9B2
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

please see attached

--------------194F0B6C07FF2414138ED9B2
Content-Type: image/jpeg;
 name="tired_boot.FJ010019.jpeg"
...

eml-format should now read it as

{
  "date": "2018-04-29T18:05:09.000Z",
  ...
  "text": "please see attached\r\n\r\n",
  "attachments": [
    {
      "name": "tired_boot.FJ010019.jpeg", 
      ...
}

with the text property.