Getting Exception Format Error with message: Failed to parse message headers

transcendair commented 4 months ago

I have two Google Takeout files, 45 GB and 32 GB in size. MimeKit parses the 45 GB file completely and I'm able to build an index file from it (74,050 messages in 1:10 - wicked fast!). The 32 GB file gets to stream position 2,418,540,487 (after parsing 16,158 messages) and then throws a FormatException with the message "Failed to parse message headers" and this stack trace:

    at MimeKit.MimeParser.ParseMessage(Byte* inbuf, CancellationToken cancellationToken) in D:\src\MimeKit\MimeKit\MimeParser.cs:line 1923
    at MimeKit.MimeParser.ParseMessage(CancellationToken cancellationToken) in D:\src\MimeKit\MimeKit\MimeParser.cs:line 2016
    at UserQuery.

The code is running in LINQPad with version 4.7, .NET 8, on Windows 11. If the offending message is pulled out into a file by itself, it parses fine. The code is simple:

    using (var stream = File.OpenRead(fileName))
    {
        var parser = new MimeParser(stream, MimeFormat.Mbox);

        while (!parser.IsEndOfStream)
        {
            //Console.WriteLine(count++);
            try
            {
                count++;
                var message = parser.ParseMessage();

...

The byte count where it fails is suspicious. What confuses me is that the parser did just fine on the larger file. I would appreciate any pointers on how to add instrumentation to the code to see more details on the mode of failure.
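
For reference, a minimal sketch of the kind of instrumentation I have in mind, assuming the parser's `Position`, `MboxMarkerOffset`, and `MboxMarker` properties report what their names suggest. `MboxDiagnostics`, `DescribeOffset`, and `ScanMbox` are hypothetical names made up for this example. The `int.MaxValue` comparison is only my guess at why the offset looks suspicious: 2,418,540,487 no longer fits in a signed 32-bit integer (maximum 2,147,483,647).

```csharp
using System;
using System.IO;
using MimeKit;

static class MboxDiagnostics
{
    // Hypothetical helper: flags offsets that no longer fit in a signed 32-bit integer.
    public static string DescribeOffset(long offset) =>
        offset > int.MaxValue
            ? $"{offset:N0} (beyond int.MaxValue by {offset - int.MaxValue:N0} bytes)"
            : $"{offset:N0}";

    // Hypothetical helper: parses messages until the first failure and
    // returns the number of messages that parsed successfully.
    public static int ScanMbox(string fileName)
    {
        int count = 0;

        using (var stream = File.OpenRead(fileName))
        {
            var parser = new MimeParser(stream, MimeFormat.Mbox);

            while (!parser.IsEndOfStream)
            {
                try
                {
                    var message = parser.ParseMessage();
                    count++;
                }
                catch (FormatException ex)
                {
                    // Report where the parser was when it gave up.
                    Console.WriteLine($"Failed after {count} messages: {ex.Message}");
                    Console.WriteLine($"  parser position:     {DescribeOffset(parser.Position)}");
                    Console.WriteLine($"  last mbox marker at: {DescribeOffset(parser.MboxMarkerOffset)}");
                    Console.WriteLine($"  last mbox marker:    {parser.MboxMarker}");
                    break; // stop instead of retrying from an unknown parser state
                }
            }
        }

        return count;
    }
}
```

Calling `MboxDiagnostics.ScanMbox(fileName)` on the failing file should print the offset of the last "From " marker, which would make it easier to cut the offending region out of the mbox for a closer look.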

jstedfast commented 4 months ago

This sounds very similar to issue #991.

I haven't been able to figure out the issue without a sample mailbox. Likely what it means is that there is a buffering issue somewhere.

The other user tried out the ExperimentalMimeParser and discovered that it worked fine (it's a redesign of the current MimeParser that I had meant to swap in for v4.0 but forgot, so it's slated for v5.0 instead).
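
Swapping parsers should be roughly a one-line change. The following is only a sketch, assuming `ExperimentalMimeParser` accepts the same `(Stream, MimeFormat)` constructor arguments and exposes the same `IsEndOfStream` / `ParseMessage()` members as `MimeParser`. `MboxCounter` and `CountMessages` are hypothetical names used for illustration.

```csharp
using System.IO;
using MimeKit;

static class MboxCounter
{
    // Hypothetical helper: counts the messages in an mbox file
    // using ExperimentalMimeParser in place of MimeParser.
    public static int CountMessages(string fileName)
    {
        int count = 0;

        using (var stream = File.OpenRead(fileName))
        {
            // The only change from the original loop is the parser type.
            var parser = new ExperimentalMimeParser(stream, MimeFormat.Mbox);

            while (!parser.IsEndOfStream)
            {
                var message = parser.ParseMessage();
                count++;
            }
        }

        return count;
    }
}
```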

transcendair commented 4 months ago

Yep. Sorted. With ExperimentalMimeParser, time stayed the same for the first file (1:11) and it did the 32 GB file in 0:54 with 147,448 messages. Hot stuff.

jstedfast commented 4 months ago

Marking this as a duplicate of issue #991.

jstedfast / MimeKit

Getting Exception Format Error with message: Failed to parse message headers #1053