transcendair closed this issue 4 months ago
This sounds very similar to issue #991
I haven't been able to figure out the issue without a sample mailbox. Most likely there is a buffering issue somewhere.
The other user tried out the ExperimentalMimeParser and discovered that it worked fine (it's a redesign of the current MimeParser that I had meant to swap in for v4.0 but forgot, so it's slated for v5.0 instead).
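For anyone wanting to try the same workaround, here is a minimal sketch of what the swap might look like. It assumes an mbox file at a hypothetical path (`takeout.mbox`) and that ExperimentalMimeParser accepts the same constructor arguments as MimeParser; treat it as an illustration, not the exact code the other user ran.

```csharp
using System;
using System.IO;
using MimeKit;

// Hypothetical path; substitute the real Takeout mbox file.
using var stream = File.OpenRead("takeout.mbox");

// Same loop as with MimeParser; only the parser type changes.
var parser = new ExperimentalMimeParser(stream, MimeFormat.Mbox);
int count = 0;

while (!parser.IsEndOfStream)
{
    // MimeMessage is disposable, so release each one before parsing the next.
    using var message = parser.ParseMessage();
    count++;
}

Console.WriteLine($"Parsed {count:N0} messages");
```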
Yep. Sorted. Time stayed the same for the first file (1:11) and it did the 32g file in :54 with 147,448 messages. Hot stuff.
Marking this as a duplicate of issue #991
I have two Google Takeout files of size 45g and 32g. The 45g file is successfully parsed in full by MimeKit and I'm able to build an index file from it (74,050 messages in 1:10 - wicked fast!). The 32g file gets to stream position 2,418,540,487 (after parsing 16,158 messages) and throws a FormatException with the message "Failed to parse message headers" and this stack trace:
at MimeKit.MimeParser.ParseMessage(Byte* inbuf, CancellationToken cancellationToken) in D:\src\MimeKit\MimeKit\MimeParser.cs:line 1923
at MimeKit.MimeParser.ParseMessage(CancellationToken cancellationToken) in D:\src\MimeKit\MimeKit\MimeParser.cs:line 2016
at UserQuery.
The code is running in LINQPad with version 4.7, .NET 8 on Windows 11. If the offending message is pulled out into a file by itself, it parses fine. The code is simple:
...
The byte count where it fails is suspicious. Confusion rears its ugly head because it did just fine on the larger file. I would appreciate any pointers on how to add instrumentation to the code to see more details on the mode of failure.
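One possible way to instrument the parse loop is sketched below. This is not the elided code from this report; `MboxProbe`, `Run` and `DumpBytes` are hypothetical names. It catches the FormatException, prints the parser's position and the mbox marker offset of the last message that parsed cleanly, then dumps the raw bytes around the failure point so they can be compared with the message that parses fine when extracted on its own.

```csharp
using System;
using System.IO;
using System.Text;
using MimeKit;

static class MboxProbe // hypothetical helper, not part of MimeKit
{
    public static void Run(string path)
    {
        using var stream = File.OpenRead(path);
        var parser = new MimeParser(stream, MimeFormat.Mbox);
        long lastGoodMarkerOffset = -1;
        int count = 0;

        while (!parser.IsEndOfStream)
        {
            try
            {
                using var message = parser.ParseMessage();
                // Offset of the "From " line for the message just parsed.
                lastGoodMarkerOffset = parser.MboxMarkerOffset;
                count++;
            }
            catch (FormatException ex)
            {
                Console.WriteLine($"Failed after {count:N0} messages: {ex.Message}");
                Console.WriteLine($"Parser position: {parser.Position:N0}");
                Console.WriteLine($"Last good mbox marker offset: {lastGoodMarkerOffset:N0}");
                // Show 512 bytes centered on the reported position.
                DumpBytes(path, Math.Max(0, parser.Position - 256), 512);
                break;
            }
        }
    }

    static void DumpBytes(string path, long start, int length)
    {
        // Use a separate stream so the parser's stream is left untouched.
        using var fs = File.OpenRead(path);
        fs.Seek(start, SeekOrigin.Begin);
        var buffer = new byte[length];
        int read = fs.Read(buffer, 0, length);

        // Latin1 maps every byte to a character; make line endings visible.
        string text = Encoding.Latin1.GetString(buffer, 0, read)
            .Replace("\r", "\\r")
            .Replace("\n", "\\n\n");
        Console.WriteLine($"--- {read} bytes starting at offset {start:N0} ---");
        Console.WriteLine(text);
    }
}
```

Calling `MboxProbe.Run(@"path\to\file.mbox")` from a LINQPad query should be enough to try it. The byte dump matters because the message parses fine in isolation, which suggests the trigger may be where the message falls in the stream, not the message content itself.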