Open djc opened 6 years ago
Do you think you could provide the mbox that's failing to parse with sensitive information removed? Possibly a reduced one with the stuff that's failing, so I can add it as a test to start from for fixing it.
What are you looking for, exactly? I'm pretty sure the starting line I provided in my first message will already induce failure in your parser, I'm sure you could easily attach a message to it.
The mbox file I have is 7.8G so editing out sensitive info is quite a bit of work. I'm happy to run your code on it again to check for other failures if you can fix this one, and provide enough context to construct test cases from.
@djc the first 3 emails would be enough. It's not just about the first line, because it also has to do with the way the emails are separated and other details that could differ. For instance, I'm fairly sure the error occurs because the `From ` separator is different from the one I'm expecting, but I can't just make the check laxer, because then it could match `From whatever` in the email body and wrongly split the emails apart.
What I want to avoid is having to go back and forth fixing one error after the other instead of just fixing all of them at once 🐼
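To make the trade-off above concrete, here is a minimal sketch of a separator check that is looser than an exact-format match but still refuses body lines such as `From whatever`. It is not the crate's actual parser: the function names (`looks_like_separator`, `is_clock_time`, `split_messages`) and the heuristic itself (a sender token, then an `HH:MM:SS` field and a four-digit year somewhere in the rest of the line, and only after a blank line or at the start of the input) are assumptions made for illustration.

```rust
/// Hypothetical helper: true if `field` has the shape `HH:MM:SS`.
fn is_clock_time(field: &str) -> bool {
    let parts: Vec<&str> = field.split(':').collect();
    parts.len() == 3
        && parts
            .iter()
            .all(|p| p.len() == 2 && p.bytes().all(|b| b.is_ascii_digit()))
}

/// Hypothetical heuristic: a separator is `From `, a sender token, and a
/// remainder that contains both a clock time and a four-digit year.
/// Extra tokens (for example a timezone offset) are tolerated.
fn looks_like_separator(line: &str) -> bool {
    let rest = match line.strip_prefix("From ") {
        Some(rest) => rest,
        None => return false,
    };
    let mut fields = rest.split_whitespace();
    // The first token stands in for the envelope sender.
    if fields.next().is_none() {
        return false;
    }
    let date_fields: Vec<&str> = fields.collect();
    let has_time = date_fields.iter().any(|f| is_clock_time(f));
    let has_year = date_fields
        .iter()
        .any(|f| f.len() == 4 && f.bytes().all(|b| b.is_ascii_digit()));
    has_time && has_year
}

/// Splits `input` into messages. A line only starts a new message if it
/// passes the heuristic AND follows a blank line (or starts the input).
fn split_messages(input: &str) -> Vec<String> {
    let mut messages: Vec<String> = Vec::new();
    let mut prev_blank = true;
    for line in input.lines() {
        if prev_blank && looks_like_separator(line) {
            messages.push(String::new());
        }
        // Lines before the first separator are dropped.
        if let Some(current) = messages.last_mut() {
            current.push_str(line);
            current.push('\n');
        }
        prev_blank = line.trim().is_empty();
    }
    messages
}
```

With this sketch, a body line like `From whatever I can tell` stays inside its message even when it follows a blank line, because it carries no time or year. A body line that happens to contain both would still be misread as a separator, so this reduces the problem rather than eliminating it.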
Okay, so https://dirkjan.ochtman.nl/files/test.mbox should give you a few messages from the head of my mbox file (that is, `head -nx gmail.mbox > test.mbox`); I only culled some GMail-specific headers and part of the `Received` headers. I'm not sure the end of this file resembles the end of the original file.
BTW, not sure how aware of this you are; your choice of 8-space formatting and use of the WTFPL are a bit anomalous for the Rust ecosystem. If you'd like your code to be picked up more broadly/get more contributors, you may want to reconsider these choices.
That's perfect, I'll see if I can fix it tonight, otherwise it goes to the weekend.
Also, it's tabs and not spaces, and I'm aware of the license too; they're just choices 🐼
Any chance this will be fixed?
@droundy you might want to try the crate I published, https://github.com/djc/mbox-reader.
@djc I looked at mbox_reader, but found the complete lack of documentation unencouraging.
@droundy if you open an issue in that project, I'll write you some.
Google Takeout email archives take the mbox format, but their format doesn't comply with the preferred scheme as laid out in RFC 4155, appendix A. I'm not sure how interoperable you intend for this crate to be? In any case, here's the first line of the file I got last week: