How to stream MIME into mail-parser?

stalwartlabs / mail-parser

Fast and robust e-mail parsing library for Rust

https://docs.rs/mail-parser/

Apache License 2.0


How to stream MIME into mail-parser? #76

Closed by arifd 9 months ago

arifd commented 9 months ago

Hello, and thank you for this fantastic crate!

I have a concern, though: MessageParser can only accept an in-memory collection of bytes rather than, for example, a Read. Since MIME is itself a multipart format, and emails could (theoretically) carry an unlimited number or size of attachments, is there something obvious that I am missing?

mdecimus commented 9 months ago

Hi,

This crate parses messages in memory and does not provide a streaming parser interface. Although messages could have an unlimited number of MIME parts, it is advisable to enforce limits on message sizes and nested MIME parts to avoid denial of service attacks. For example, Gmail does not allow message sizes over 25MB.
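As an illustration of enforcing such a limit before parsing, here is a minimal sketch that reads from any Read source into memory but refuses input above a cap. It uses only the standard library. The names read_capped and MAX_MESSAGE_BYTES are hypothetical, and the 25 MB value simply mirrors the Gmail figure above; pick whatever limit suits your system.

```rust
use std::io::{self, Read};

/// Hypothetical cap, mirroring the 25 MB Gmail limit mentioned above.
const MAX_MESSAGE_BYTES: usize = 25 * 1024 * 1024;

/// Reads the whole source into memory, failing if it holds more than `max_bytes`.
fn read_capped<R: Read>(reader: R, max_bytes: usize) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // Allow one byte past the cap so an oversized input can be told apart
    // from one that is exactly `max_bytes` long.
    let limit = (max_bytes as u64).saturating_add(1);
    reader.take(limit).read_to_end(&mut buf)?;
    if buf.len() > max_bytes {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "message exceeds size limit",
        ));
    }
    Ok(buf)
}
```

The returned buffer is the in-memory byte collection the parser expects, so it could then be passed on, for example as MessageParser::default().parse(&buf), assuming a mail-parser version that exposes MessageParser. Because at most max_bytes + 1 bytes are ever read, an oversized or endless source is rejected without being buffered in full.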

arifd commented 9 months ago

Ah right, that is a good point!

I am designing a system that should be able to parse emails from various sources, and I want to follow the robustness principle, which, in short, says:

be conservative in what you send, be liberal in what you accept

For that reason, I was thinking, "what if someone gives me an infinitely sized EML?" Maybe this is an irrational thought? If most or all email servers limit the sizes that can be transferred, then I should just do the same and call it a day!

mdecimus commented 9 months ago

But how would someone generate an infinitely sized EML? The number of parts in an EML will always be limited by its file size, which is finite. As long as the EML fits in memory, mail-parser can parse any number of parts included in it.

However, as I mentioned before, it is wise to limit the maximum size of messages and parts you accept in order to avoid attacks. Practically all mail server providers impose limits on message sizes and MIME parts.

arifd commented 9 months ago

Yes, I was exaggerating for the sake of being terse (but perhaps I shouldn't have responded while I was in a hurry). What I was really getting at was a concern about memory exhaustion, since I may process multiple EMLs in parallel. But again, I think you're right: this is a problem of knowing your limits, not something a library can magically make go away. In good systems design, nothing should be limitless, I guess!

Thanks for the back and forth!
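For the memory concern about processing several EMLs in parallel, one possible approach is to bound how many messages are in flight at once. The sketch below is an assumption-laden illustration using only the standard library: process_bounded is a hypothetical helper, and its process argument stands in for the real parsing step. With a per-message size cap in place, peak buffer memory is then roughly (2 * workers + 1) times that cap: up to workers messages queued, up to workers being processed, and one waiting to be sent.

```rust
use std::sync::mpsc::sync_channel;
use std::sync::{Arc, Mutex};
use std::thread;

/// Runs `process` over lazily produced messages on a fixed number of threads
/// and returns the sum of the per-message results.
fn process_bounded<I>(messages: I, workers: usize, process: fn(&[u8]) -> usize) -> usize
where
    I: Iterator<Item = Vec<u8>>,
{
    assert!(workers > 0, "at least one worker is required");
    // Bounded queue: the producer blocks once `workers` messages are waiting,
    // so the iterator is not drained faster than the workers can keep up.
    let (tx, rx) = sync_channel::<Vec<u8>>(workers);
    let rx = Arc::new(Mutex::new(rx));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || {
                let mut total = 0;
                loop {
                    // The lock guard is released at the end of this statement,
                    // before the message is processed.
                    let next = rx.lock().unwrap().recv();
                    match next {
                        Ok(raw) => total += process(&raw),
                        Err(_) => break, // sender dropped: no more work
                    }
                }
                total
            })
        })
        .collect();

    for message in messages {
        tx.send(message).unwrap();
    }
    drop(tx); // signal the workers that the input is exhausted

    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

Feeding the function an iterator that loads each EML only when asked for it keeps the bound meaningful; collecting every message into a Vec first would defeat the purpose.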