rolkey / indyproject

Automatically exported from code.google.com/p/indyproject

TIdMessage.LoadFrom...() wastes a lot of memory for big emails when TIdMessage.NoDecode is True. #218

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
When a large email, say 150MB, is loaded via TIdMessage.LoadFrom...() with the 
TIdMessage.NoDecode property set to True, a lot of memory is wasted, up to 3-4x 
the size of the source data, before LoadFrom...() exits. This can cause 
EOutOfMemory errors.

Internally, when TIdMessageClient.ProcessMessage() captures the source data 
into the TIdMessage.Body property, it first captures the data into a 
TMemoryStream, which is one complete copy of the email in memory.  It then 
passes that stream to TStrings.LoadFromStream(), which copies it into a TBytes 
(a second full copy) and decodes that to a String (a third copy, which takes 
twice as many bytes under D2009 and later because of Unicode).  That adds up 
to 3-4x the size of the source data in memory before TStrings even starts 
parsing the data into individual lines.
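
To make that arithmetic concrete, here is a rough back-of-the-envelope sketch. It is written in Python purely as a calculator and is not Indy code. It assumes all three copies are alive at the same moment and that the email uses a single-byte charset, so one source byte becomes one character. The function name `peak_copy_bytes` is made up for this illustration.

```python
MB = 1024 * 1024

def peak_copy_bytes(source_bytes, bytes_per_char):
    """Estimate memory held by the three intermediate copies described above.

    Assumption: one character per source byte (single-byte charset), and
    all three copies exist at the same time.
    """
    memory_stream = source_bytes                # TMemoryStream capture
    byte_array = source_bytes                   # TBytes inside LoadFromStream()
    string_copy = source_bytes * bytes_per_char # decoded String
    return memory_stream + byte_array + string_copy

source = 150 * MB
ansi_total = peak_copy_bytes(source, 1)     # pre-D2009, 1-byte AnsiString chars
unicode_total = peak_copy_bytes(source, 2)  # D2009+, 2-byte UnicodeString chars

print(ansi_total // MB, "MB,", ansi_total // source, "x the source")
print(unicode_total // MB, "MB,", unicode_total // source, "x the source")
```

Under those assumptions a 150MB email ties up roughly 450MB with 1-byte characters and 600MB with 2-byte characters, which matches the 3-4x figure.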

The bulk of the overhead is in that initial capture to TMemoryStream.  That is 
being done to decode the raw email bytes according to their 
Content-Transfer-Encoding before decoding the resulting bytes using the 
email's charset.  That needs to be rewritten to perform line-by-line decoding 
instead, to reduce overhead to a minimum.
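
As a rough illustration of what line-by-line decoding could look like, here is a minimal sketch in Python. It is not Indy code and not a proposed patch; the function name `decode_lines` is hypothetical. It assumes base64 bodies are wrapped so that each line holds a multiple of 4 characters (as with the usual 76-character MIME lines), and it uses an incremental charset decoder so that a multi-byte character split across two source lines still decodes correctly.

```python
import binascii
import codecs
import io

def decode_lines(raw, transfer_encoding, charset):
    """Yield decoded text one source line at a time.

    Only one line of raw bytes and one line of decoded text are held at
    once, instead of several full copies of the message.
    """
    # Incremental decoder buffers incomplete multi-byte sequences between calls.
    text_decoder = codecs.getincrementaldecoder(charset)()
    for line in raw:  # binary file objects iterate line by line
        if transfer_encoding == "quoted-printable":
            chunk = binascii.a2b_qp(line)      # also drops "=" soft line breaks
        elif transfer_encoding == "base64":
            # Assumption: each line is a whole number of 4-character groups.
            chunk = binascii.a2b_base64(line)
        else:
            chunk = line                       # 7bit / 8bit / binary pass through
        yield text_decoder.decode(chunk)
    # Flush the decoder; raises if the input ended mid-character.
    yield text_decoder.decode(b"", final=True)

qp_body = io.BytesIO(b"caf=C3=\r\n=A9 au lait\r\n")
print("".join(decode_lines(qp_body, "quoted-printable", "utf-8")))
```

In the sample input the UTF-8 bytes for "é" are split across a quoted-printable soft line break, which is the case that a naive per-line charset decode would get wrong.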

Original issue reported on code.google.com by gambit47 on 20 Mar 2012 at 1:39