An attempt to fix memory leak when cleaning up large mailboxes

webmin / usermin

Usermin source code

http://www.usermin.com/

Other

117 stars 48 forks source link

An attempt to fix memory leak when cleaning up large mailboxes #77

Closed iliajie closed 1 year ago

jcameron commented 2 years ago

I don't see how this would fix the memory leak issue?

iliajie commented 2 years ago

This is the reason why I made a PR as I wasn't sure exactly and most importantly didn't test it.

I don't see how this would fix the memory leak issue?

However, the explanation is simple. If a mail folder has a lot of emails, then emails keep populating into @delmails array as long as RAM is available. If there is a large array of mail hash refs to hold .. it will consume huge amount of RAM if emails have large attachments .. additionally, later running reverse on such potentially large array can not be resource friendly either.

We need to store within an array only a link to a mail file which needs to be deleted rather than holding the whole hasref to an email message.

jcameron commented 2 years ago

@delmails doesn't contain the entire messages though - just the headers, to avoid this kind of issue.

iliajie commented 2 years ago

@delmails doesn't contain the entire messages though - just the headers, to avoid this kind of issue.

Well, but initially when we read a mail file, do we read the whole fie? OP had a mail file of 1.9GB of size .. if we need only to get the headers, perhaps you could use my new read_file_contents_limit sub to get the head of a file only, to make sure large files, with large attachments (presumably saved using mail client, i.e. saved as a draft with attachments) won't lead to Webmin processes being killed.

jcameron commented 2 years ago

We scan through the whole file, but only store the email headers in RAM to that a 1.9 GB mailbox doesn't take up 1.9 GB.

iliajie commented 2 years ago

We scan through the whole file, but only store the email headers in RAM to that a 1.9 GB mailbox doesn't take up 1.9 GB.

I assume reading a file for extracting the headers later would consume RAM? Also, why do we scan the whole file when we presumably need no more than 100kB of its head?

jcameron commented 2 years ago

No, usermin reads the file line-by-line rather than pulling the whole thing into RAM and then parsing it. See the code here : https://github.com/webmin/webmin/blob/master/mailboxes/boxes-lib.pl#L2602

Also, we need to read the whole thing in order to parse the Date: headers and delete messages that are too old.

iliajie commented 2 years ago

No, usermin reads the file line-by-line rather than pulling the whole thing into RAM and then parsing it.

Great. Alright. I think we need to ask for that odd mail file and see what's happening wrong.

Also, we need to read the whole thing in order to parse the Date:

What do you mean by 'the whole thing'? Isn't the Date: line is in the header, somewhere at the top of the file?

jcameron commented 2 years ago

What do you mean by 'the whole thing'? Isn't the Date: line is in the header, somewhere at the top of the file?

Yes, for Maildir-format we only read each file up to the end of the headers. For mbox format we scan the entire file, but only save the headers.

swelljoe commented 2 years ago

Optimizing without understanding how the code works is not productive, Ilia. ;-)

iliajie commented 2 years ago

Yeah, I was pretty busy previous weeks. I didn't look at this code closely yet. That was just a theoretical crash. :-)