frasertweedale closed this issue 7 years ago.
After further analysis, it comes down to GHC's generational GC and some objects for whatever reason being moved to the "older" generation. The heap never grows enough for a major GC (which addresses the older generation) so the objects live forever.
In my mailing list stats program, doing `performMajorGC` after each `Query` is sufficient to avoid the problem completely. Comparing the costs:

Without `performMajorGC`:
```
     110,095,600 bytes allocated in the heap
       1,808,752 bytes copied during GC
         189,000 bytes maximum residency (2 sample(s))
          29,224 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       210 colls,     0 par    0.047s   0.047s     0.0002s    0.0010s
  Gen  1         2 colls,     0 par    0.035s   0.035s     0.0175s    0.0343s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    3.587s  (  3.597s elapsed)
  GC      time    0.082s  (  0.082s elapsed)
  EXIT    time    0.034s  (  0.034s elapsed)
  Total   time    3.715s  (  3.714s elapsed)

  %GC     time       2.2%  (2.2% elapsed)

  Alloc rate    30,694,616 bytes per MUT second

  Productivity  97.8% of total user, 97.8% of total elapsed
```
And with `performMajorGC`:
```
     122,640,648 bytes allocated in the heap
       2,237,352 bytes copied during GC
          72,368 bytes maximum residency (152 sample(s))
          22,000 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       161 colls,     0 par    0.063s   0.063s     0.0004s    0.0009s
  Gen  1       152 colls,     0 par    0.090s   0.091s     0.0006s    0.0031s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    4.021s  (  4.034s elapsed)
  GC      time    0.153s  (  0.154s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    4.181s  (  4.188s elapsed)

  %GC     time       3.7%  (3.7% elapsed)

  Alloc rate    30,500,197 bytes per MUT second

  Productivity  96.3% of total user, 96.3% of total elapsed
```
As can be seen, in this program there is no significant performance hit from forcing the major GC. In a larger program with many longer-lived objects the cost may be higher. The `performMajorGC` version allocates more data (because it avoids `EMFILE`, the program has more work to do; no surprise there) but has lower memory residency (also no surprise).
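The per-`Query` pattern described above is easy to sketch. Below is a minimal illustration, assuming the real per-query work is abstracted as an `IO` action; `withMajorGC` is a hypothetical helper for this sketch, not an hs-notmuch API:

```haskell
import System.Mem (performMajorGC)

-- Run an action, then force a major GC so that unreachable foreign
-- objects are finalised promptly (and any FDs they hold are closed).
withMajorGC :: (a -> IO b) -> a -> IO b
withMajorGC f x = do
  r <- f x
  performMajorGC
  pure r

-- Stand-in for a real per-Query processing loop.
main :: IO ()
main = mapM_ (withMajorGC print) [1 .. 3 :: Int]
```

The helper keeps the GC call next to the work that creates the garbage, rather than scattering `performMajorGC` calls through the program.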
I think that the way forward for purebred is to have a system (or systems) for running `performMajorGC` at appropriate times, so that unreachable `Message` objects get GC'd.
Let's keep this ticket open while we work on the program so that we do not forget about it; it's too early to tell exactly when and where we will need to do this.
For hs-notmuch I'll include an admonition in the docs to explain that this silliness may be required depending on the type of program.
Finally, if it turns out there's a way to instruct the Haskell RTS to do major GC when such-and-such occurs, we should try to use that. Investigation for another day.
Running with `+RTS -G1` selects a single-generation two-space collector, which gives the correct behaviour and good performance for the mlstats program:
```
     122,640,760 bytes allocated in the heap
       3,277,840 bytes copied during GC
          92,464 bytes maximum residency (236 sample(s))
          20,056 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       236 colls,     0 par    0.137s   0.138s     0.0006s    0.0013s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    3.905s  (  3.917s elapsed)
  GC      time    0.137s  (  0.138s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    4.048s  (  4.055s elapsed)

  %GC     time       3.4%  (3.4% elapsed)

  Alloc rate    31,404,944 bytes per MUT second

  Productivity  96.6% of total user, 96.6% of total elapsed
```
It has been decided to avoid the notmuch functions that open FDs, since we're going to want to parse the mail in Haskell anyway. We will have enough control to promptly finalise (i.e. close) open FDs in that scenario, so the problem should not occur and we can still use generational GC.
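Prompt finalisation of that kind is the standard `bracket` pattern. A minimal sketch (the file name and `readWholeFile` helper are illustrative, not part of hs-notmuch; `hGetContents'` is the strict reader from base >= 4.15 / GHC >= 9.0):

```haskell
import Control.Exception (bracket)
import System.IO (IOMode (ReadMode), hClose, hGetContents', openFile)

-- Explicit, prompt FD release: bracket guarantees the handle is closed
-- as soon as the body finishes (or throws), without waiting for the GC
-- to run a finalizer.  hGetContents' reads the whole file strictly, so
-- nothing still needs the handle after the bracket exits.
readWholeFile :: FilePath -> IO String
readWholeFile path = bracket (openFile path ReadMode) hClose hGetContents'

main :: IO ()
main = do
  writeFile "example.txt" "hello\n"
  readWholeFile "example.txt" >>= putStr
```

With the FD lifetime tied to a scope like this, no FD survives past the work that needed it, so the generational GC's laziness stops mattering.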
Keeping this open until we have a HACKING readme that explains not to use the notmuch functions that can open FDs behind the scenes.
Programs using hs-notmuch can misbehave due to file descriptor leaks (`EMFILE`).
Forcing a major GC via `System.Mem.performMajorGC` at regular intervals avoids the issue, so technically there is no leak: the objects that could be cleaned up are unreachable, but for some reason they are not cleaned up soon enough, and enough FDs "leak" to cause problems. Ideally the problem should be resolved in hs-notmuch, though at this stage I have no idea how.
If a workaround is truly required, we can monitor the number of open FDs and force a major GC when it approaches the limit for the process. (Yes, horrible!)
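That monitoring workaround could look something like the following sketch. It assumes Linux (the FD count comes from listing `/proc/self/fd`), and the names `openFdCount` and `gcWhenFdsNear` are hypothetical, chosen for illustration:

```haskell
import Control.Monad (when)
import System.Directory (listDirectory)
import System.Mem (performMajorGC)

-- Linux-specific: count this process's open FDs by listing /proc/self/fd.
-- (The count includes the FD used for the listing itself, which is close
-- enough for a threshold check.)
openFdCount :: IO Int
openFdCount = length <$> listDirectory "/proc/self/fd"

-- Once the FD count nears the process limit, force a major GC so that
-- finalizers can close the FDs held by unreachable objects.
gcWhenFdsNear :: Int -> IO ()
gcWhenFdsNear threshold = do
  n <- openFdCount
  when (n >= threshold) performMajorGC

main :: IO ()
main = gcWhenFdsNear 900  -- e.g. if the soft limit (RLIMIT_NOFILE) is 1024
```

Calling the guard periodically (or before each batch of work) keeps the process under the limit at the cost of occasional major collections.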
From the perspective of hs-notmuch there is no way to avoid this absolutely, because it depends on what the program does with the messages.