purebred-mua / purebred

A terminal-based mail user agent based on notmuch
GNU Affero General Public License v3.0

fd leaks #22

Closed. frasertweedale closed this issue 7 years ago.

frasertweedale commented 7 years ago

Programs using hs-notmuch can misbehave due to file descriptor (FD) leaks, eventually failing with EMFILE ("too many open files").

Forcing a major GC via System.Mem.performMajorGC at regular intervals avoids the issue, so technically there is no leak: the objects holding the FDs are unreachable and could be reclaimed. For some reason, though, they are not collected soon enough, and enough FDs "leak" to cause problems.
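
A minimal sketch of the "regular intervals" workaround, assuming a dedicated background thread is acceptable. startPeriodicMajorGC is a hypothetical helper (not part of hs-notmuch or purebred), and the interval is an arbitrary parameter that would need tuning for a real program.

```haskell
import Control.Concurrent (forkIO, killThread, threadDelay)
import Control.Monad (forever)
import System.Mem (performMajorGC)

-- Hypothetical helper: fork a thread that forces a major GC every
-- intervalMicros microseconds. Returns an action that stops the thread.
startPeriodicMajorGC :: Int -> IO (IO ())
startPeriodicMajorGC intervalMicros = do
  tid <- forkIO . forever $ do
    threadDelay intervalMicros -- sleep without blocking other Haskell threads
    performMajorGC             -- collect all generations, so finalizers of
                               -- unreachable objects get a chance to run
  pure (killThread tid)
```

For example, `stop <- startPeriodicMajorGC 5000000` at startup would force a major GC every five seconds until `stop` is run.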

Ideally the problem should be resolved in hs-notmuch, though at this stage I have no idea how.

If a workaround is truly required, we can monitor the number of open FDs and force a major GC when it approaches the limit for the process. (Yes, horrible!)
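
A sketch of that monitoring workaround, assuming Linux (open FDs are counted by listing /proc/self/fd) and the `directory` and `unix` packages that ship with GHC. countOpenFds, fdSoftLimit and gcIfNearFdLimit are hypothetical names, and the threshold fraction is an arbitrary parameter.

```haskell
import System.Directory (listDirectory)
import System.Mem (performMajorGC)
import System.Posix.Resource
  (Resource (..), ResourceLimit (..), ResourceLimits (..), getResourceLimit)

-- Number of FDs currently open by this process (Linux only: relies on /proc).
-- The count may include the directory handle listDirectory opens to read it.
countOpenFds :: IO Int
countOpenFds = length <$> listDirectory "/proc/self/fd"

-- Soft RLIMIT_NOFILE for this process, if it is a finite number.
fdSoftLimit :: IO (Maybe Integer)
fdSoftLimit = do
  lims <- getResourceLimit ResourceOpenFiles
  pure $ case softLimit lims of
    ResourceLimit n -> Just n
    _               -> Nothing -- infinite or unknown: nothing to compare against

-- Hypothetical helper: force a major GC if the number of open FDs has reached
-- the given fraction of the soft limit. Returns True if a GC was forced.
gcIfNearFdLimit :: Double -> IO Bool
gcIfNearFdLimit fraction = do
  mlimit <- fdSoftLimit
  case mlimit of
    Nothing -> pure False
    Just limit -> do
      n <- countOpenFds
      if fromIntegral n >= fraction * fromIntegral limit
        then performMajorGC >> pure True
        else pure False
```

Calling `gcIfNearFdLimit 0.8` periodically, or before opening more messages, would force a major GC once 80% of the soft limit is in use.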

From the perspective of hs-notmuch, there is no way to avoid this completely, because it depends on what the program does with the messages.

frasertweedale commented 7 years ago

After further analysis, the cause is GHC's generational GC: for whatever reason, some objects get promoted to the "older" generation. The heap never grows enough to trigger a major GC (the only kind that collects the older generation), so those objects, and the FDs they hold, live forever.

In my mailing list stats program, calling performMajorGC after each Query is sufficient to avoid the problem completely. Comparing the costs:

Without performMajorGC:

     110,095,600 bytes allocated in the heap
       1,808,752 bytes copied during GC
         189,000 bytes maximum residency (2 sample(s))
          29,224 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       210 colls,     0 par    0.047s   0.047s     0.0002s    0.0010s
  Gen  1         2 colls,     0 par    0.035s   0.035s     0.0175s    0.0343s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    3.587s  (  3.597s elapsed)
  GC      time    0.082s  (  0.082s elapsed)
  EXIT    time    0.034s  (  0.034s elapsed)
  Total   time    3.715s  (  3.714s elapsed)

  %GC     time       2.2%  (2.2% elapsed)

  Alloc rate    30,694,616 bytes per MUT second

  Productivity  97.8% of total user, 97.8% of total elapsed

And with performMajorGC:

     122,640,648 bytes allocated in the heap
       2,237,352 bytes copied during GC
          72,368 bytes maximum residency (152 sample(s))
          22,000 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       161 colls,     0 par    0.063s   0.063s     0.0004s    0.0009s
  Gen  1       152 colls,     0 par    0.090s   0.091s     0.0006s    0.0031s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    4.021s  (  4.034s elapsed)
  GC      time    0.153s  (  0.154s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    4.181s  (  4.188s elapsed)

  %GC     time       3.7%  (3.7% elapsed)

  Alloc rate    30,500,197 bytes per MUT second

  Productivity  96.3% of total user, 96.3% of total elapsed

As can be seen, in this program forcing the major GC has only a modest cost: GC time rises from 0.082s to 0.153s (2.2% to 3.7% of total time). In a larger program with many longer-living objects the cost may be higher. The performMajorGC version allocates more (122.6 MB vs 110.1 MB) because, having avoided EMFILE, it has more work to do; and, as expected, its maximum residency is lower (72,368 vs 189,000 bytes).

I think the way forward for purebred is to have a mechanism (or several) that runs performMajorGC at appropriate times, so that unreachable Message objects get collected. Let's keep this ticket open while we work on the program so we do not forget about it; it's too early to tell exactly when and where we will need to do this.
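
One possible shape for such a mechanism, sketched under the assumption that the program has natural units of work (for example, one query plus the code that consumes its results). withMajorGC is a hypothetical helper, not an hs-notmuch API.

```haskell
import Control.Exception (finally)
import System.Mem (performMajorGC)

-- Hypothetical helper: run one unit of work (e.g. a query and the code that
-- consumes its messages), then force a major GC, even if the work throws.
-- Objects the work allocated but did not return become collectable here;
-- anything still reachable from the result is kept alive as usual.
withMajorGC :: IO a -> IO a
withMajorGC work = work `finally` performMajorGC
```

Wrapping each query, e.g. `mapM_ (withMajorGC . processQuery) queries` with a hypothetical `processQuery`, mirrors the "performMajorGC after each Query" approach measured above. The caveat is that the result must not retain Message objects, or the GC cannot reclaim them.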

For hs-notmuch I'll include an admonition in the docs to explain that this silliness may be required depending on the type of program.

Finally, if it turns out there's a way to instruct the Haskell RTS to perform a major GC when such-and-such occurs, we should try to use that. Investigation for another day.

frasertweedale commented 7 years ago

Running with +RTS -G1 selects a single-generation, two-space copying collector. For the mlstats program this gives the correct behaviour and good performance:

     122,640,760 bytes allocated in the heap
       3,277,840 bytes copied during GC
          92,464 bytes maximum residency (236 sample(s))
          20,056 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       236 colls,     0 par    0.137s   0.138s     0.0006s    0.0013s

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    3.905s  (  3.917s elapsed)
  GC      time    0.137s  (  0.138s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time    4.048s  (  4.055s elapsed)

  %GC     time       3.4%  (3.4% elapsed)

  Alloc rate    31,404,944 bytes per MUT second

  Productivity  96.6% of total user, 96.6% of total elapsed
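
Since this fix is an RTS flag rather than code, a program that depends on it might check at startup that the flag is in effect. A sketch using getGCFlags from GHC.RTS.Flags (in base); gcGenerations and warnIfGenerational are hypothetical names. Note that a GHC-compiled program only accepts +RTS -G1 on the command line if it was linked with -rtsopts; alternatively the flag can be baked in at link time with -with-rtsopts=-G1.

```haskell
import GHC.RTS.Flags (GCFlags (generations), getGCFlags)
import System.IO (hPutStrLn, stderr)

-- Number of generations the RTS is using: 1 under +RTS -G1, 2 by default.
gcGenerations :: IO Int
gcGenerations = fromIntegral . generations <$> getGCFlags

-- Hypothetical startup check: warn if the generational collector is active,
-- since that is the configuration in which unreachable FD-holding objects
-- were observed to linger.
warnIfGenerational :: IO ()
warnIfGenerational = do
  g <- gcGenerations
  if g > 1
    then hPutStrLn stderr ("warning: running with " ++ show g
                           ++ " GC generations; consider +RTS -G1")
    else pure ()
```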

frasertweedale commented 7 years ago

We have decided to avoid the notmuch functions that open FDs, since we're going to want to parse the mail in Haskell anyway. That gives us enough control to finalise (i.e. close) FDs promptly, so the problem should not occur and we can keep using generational GC.
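
A minimal sketch of what "promptly finalise" can look like when the mail is read in Haskell, assuming the program knows each message's file path. The file is read strictly inside withFile, so its FD is closed as soon as the read finishes rather than whenever the GC gets round to running a finalizer. readMailFile is a hypothetical name. The lazy Data.ByteString.Lazy.readFile would not give this guarantee, since it keeps the handle open until the contents are fully consumed.

```haskell
import qualified Data.ByteString as B
import System.IO (IOMode (ReadMode), withFile)

-- Hypothetical helper: read an entire mail file into memory.
-- B.hGetContents is strict, so the whole file is read before withFile
-- returns, and withFile closes the handle (and its FD) on the way out,
-- including when an exception is thrown.
readMailFile :: FilePath -> IO B.ByteString
readMailFile path = withFile path ReadMode B.hGetContents
```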

Keeping this open until we have a HACKING readme that explains not to use the notmuch functions that can open FDs behind the scenes.