timbod7 / haskell-chart

A 2D charting library for haskell
428 stars 85 forks

Excessive memory usage with 75k point data sets #31

Open argiopetech opened 10 years ago

argiopetech commented 10 years ago

Plotting 75k points results in the following run statistics:

  17,095,699,408 bytes allocated in the heap
   5,963,243,808 bytes copied during GC
     734,099,456 bytes maximum residency (20 sample(s))
       6,439,600 bytes maximum slop
            1861 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     32989 colls,     0 par    5.60s    5.61s     0.0002s    0.0014s
  Gen  1        20 colls,     0 par    6.64s    6.66s     0.3330s    1.9475s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   20.44s  ( 20.48s elapsed)
  GC      time   12.24s  ( 12.27s elapsed)
  EXIT    time    0.03s  (  0.03s elapsed)
  Total   time   32.72s  ( 32.78s elapsed)

  %GC     time      37.4%  (37.4% elapsed)

  Alloc rate    836,283,704 bytes per MUT second

  Productivity  62.6% of total user, 62.5% of total elapsed
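
For readers unfamiliar with this report, the percentage and rate lines are derived from the raw figures above them. The following is a minimal sketch of that arithmetic using the numbers from this run; the record and function names are illustrative and not part of any GHC API, and small differences from the printed values are expected because the report rounds the times it displays.

```haskell
-- Recompute two derived figures of a `+RTS -s` summary from its raw fields.
-- All names below are made up for this illustration.
data RunStats = RunStats
  { allocatedBytes :: Double -- "bytes allocated in the heap"
  , mutSeconds     :: Double -- MUT time (user)
  , gcSeconds      :: Double -- GC time (user)
  , totalSeconds   :: Double -- Total time (user)
  }

-- Values copied from the report above.
reported :: RunStats
reported = RunStats 17095699408 20.44 12.24 32.72

-- Share of the total run time spent in the garbage collector, in percent.
gcPercent :: RunStats -> Double
gcPercent s = 100 * gcSeconds s / totalSeconds s

-- Bytes allocated per second of mutator (non-GC) time.
allocRate :: RunStats -> Double
allocRate s = allocatedBytes s / mutSeconds s

-- Print both derived figures for the run above.
demo :: IO ()
demo = do
  putStrLn ("%GC time:   " ++ show (gcPercent reported))
  putStrLn ("Alloc rate: " ++ show (allocRate reported) ++ " bytes per MUT second")
```

Load the file in GHCi and run `demo`, or add `main = demo` to build it as a program.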

Memory use is cumulative across additional charts (e.g., multiple charts per diagram via the (|||) or (===) operators): creating four charts similar to points0 in the code uses ~90% of my laptop's 8GB memory.

Code and a detailed Time/Allocation profile are in this gist, and the file containing the simulated data is on dropbox (3.7MB). Compilation is with ghc -O3 simpleplots.hs. Chart version is 1.2.2. GHC version is 7.8.2.

I've started playing with converting Chart to use boxed Vectors in the hopes that it will improve memory residency, but I haven't yet gotten far enough for results. The ability to force evaluation of a Renderable could be an alternative if reducing this memory usage is not possible, as it would at least reduce the cumulative effect of multiple charts.

argiopetech commented 10 years ago

Further research shows that maximum residency is constant at 165MB until ~1050 records.
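
A measurement like this can also be taken from inside the program rather than read off `+RTS -s` output. The sketch below assumes GHC 8.2 or later, where `GHC.Stats` provides `getRTSStats` (the GHC 7.8 used in this report had a different interface). `peakResidencyAfter`, `workload` and `demo` are hypothetical names, and `workload` is only a stand-in for the real plotting code. Because `max_live_bytes` is a peak over the whole process, each record count needs its own run.

```haskell
import Control.Exception (evaluate)
import Data.List (foldl')
import Data.Word (Word64)
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- Run an action, then return the largest live-heap size the RTS has sampled
-- so far in this process. Returns Nothing unless statistics collection was
-- enabled, e.g. by starting the program with `+RTS -T`.
peakResidencyAfter :: IO a -> IO (Maybe Word64)
peakResidencyAfter action = do
  _ <- action
  enabled <- getRTSStatsEnabled
  if enabled
    then Just . max_live_bytes <$> getRTSStats
    else pure Nothing

-- Hypothetical stand-in for "plot n records": build n values and consume them.
workload :: Int -> IO Double
workload n = evaluate (foldl' (+) 0 [sin (fromIntegral i) | i <- [1 .. n]])

-- Report the peak residency after one workload of 75k records.
demo :: IO ()
demo = do
  peak <- peakResidencyAfter (workload 75000)
  case peak of
    Just bytes -> putStrLn ("peak residency: " ++ show bytes ++ " bytes")
    Nothing    -> putStrLn "RTS stats are disabled; rerun with +RTS -T"
```

Add `main = demo` and build with `-rtsopts` so that `+RTS -T` is accepted on the command line.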

timbod7 commented 10 years ago

Have you tried this with the cairo backend? The diagrams backend is a lot more resource hungry.

timbod7 commented 10 years ago

I tried this out. With your program as written, I get these statistics:

tims-imac:chart timd$ ./test-diagrams +RTS -s -RTS -o test.png
  16,499,877,984 bytes allocated in the heap
   5,606,916,144 bytes copied during GC
     624,099,472 bytes maximum residency (19 sample(s))
      12,850,864 bytes maximum slop
            1713 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     31853 colls,     0 par    4.52s    4.72s     0.0001s    0.0055s
  Gen  1        19 colls,     0 par    4.57s    5.95s     0.3131s    1.7521s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   21.80s  ( 22.74s elapsed)
  GC      time    9.10s  ( 10.67s elapsed)
  EXIT    time    0.00s  (  0.04s elapsed)
  Total   time   30.90s  ( 33.45s elapsed)

  %GC     time      29.4%  (31.9% elapsed)

  Alloc rate    756,996,117 bytes per MUT second

  Productivity  70.6% of total user, 65.2% of total elapsed

whereas changing it to use the cairo backend, I see:

tims-imac:chart timd$ ./test-cairo +RTS -s -RTS
   2,462,394,080 bytes allocated in the heap
     344,392,920 bytes copied during GC
      71,451,304 bytes maximum residency (9 sample(s))
       1,545,568 bytes maximum slop
             138 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      4761 colls,     0 par    0.20s    0.21s     0.0000s    0.0004s
  Gen  1         9 colls,     0 par    0.16s    0.21s     0.0231s    0.0974s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    2.26s  (  2.76s elapsed)
  GC      time    0.36s  (  0.42s elapsed)
  EXIT    time    0.00s  (  0.01s elapsed)
  Total   time    2.62s  (  3.19s elapsed)

  %GC     time      13.8%  (13.1% elapsed)

  Alloc rate    1,091,502,219 bytes per MUT second

  Productivity  86.2% of total user, 70.9% of total elapsed

So it looks like either an issue with the binding to the diagrams library (i.e., the code in Graphics.Rendering.Chart.Backend.Diagrams) or with the diagrams library itself.

argiopetech commented 10 years ago

> So it looks like either an issue with the binding to the diagrams library (ie the code in Graphics.Rendering.Chart.Backend.Diagrams) or with the diagrams library itself.

Indeed it does. Apologies for the false report. I should have thought of trying the Cairo backend for Chart (rather than just the Cairo backend for Diagrams).

If you're not opposed, I'll leave this open until we can confirm this isn't an issue with Chart-Diagrams. I expect it not to be Chart-Diagrams' problem, as Diagrams admits to not being performance-focused.

bergey commented 10 years ago

I appreciate real-world examples where the performance of Diagrams is a problem. Thanks for sharing your code! I just did my own profiling run. It looks like 30-40% of the allocations are in our transformations code, so it is definitely worth trying some different code there. The 20% of time in StateStack is more surprising.

Full cost center report and summary: lapply', linearCombo and sumV are all part of transformations.

    Thu Jun  5 01:25 2014 Time and Allocation Profiling Report  (Final)

       charts-issue-31 +RTS -sstderr -p -RTS -o charts-issue-31.png -w 800

    total time  =       30.66 secs   (30664 ticks @ 1000 us, 1 processor)
    total alloc = 20,978,554,112 bytes  (excludes profiling overheads)

COST CENTRE        MODULE                   %time %alloc

lift               Control.Monad.StateStack  18.3    0.4
render             Diagrams.Core.Types        9.2    2.3
lapply'            Data.LinearMap             8.8   22.3
MAIN               MAIN                       7.4   11.8
transferFunction   Data.Colour.SRGB           3.9    3.9
>>=                Control.Monad.StateStack   3.8    2.1
<>                 Diagrams.Core.Style        3.6    1.5
linearCombo        Data.Basis                 3.5   10.9
runStateStackT     Control.Monad.StateStack   2.6    1.8
gmapAttrs.gmapAttr Diagrams.Core.Style        2.3    1.1
sumV               Data.AdditiveGroup         2.2    5.8
foldMap            Data.FingerTree            1.9    4.0
getAttr.ty         Diagrams.Core.Style        1.6    2.8
renderDiaT         Diagrams.Core.Compile      1.6    1.7
convert            Data.Colour.Chan           1.4    0.4
fmap               Control.Monad.StateStack   1.4    1.3
getAttr            Diagrams.Core.Style        1.3    0.1
over               Control.Newtype            1.1    1.3
>>                 Control.Monad.StateStack   1.1    1.2
transform          Diagrams.Core.Style        1.0    0.8
inStyle            Diagrams.Core.Style        0.9    1.0
mappend            Diagrams.Core.Types        0.1    1.2
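
One way to read a report like this is to add up the rows that belong together. The sketch below does that for the StateStack and transformation rows, with values copied from the table above; the `Row` type and the helper names are made up for this illustration. On these rows, the three transformation cost centres add up to about 39% of allocations and the StateStack rows to about 27% of time.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- (cost centre, module, %time, %alloc)
type Row = (String, String, Double, Double)

-- The StateStack and transformation rows of the report above.
rows :: [Row]
rows =
  [ ("lift",           "Control.Monad.StateStack", 18.3,  0.4)
  , ("lapply'",        "Data.LinearMap",            8.8, 22.3)
  , (">>=",            "Control.Monad.StateStack",  3.8,  2.1)
  , ("linearCombo",    "Data.Basis",                3.5, 10.9)
  , ("runStateStackT", "Control.Monad.StateStack",  2.6,  1.8)
  , ("sumV",           "Data.AdditiveGroup",        2.2,  5.8)
  , ("fmap",           "Control.Monad.StateStack",  1.4,  1.3)
  , (">>",             "Control.Monad.StateStack",  1.1,  1.2)
  ]

-- Total %time and %alloc per module, sorted by module name.
byModule :: [Row] -> [(String, Double, Double)]
byModule rs =
  [ (m, sum [t | (_, _, t, _) <- g], sum [a | (_, _, _, a) <- g])
  | g@((_, m, _, _) : _) <- groupBy ((==) `on` moduleOf) (sortOn moduleOf rs)
  ]
  where
    moduleOf (_, m, _, _) = m

-- Combined %alloc of the three transformation cost centres named above.
transformAlloc :: Double
transformAlloc =
  sum [a | (c, _, _, a) <- rows, c `elem` ["lapply'", "linearCombo", "sumV"]]

-- Print the per-module totals and the transformation total.
demo :: IO ()
demo = do
  mapM_ print (byModule rows)
  putStrLn ("transformations %alloc: " ++ show transformAlloc)
```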

bergey commented 10 years ago

I think the Transformation / MemoTrie issue was a red herring. I ran the program against a patched version of diagrams-core that doesn't use MemoTrie, and allocations went down 40% (as expected) but peak residency was basically unchanged.

On to heap profiling. This is my first attempt at memory profiling in any language, so please correct me if I get things wrong. The heap profile of the Chart code above shows memory usage increasing for the first 4 seconds, then decreasing for the rest of the run. (postscript) I take it some large structure is being fully evaluated before it can be garbage collected.
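
As a generic illustration of that pattern (not a diagnosis of Chart-diagrams or Diagrams, whose actual cause is still open at this point in the thread), the sketch below contrasts a computation that has to keep a whole lazily produced list alive with one that can release each element as it goes. The function names are made up for this example.

```haskell
import Data.List (foldl')

-- Two-pass mean: `xs` is needed by both `sum` and `length`, so the whole list
-- stays in memory after the first traversal until the second one has finished.
meanTwoPass :: [Double] -> Double
meanTwoPass xs = sum xs / fromIntegral (length xs)

-- One-pass mean: a strict left fold carries the running sum and count. If the
-- list is produced lazily and nothing else refers to it, each element becomes
-- garbage as soon as it has been consumed.
meanOnePass :: [Double] -> Double
meanOnePass xs = total / fromIntegral count
  where
    (total, count) = foldl' step (0, 0 :: Int) xs
    -- Force both components so that no chain of thunks builds up in the pair.
    step (s, n) x = let s' = s + x; n' = n + 1 in s' `seq` n' `seq` (s', n')

-- Compute the mean of 75k generated values both ways.
demo :: IO ()
demo = do
  print (meanTwoPass [1 .. 75000])
  print (meanOnePass [1 .. 75000])
```

Both functions return the same value. The difference would only show up in a heap profile, and how large it is depends on the optimisation level and on how the input is produced.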

For comparison, I ran the factorization diagram from our gallery. It shows pretty flat memory usage for 80% of the runtime. (postscript) This makes me suspect that Chart-diagrams is calling Diagrams in a suboptimal way. I want to take a closer look at that code, and profile the Chart program compiled against Chart-cairo.