haskell / haddock

Haskell Documentation Tool
www.haskell.org/haddock/
BSD 2-Clause "Simplified" License

Using `Text`, `Map`, etc for JSON #1559

Closed parsonsmatt closed 2 months ago

parsonsmatt commented 1 year ago

So, aeson may be out (#1558), but what if we used Text and Map instead of String and []?

It ain't looking too good.

In fact, it looks downright weird!

Baseline

time cabal haddock --haddock-options "-v2 --quickjump +RTS -s -RTS --optghc=\"-v2\" --lib=/home/matt/Projects/haddock/haddock-api/resources" persistent --with-haddock=/home/matt/.cabal/bin/haddock-943
!!! renameAllInterfaces: finished in 112.10 milliseconds, allocated 232.280 megabytes
*** ppJsonIndex:
!!! ppJsonIndex: finished in 1568.96 milliseconds, allocated 7682.663 megabytes
*** ppHtml:
*** ppHtmlContents:
!!! ppHtmlContents: finished in 0.73 milliseconds, allocated 2.298 megabytes
*** ppJsonIndex:
!!! ppJsonIndex: finished in 26.43 milliseconds, allocated 129.281 megabytes

  24,248,235,776 bytes allocated in the heap
   3,644,045,200 bytes copied during GC
     379,247,624 bytes maximum residency (16 sample(s))
       5,837,816 bytes maximum slop
             968 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      5794 colls,     0 par    1.573s   1.575s     0.0003s    0.0028s
  Gen  1        16 colls,     0 par    1.255s   1.498s     0.0936s    0.3925s

  TASKS: 5 (1 bound, 4 peak workers (4 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.000s elapsed)
  MUT     time    5.647s  (  6.241s elapsed)
  GC      time    2.828s  (  3.074s elapsed)
  EXIT    time    0.001s  (  0.005s elapsed)
  Total   time    8.476s  (  9.320s elapsed)

  Alloc rate    4,294,171,488 bytes per MUT second

  Productivity  66.6% of total user, 67.0% of total elapsed

Same baseline in #1558

Just with encoding improvements

The encode-to-builder stuff isn't great - it's inefficiently allocating a bunch of extra lists. The foldMap charToBuilder thing is also wasteful. Fix that up, using some of the code from aeson, and we got:

!!! ppJsonIndex: finished in 1711.11 milliseconds, allocated 7466.382 megabytes

  24,001,682,632 bytes allocated in the heap
   3,638,025,448 bytes copied during GC
     378,503,736 bytes maximum residency (16 sample(s))
       5,823,944 bytes maximum slop
             965 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      5731 colls,     0 par    1.582s   1.585s     0.0003s    0.0028s
  Gen  1        16 colls,     0 par    1.184s   1.184s     0.0740s    0.2534s

  TASKS: 5 (1 bound, 4 peak workers (4 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.000s elapsed)
  MUT     time    5.832s  (  6.344s elapsed)
  GC      time    2.766s  (  2.768s elapsed)
  EXIT    time    0.001s  (  0.008s elapsed)
  Total   time    8.599s  (  9.120s elapsed)

  Alloc rate    4,115,707,641 bytes per MUT second

  Productivity  67.8% of total user, 69.6% of total elapsed

Saving ~3 MB of total memory use and ~248 MB of total allocations. Yet both JSON index generation and total runtime are slower: index generation takes 1711.11 ms vs 1568.96 ms. JSON allocations are down from 7682 MB to 7466 MB, but total memory use barely moved.

Weird.
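For reference, the shape of the escaping fix described above, in miniature (a sketch, not Haddock's actual code): the per-character `foldMap` produces one tiny `Builder` chunk per character, while the aeson-style version copies each maximal run of plain characters in one go.

```haskell
import Data.ByteString.Builder (Builder, charUtf8, stringUtf8, toLazyByteString)

-- Wasteful: one Builder per character, so the final concatenation has a
-- long list of single-byte chunks to stitch together.
escapeSlow :: String -> Builder
escapeSlow = foldMap charUtf8

-- Closer to aeson's approach: emit each run of characters that need no
-- escaping as a single chunk, special-casing only the rest.
escapeFast :: String -> Builder
escapeFast s = case break needsEscape s of
  (plain, [])       -> stringUtf8 plain
  (plain, c : rest) ->
    stringUtf8 plain <> charUtf8 '\\' <> charUtf8 c <> escapeFast rest
  where
    needsEscape c = c == '"' || c == '\\'
```

Both render `a"b` as `a\"b`; the difference is purely in how many chunks the `Builder` has to merge.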

With Text instead of String

Text is more efficient than String, right? This should be a lot better.

!!! ppJsonIndex: finished in 1633.03 milliseconds, allocated 7908.335 megabytes

  24,466,968,592 bytes allocated in the heap
   3,856,359,552 bytes copied during GC
     321,971,216 bytes maximum residency (17 sample(s))
       5,727,096 bytes maximum slop
             847 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      5843 colls,     0 par    1.635s   1.638s     0.0003s    0.0028s
  Gen  1        17 colls,     0 par    1.231s   1.231s     0.0724s    0.1985s

  TASKS: 5 (1 bound, 4 peak workers (4 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.000s elapsed)
  MUT     time    5.753s  (  6.315s elapsed)
  GC      time    2.867s  (  2.869s elapsed)
  EXIT    time    0.001s  (  0.005s elapsed)
  Total   time    8.620s  (  9.190s elapsed)

  Alloc rate    4,253,272,960 bytes per MUT second

  Productivity  66.7% of total user, 68.7% of total elapsed

Nice - saving ~80ms on the JSON index compared to the encoding-only run. But allocations are up to 7908 MB - roughly 225 MB more than the String baseline, and ~440 MB more than the encoding-only run.

The overall memory use is somehow down to 847 MB from 965 MB, despite allocating more. Curious.
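One guess at where the extra allocation comes from (an assumption, not something measured here): the rest of Haddock still produces String, so every field has to be packed to Text at the JSON boundary, paying a list traversal plus a fresh array per field on top of whatever the encoder does. A toy illustration:

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- Toy stand-in for an index entry; the real JsonIndexEntry has four fields.
data Entry = Entry
  { entryName :: Text
  , entryLink :: Text
  }

-- Upstream producers still hand over String, so each entry pays a
-- T.pack per field (one traversal + one freshly allocated Text array
-- each) before any encoding even starts.
fromStrings :: String -> String -> Entry
fromStrings n l = Entry (T.pack n) (T.pack l)
```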

List to Map

This replaces the [(Text, Value)] with Map Text Value.

!!! ppJsonIndex: finished in 1699.28 milliseconds, allocated 7925.515 megabytes

  24,485,299,144 bytes allocated in the heap
   3,855,818,800 bytes copied during GC
     320,293,448 bytes maximum residency (17 sample(s))
       5,733,944 bytes maximum slop
             842 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      5847 colls,     0 par    1.656s   1.659s     0.0003s    0.0029s
  Gen  1        17 colls,     0 par    1.196s   1.197s     0.0704s    0.1980s

  TASKS: 5 (1 bound, 4 peak workers (4 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.000s elapsed)
  MUT     time    5.778s  (  6.295s elapsed)
  GC      time    2.852s  (  2.855s elapsed)
  EXIT    time    0.001s  (  0.010s elapsed)
  Total   time    8.632s  (  9.160s elapsed)

  Alloc rate    4,237,462,312 bytes per MUT second

  Productivity  66.9% of total user, 68.7% of total elapsed

... Slightly slower still, with worse allocations.
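That matches what the Map change can buy on this code path, which is nothing: the objects are built once and serialized in order, never looked up. A sketch of the trade-off (names illustrative):

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Text (Text)
import qualified Data.Text as T

-- Building the object as a Map buys O(log n) lookup plus key sorting and
-- deduplication, but Map.fromList allocates a balanced tree of interior
-- nodes on top of the pairs. For a four-field object that is only ever
-- folded back out at encoding time, that tree is pure overhead.
mkObject :: [(Text, v)] -> Map Text v
mkObject = Map.fromList
```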

Further Work

aeson improved JSON index generation time by 76% and allocations by 90%. These patches make things worse, though also somehow reduce overall memory use by a decent amount. What gives?

The index generation code basically just does encodeToBuilder . toJSON, which makes a big [Object], where each Object is a map of four Strings. I suspect that copying more of the Value -> Builder code into Haddock should help.
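That two-stage pipeline, in miniature (toy Value and entry shape; Haddock's real types differ): every entry is first reified as a Value tree, which is then folded into a Builder - so the intermediate [Object] is allocated only to be immediately torn down. A fused, toEncoding-style path would go from the fields straight to the Builder.

```haskell
import Data.ByteString.Builder (Builder, charUtf8, stringUtf8, toLazyByteString)
import Data.List (intersperse)

data Value = Str String | Arr [Value] | Obj [(String, Value)]

-- Stage 1: reify an index entry as a Value (one Obj of four fields).
toJSONEntry :: (String, String, String, String) -> Value
toJSONEntry (doc, mdl, name, link) =
  Obj [("doc", Str doc), ("module", Str mdl), ("name", Str name), ("link", Str link)]

-- Stage 2: walk the tree, emitting a Builder. (Escaping omitted.)
encodeToBuilder :: Value -> Builder
encodeToBuilder (Str s)   = charUtf8 '"' <> stringUtf8 s <> charUtf8 '"'
encodeToBuilder (Arr vs)  = charUtf8 '[' <> commas encodeToBuilder vs <> charUtf8 ']'
encodeToBuilder (Obj kvs) = charUtf8 '{' <> commas pair kvs <> charUtf8 '}'
  where pair (k, v) = encodeToBuilder (Str k) <> charUtf8 ':' <> encodeToBuilder v

commas :: (a -> Builder) -> [a] -> Builder
commas f = mconcat . intersperse (charUtf8 ',') . map f
```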

parsonsmatt commented 1 year ago

Updated the PR to use all the encoding stuff from aeson. No dice! Results are not demonstrably improved.

PR is using [(Text, Value)] for the objects. Next step, try Map Text Value, though I'd be surprised if that improved much.

I don't understand how aeson is able to encode a [JsonIndexEntry] so much faster. The Encoding logic is copied over. toEncoding @[a] shouldn't be generating an intermediate list in either case, it should be a foldr to generate the encoding directly - and toEncoding @[a] should be doing toEncoding on the underlying values, so the Object type shouldn't matter for writing the index.
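For the record, the list case really shouldn't be the problem: a direct foldr interleaves the commas while consuming the spine, so no intermediate list of encodings is ever materialized. A sketch of that shape:

```haskell
import Data.ByteString.Builder (Builder, charUtf8, intDec, stringUtf8, toLazyByteString)

-- Encode a list by folding straight into a Builder: the spine is
-- consumed once and no [Builder] is built along the way.
encodeList :: (a -> Builder) -> [a] -> Builder
encodeList _ []       = stringUtf8 "[]"
encodeList f (x : xs) =
  charUtf8 '[' <> f x <> foldr (\y b -> charUtf8 ',' <> f y <> b) (charUtf8 ']') xs
```

For example, `encodeList intDec [1,2,3]` renders as `[1,2,3]`.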

parsonsmatt commented 1 year ago

Oh, sigh. ppJsonIndex renders stuff to JSON, but the first thing it does is parse all the installedIndexes.

  (errors, installedIndexes) <-
    partitionEithers
      <$> traverse
            (\ifaceFile -> do
               let indexFile = takeDirectory ifaceFile FilePath.</> "doc-index.json"
               a <- doesFileExist indexFile
               if a
                 then bimap (indexFile,) (map (fixLink ifaceFile))
                        <$> eitherDecodeFile @[JsonIndexEntry] indexFile
                 else return (Right []))
            installedIfacesPaths

That means that almost all of aeson's benefit comes from its optimized attoparsec parser.

parsonsmatt commented 1 year ago

I started down the path of porting some of aeson's parser to parsec, but it's more annoying than I want to deal with, and I don't trust that an optimized Parsec parser would be worth the effort compared to attoparsec.

It'd be easier to incorporate attoparsec as a boot package, or a clone of it. That would also require bringing in scientific, which depends on quite a few non-boot packages: bytestring-builder, hashable, and integer-logarithms. hashable is only needed for an instance, bytestring-builder is there for compatibility, and integer-logarithms is actually used. integer-logarithms itself only depends on nats as a non-boot package, and the Natural type has been folded into base for quite some time.

The lack of high performance data structures and libraries in the boot ecosystem is going to continue to be a problem!

ulysses4ever commented 1 year ago

I was looking into the performance of certain parts of cabal at some point (e.g. cabal update), and Parsec performance was often the bottleneck (cabal "vendors" a copy of Parsec). So, based on this admittedly subjective experience, I fully expect any work on porting to Parsec to be fruitless with regard to improving performance.

The lack of high performance data structures and libraries in the boot ecosystem is going to continue to be a problem!

Well said. We should carve this sentence somewhere visible and reflect on it. Cabal is certainly in the same sad boat, forever fighting the lack you mention.

Bodigrim commented 1 year ago

It would be quite ridiculous to have both parsec and, say, attoparsec as boot libraries. However, I think parsec is a boot library only by virtue of being used by Cabal. So if Cabal migrates to attoparsec, haddock can do so too.

Neither scientific nor integer-logarithms nor hashable is crucial for attoparsec, so I'd suggest splitting out an attoparsec-core and depending only on that.