Closed parsonsmatt closed 2 months ago
Updated the PR to use all the encoding stuff from `aeson`. No dice! Results are not demonstrably improved. PR is using `[(Text, Value)]` for the objects. Next step, try `Map Text Value`, though I'd be surprised if that improved much.
I don't understand how `aeson` is able to encode a `[JsonIndexEntry]` so much faster. The `Encoding` logic is copied over. `toEncoding @[a]` shouldn't be generating an intermediate list in either case; it should be a `foldr` that generates the encoding directly. And `toEncoding @[a]` should be doing `toEncoding` on the underlying values, so the `Object` type shouldn't matter for writing the index.
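For intuition, here's a minimal sketch of that "one `foldr`, no intermediate list" shape, using only the boot `bytestring` package. The `Entry` type is a hypothetical stand-in for `JsonIndexEntry`, and string escaping is elided for brevity:

```haskell
import Data.ByteString.Builder (Builder, char8, string8, stringUtf8, toLazyByteString)
import qualified Data.ByteString.Lazy.Char8 as BL

-- Hypothetical stand-in for JsonIndexEntry.
data Entry = Entry { entryName :: String, entryLink :: String }

-- Encode one entry directly to a Builder; escaping elided for brevity.
entryToBuilder :: Entry -> Builder
entryToBuilder (Entry n l) =
     char8 '{'
  <> quoted "name" <> char8 ':' <> quoted n <> char8 ','
  <> quoted "link" <> char8 ':' <> quoted l
  <> char8 '}'
  where
    quoted s = char8 '"' <> stringUtf8 s <> char8 '"'

-- Encode the whole list with a single foldr, interleaving commas as we go;
-- no intermediate [Builder] or Value tree is materialized.
listToBuilder :: [Entry] -> Builder
listToBuilder []       = string8 "[]"
listToBuilder (e : es) =
  char8 '[' <> entryToBuilder e
            <> foldr (\x acc -> char8 ',' <> entryToBuilder x <> acc)
                     (char8 ']')
                     es

main :: IO ()
main = BL.putStrLn (toLazyByteString (listToBuilder [Entry "map" "Data-Map.html"]))
-- prints [{"name":"map","link":"Data-Map.html"}]
```

This is the shape `toEncoding @[a]` should already have, which is why the `Object` representation alone didn't look like a plausible culprit.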
Oh, sigh. `ppJsonIndex` renders stuff to JSON, but the first thing it does is parse all the `installedIndexes`:
```haskell
(errors, installedIndexes) <-
  partitionEithers
    <$> traverse
      ( \ifaceFile -> do
          let indexFile = takeDirectory ifaceFile FilePath.</> "doc-index.json"
          a <- doesFileExist indexFile
          if a
            then
              bimap (indexFile,) (map (fixLink ifaceFile))
                <$> eitherDecodeFile @[JsonIndexEntry] indexFile
            else return (Right [])
      )
      installedIfacesPaths
```
That means that almost all of the benefit of `aeson` is the optimized `attoparsec` parser.
I started down the path of porting some of `aeson`'s parser to `parsec`, but it's more annoying than I want to deal with, and I don't trust that an optimized `parsec` parser would be worth the effort compared to `attoparsec`.
It'd be easier to incorporate `attoparsec` as a boot package, or a clone. This would also require bringing `scientific` in, which depends on quite a few non-boot packages: `bytestring-builder`, `hashable`, and `integer-logarithms`. `hashable` is only for providing an instance, `bytestring-builder` is for compatibility, and `integer-logarithms` is actually used. `integer-logarithms` only depends on `nats` as a non-boot package, and that `Natural` type has been folded into `base` for quite some time.
The lack of high performance data structures and libraries in the boot ecosystem is going to continue to be a problem!
I was looking into the performance of certain parts of cabal at some point (e.g. `cabal update`), and Parsec performance was often the bottleneck (cabal "vendors" a copy of Parsec). So, based on this completely subjective experience, I fully expect any work done in porting to Parsec to be fruitless with regards to improving performance.
> The lack of high performance data structures and libraries in the boot ecosystem is going to continue to be a problem!

So well said. We should carve this sentence in some visible place and think about it. Certainly, Cabal is in the same sad boat, forever fighting the lack you mention.
It would be quite ridiculous to have both `parsec` and, say, `attoparsec` as boot libraries. However, I think `parsec` is a boot library only by virtue of being employed by `Cabal`. So if `Cabal` migrates to `attoparsec`, `haddock` can do it too.
Neither `scientific` nor `integer-logarithms` nor `hashable` is crucial for `attoparsec`, so I'd suggest splitting out an `attoparsec-core` and depending on it only.
So, `aeson` may be out (#1558), but what if we used `Text` and `Map` instead of `String` and `[]`? It ain't looking too good. In fact, it looks downright weird!
### Baseline

Same baseline as in #1558.

### Just with encoding improvements
The `encode-to-builder` stuff isn't great: it's inefficiently allocating a bunch of extra lists. The `foldMap charToBuilder` thing is also wasteful. Fix that up, using some of the code from `aeson`, and we got: a saving of ~3 MB total memory use and 248 MB of total allocations. Both JSON index generation and total runtime are slower: index generation is 1711.11 ms vs 1568 ms. JSON allocations are down from 7682 MB to 7466 MB, but total memory use barely moved. Weird.
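To see why `foldMap charToBuilder` is wasteful, here's the pattern sketched with `bytestring`'s own combinators, where `charUtf8` plays the role of Haddock's `charToBuilder` (a Haddock-internal helper): both versions produce identical bytes, but the per-character fold allocates a chain of tiny builders instead of walking the `String` in one pass.

```haskell
import Data.ByteString.Builder (Builder, charUtf8, stringUtf8, toLazyByteString)

-- Wasteful: one small Builder per Char, glued together with (<>).
perChar :: String -> Builder
perChar = foldMap charUtf8

-- Better: a single primitive that walks the String once.
onePass :: String -> Builder
onePass = stringUtf8
```

Same output, fewer intermediate allocations; this is the kind of fix the `aeson`-derived code provides.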
### With `Text` instead of `String`

`Text` is more efficient than `String`, right? This should be a lot better. Nice: saving ~80 ms on the JSON index. But allocations are up to 7908 MB, 500 MB more than `String`, and ~220 MB more than the encoding patch. The overall memory use is somehow down to 847 MB from 965 MB, despite allocating more. Curious.
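Part of why `Text` ought to win here: its payload is a packed array, so emitting it into a `Builder` is a bulk copy via `text`'s `encodeUtf8Builder`, rather than a per-`Char` fold over a linked list. A minimal sketch (escaping again elided, since a real JSON encoder must escape quotes and control characters):

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString.Builder (Builder, char8, toLazyByteString)
import Data.Text (Text)
import Data.Text.Encoding (encodeUtf8Builder)

-- Quote a Text as a JSON string literal; the payload goes out as one
-- bulk UTF-8 copy instead of a character-by-character fold.
quotedText :: Text -> Builder
quotedText t = char8 '"' <> encodeUtf8Builder t <> char8 '"'
```

That makes the allocation numbers above all the stranger; the extra allocation presumably comes from somewhere other than the final byte emission.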
### List to Map

This replaces the `[(Text, Value)]` with `Map Text Value`. ... Slightly slower still, with worse allocations.
### Further Work

`aeson` improved JSON index generation time by 76% and allocations by 90%. These patches make things worse, though they also somehow reduce overall memory use by a decent amount. What gives? The index generation code basically just does `encodeToBuilder . toJSON`, which makes a big `[Object]`, where each `Object` is a map of four `String`s. I suspect that copying more of the `Value -> Builder` code into Haddock should help.
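A sketch of the direction I mean, with a pared-down `Value` type (not `aeson`'s, just the constructors the index needs): a single direct `Value -> Builder` walk, with no intermediate strings re-materialized along the way. Escaping is elided for brevity.

```haskell
import Data.ByteString.Builder (Builder, char8, stringUtf8, toLazyByteString)
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.List (intersperse)

-- Pared-down stand-in for aeson's Value, covering only what the index emits.
data Value
  = VString String
  | VArray [Value]
  | VObject [(String, Value)]

-- Walk the Value once, emitting straight into a Builder (escaping elided).
valueToBuilder :: Value -> Builder
valueToBuilder v = case v of
  VString s   -> quoted s
  VArray vs   -> char8 '[' <> commas (map valueToBuilder vs) <> char8 ']'
  VObject kvs -> char8 '{'
              <> commas [ quoted k <> char8 ':' <> valueToBuilder x
                        | (k, x) <- kvs ]
              <> char8 '}'
  where
    quoted s = char8 '"' <> stringUtf8 s <> char8 '"'
    commas   = mconcat . intersperse (char8 ',')

main :: IO ()
main = BL.putStrLn . toLazyByteString . valueToBuilder $
  VObject [("name", VString "map"), ("link", VString "Data-Map.html")]
-- prints {"name":"map","link":"Data-Map.html"}
```

This stays within boot-package territory (`bytestring` only), which is the constraint this whole thread keeps running into.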