haskellari / postgresql-libpq

Low-level Haskell bindings for libpq
BSD 3-Clause "New" or "Revised" License

Cabal flag for linking with jemalloc? #45

ulidtko closed 1 year ago

ulidtko commented 1 year ago

Hi again @phadej :wave:

Asking for opinion before a PR.

Following up on haskellari/postgresql-simple#114: I've found a heavy heap fragmentation issue in my LO-intensive workload, and determined that it's not the Haskell heap that got fragmented, but the C malloc heap.

https://github.com/haskellari/postgresql-libpq/blob/900abcb236787e814c91af3ee029c379e759cee4/src/Database/PostgreSQL/LibPQ.hs#L2012-L2025

:point_up: Here, the `reallocBytes` call releases the unused space under `maxLen`, so the resulting allocations have differing lengths that depend on LO sizes. If I do tens of thousands of varying-size LO reads, this puts stress on malloc to handle fragmentation well.
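The allocate, fill, shrink pattern described above can be sketched as follows. This is a minimal illustration using only `base`, not the library's actual code: `fakeRead` is a hypothetical stand-in for the C read call, `readShrunk` and `contents` are names invented here, and the error branch for a negative length is left out.

```haskell
import Control.Exception (onException)
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, newForeignPtr, withForeignPtr)
import Foreign.Marshal.Alloc (finalizerFree, free, mallocBytes, reallocBytes)
import Foreign.Marshal.Array (peekArray, pokeArray)
import Foreign.Ptr (Ptr)

-- | Hypothetical stand-in for a C read call: writes at most 'maxLen' bytes
-- of the payload into the buffer and returns how many bytes were written.
fakeRead :: [Word8] -> Ptr Word8 -> Int -> IO Int
fakeRead payload buf maxLen = do
  let chunk = take maxLen payload
  pokeArray buf chunk
  pure (length chunk)

-- | Allocate 'maxLen' bytes on the C heap, let 'fill' write into them, then
-- shrink the block to the number of bytes actually written. The final block
-- size therefore varies with the data, which is what stresses malloc.
readShrunk :: (Ptr Word8 -> Int -> IO Int) -> Int -> IO (ForeignPtr Word8, Int)
readShrunk fill maxLen = do
  buf  <- mallocBytes maxLen
  len  <- fill buf maxLen `onException` free buf  -- free the buffer if 'fill' throws
  buf' <- reallocBytes buf len                    -- release the unused tail
  fp   <- newForeignPtr finalizerFree buf'        -- freed with C free() on GC
  pure (fp, len)

-- | Copy the first 'len' bytes out of the buffer for inspection.
contents :: ForeignPtr Word8 -> Int -> IO [Word8]
contents fp len = withForeignPtr fp (peekArray len)

main :: IO ()
main = do
  (fp, len) <- readShrunk (fakeRead [1, 2, 3]) 16
  bytes <- contents fp len
  print (len, bytes)  -- prints (3,[1,2,3])
```

With many reads of different sizes, each call leaves behind a C-heap block of a different length, so how well this behaves depends on the allocator in use.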

(image: two graphs of the same test; jemalloc run on the left, baseline on the right)

The fixed graph on the left is the exact same test running against the exact same Haskell executable, but with `LD_PRELOAD=jemalloc.so`. The baseline on the right uses the stock allocator in libc6 2.35-0ubuntu3.1 on Ubuntu 22.04.2 LTS.

Would you approve adding a flag jemalloc to the cabal-file here?

https://jemalloc.net/ to perhaps save you a search query :sweat_smile:

phadej commented 1 year ago

> Would you approve adding a flag jemalloc to the cabal-file here?

No. mallocBytes comes from base, and I don't have to worry about how it works & whether different allocators would conflict.

ulidtko commented 1 year ago

> No. mallocBytes comes from base, and I don't have to worry about how it works & whether different allocators would conflict.

Okay, got it :+1: Fair enough, good to know, thanks for reply :pray:

I'm closing the issue then.

I had just one question: could the LO buffers be allocated on the Haskell heap instead of the C heap? But I'll answer it myself.

There's the `mallocForeignPtrBytes` API. It's fairly old (base-4.0.0.0 has it) and conceptually equivalent to `mallocBytes _ >>= newForeignPtr finalizerFree`, but backed by `newPinnedByteArray#` instead.
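To make the comparison concrete, here is a small sketch of the two allocation paths side by side, using only `base`. The names `allocOnCHeap`, `allocOnHaskellHeap` and `roundTrip` are invented for this illustration; both functions return a `ForeignPtr` that is used the same way and differ only in where the memory lives.

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr
  (ForeignPtr, mallocForeignPtrBytes, newForeignPtr, withForeignPtr)
import Foreign.Marshal.Alloc (finalizerFree, mallocBytes)
import Foreign.Marshal.Array (peekArray, pokeArray)

-- | Buffer on the C heap: allocated by malloc, released by free() through
-- a finalizer once the ForeignPtr becomes unreachable.
allocOnCHeap :: Int -> IO (ForeignPtr Word8)
allocOnCHeap n = mallocBytes n >>= newForeignPtr finalizerFree

-- | Buffer on the Haskell heap: a pinned byte array managed by the GHC
-- garbage collector, with no C allocator involved.
allocOnHaskellHeap :: Int -> IO (ForeignPtr Word8)
allocOnHaskellHeap = mallocForeignPtrBytes

-- | Write the bytes into the buffer and read them back. The caller must
-- make sure the buffer holds at least 'length xs' bytes.
roundTrip :: ForeignPtr Word8 -> [Word8] -> IO [Word8]
roundTrip fp xs = withForeignPtr fp $ \p -> do
  pokeArray p xs
  peekArray (length xs) p

main :: IO ()
main = do
  onC  <- allocOnCHeap 4
  onHs <- allocOnHaskellHeap 4
  a <- roundTrip onC  [1, 2, 3, 4]
  b <- roundTrip onHs [1, 2, 3, 4]
  print (a == b)  -- prints True
```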

Superficially it seems usable here, but it's not straightforward, because the allocation length is not known beforehand, which is what the `reallocBytes` call handles. So, without a realloc variant for bytearray-backed `ForeignPtr`s (which I don't see¹), we'd either "slop-leak" the unused trailing memory or need to copy the buffer.


¹ `shrinkMutableByteArray#` exists, but isn't threaded through to the `ForeignPtr` API.
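The two fallbacks mentioned above (keeping the slop, or copying) might look roughly like this with `mallocForeignPtrBytes`. This is only a sketch under assumptions: `fakeRead` is a hypothetical stand-in for the C read call, and `readWithSlop`, `readWithCopy` and `contents` are names invented here, not part of any library.

```haskell
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Marshal.Array (peekArray, pokeArray)
import Foreign.Marshal.Utils (copyBytes)
import Foreign.Ptr (Ptr)

-- | Hypothetical stand-in for a C read call: writes at most 'maxLen' bytes
-- of the payload into the buffer and returns how many bytes were written.
fakeRead :: [Word8] -> Ptr Word8 -> Int -> IO Int
fakeRead payload buf maxLen = do
  let chunk = take maxLen payload
  pokeArray buf chunk
  pure (length chunk)

-- | Option 1: keep the full 'maxLen' pinned buffer and remember the used
-- length. The unused tail stays allocated for as long as the buffer lives.
readWithSlop :: (Ptr Word8 -> Int -> IO Int) -> Int -> IO (ForeignPtr Word8, Int)
readWithSlop fill maxLen = do
  fp  <- mallocForeignPtrBytes maxLen
  len <- withForeignPtr fp $ \p -> fill p maxLen
  pure (fp, len)

-- | Option 2: read into a 'maxLen' buffer, then copy the used prefix into
-- an exactly sized buffer. The large buffer becomes garbage afterwards.
readWithCopy :: (Ptr Word8 -> Int -> IO Int) -> Int -> IO (ForeignPtr Word8, Int)
readWithCopy fill maxLen = do
  (big, len) <- readWithSlop fill maxLen
  small <- mallocForeignPtrBytes len
  withForeignPtr big $ \src ->
    withForeignPtr small $ \dst ->
      copyBytes dst src len  -- argument order: destination, source, byte count
  pure (small, len)

-- | Copy the first 'len' bytes out of the buffer for inspection.
contents :: ForeignPtr Word8 -> Int -> IO [Word8]
contents fp len = withForeignPtr fp (peekArray len)

main :: IO ()
main = do
  (fpA, lenA) <- readWithSlop (fakeRead [10, 20, 30]) 8
  (fpB, lenB) <- readWithCopy (fakeRead [10, 20, 30]) 8
  a <- contents fpA lenA
  b <- contents fpB lenB
  print (lenA, a)  -- prints (3,[10,20,30])
  print (lenB, b)  -- prints (3,[10,20,30])
```

Option 1 trades memory for speed; option 2 pays one extra copy per read but leaves only exactly sized buffers alive.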