haskell / containers

Assorted concrete container types
https://hackage.haskell.org/package/containers
314 stars 177 forks source link

Alternative optimisations for insertion #980

Open HuwCampbell opened 6 months ago

HuwCampbell commented 6 months ago

This is to open the conversation. I haven't looked at the strict version or gone 100% at this stage.

I'm not convinced that the (unsafe) optimisations for identical pointers is worth the complexity, and it hurts performance for inserts of new elements and the replacement of elements compared to this version.

Here, instead of checking pointer equality, we instead pass upwards whether a rebalance is required.

Old
  insert absent:                  OK (5.11s)
    309  μs ± 9.5 μs
  insert present:                 OK (2.05s)
    249  μs ±  21 μs
  insert alternate:               OK (2.09s)
    253  μs ±  21 μs
  insertWith absent:              OK (2.61s)
    313  μs ±  16 μs
  insertWith present:             OK (2.16s)
    260  μs ±  22 μs

New
  insert absent:                  OK (2.90s)
    351  μs ±  14 μs
  insert present:                 OK (3.20s)
    194  μs ±  16 μs
  insert alternate:               OK (3.18s)
    192  μs ±  11 μs
  insertWith absent:              OK (1.43s)
    345  μs ±  31 μs
  insertWith present:             OK (1.66s)
    200  μs ±  14 μs

So inserting when the key is missing (common) or a new item at the same key (also common), are faster, while only literally inserting the same identical item (i.e., a no-op, probably less common) is slower.

I've redone the benchmarks, adding some extra calls to rnf in the suite. It looks like insert is about 22% faster if the key is matched and 14% slower if it's not.

I've also done the same trick for insertWith, where it's closer to 25% faster and 10% slower respectively.

For insertR, I've instead manually built the continuation, and just return the top level map if the key is found. I'm not sure if this helps much at this stage; union uses it, but it doesn't lean on it too hard.

treeowl commented 6 months ago

I'm surprised you're getting good results with explicit continuations. Do you know how GHC is transforming those?

HuwCampbell commented 6 months ago

Yeah I couldn't really see any differences in the union benchmarks (maybe one was marginally faster?). From trying an explicit continuation on the other inserts, I expect that it is slower when there is a value inserted.

I haven't looked at the core yet.

I also considered just throwing a sentinel exception if EQ is found, then catching and returning the top map. It would probably be fast, but exceptions for control flow are a bit gross.

(Actually, delimited continuations are in GHC now, could use those).

treeowl commented 6 months ago

I would expect exceptions to be slow. I think I might have tried? I don't remember for sure. If you try, use unsafeDupablePerformIO to catch the exception. Also consider using raise# and catch# manually, to allow an exception type other than SomeException (this may require reallyUnsafePtrEquality# to determine whether you got the not-present exception).

On Thu, Jan 11, 2024, 10:31 PM Huw Campbell @.***> wrote:

Yeah I couldn't really see any differences in the union benchmarks (maybe one was marginally faster?). From trying an explicit continuation on the other inserts, I expect that it is slower when there is a value inserted.

I haven't looked at the core yet.

I also considered just throwing a sentinel exception if EQ is found, then catching and returning the top map. It would probably be fast, but exceptions for control flow are a bit gross.

— Reply to this email directly, view it on GitHub https://github.com/haskell/containers/pull/980#issuecomment-1888378630, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAOOF7I45DIO6Q6DACGP4PLYOCVARAVCNFSM6AAAAABBWC3ULWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOBYGM3TQNRTGA . You are receiving this because you commented.Message ID: @.***>

HuwCampbell commented 6 months ago

I'm not as concerned with insertR as insert and insertWith.

I guess the question is do we want a 10-15% sacrifice on insertion when the key in not already present but a 20-30% improvement when it is?

How would that decision be made?