haskell / containers

Assorted concrete container types
https://hackage.haskell.org/package/containers
315 stars 178 forks source link

improve benchmarks for Data.IntMap #657

Open jwaldmann opened 5 years ago

jwaldmann commented 5 years ago

Benchmarks should

NB: these bulk ops are the main reason for IntMap? if we only operate by-element, we could use hashmaps?

int-e commented 5 years ago

Here's a potential source for inspiration: https://gist.github.com/int-e/36578cb04d0a187252c366b0b45ddcb6#file-intmapfal-hs-L20-L45

jwaldmann commented 5 years ago

yeah, also https://github.com/jwaldmann/containers/blob/intmap-fromList/containers-tests/benchmarks/IntMap.hs

sjakobi commented 5 years ago

@jwaldmann That looks like a nice improvement! Why don't you simply make a PR?

jwaldmann commented 5 years ago

"Why don't you.." - because it's a drastic change that should be discussed first? Current benchmark:

defaultMain
        [ bench "lookup" $ whnf (lookup keys) m ...

my proposal

  defaultMain $ do
    e <- [ 10, 15 .. 25 ]
    return $ bgroup ("2^" <> show e)
      [ bulk
        [ ("contiguous/overlapping", [1..2^e], [1..2^e]) ...
sjakobi commented 5 years ago

@jwaldmann I guess a PR would be the perfect platform for that discussion! :)

gereeter commented 4 years ago

test bulk operations (union, intersection) - currently, they don't?

These are tested in the set-operations-intmap benchmark, which also uses a variety of data sets.

sjakobi commented 4 years ago

For reference: In #653, there's some performance work on fromList[WithKey] that needs better benchmarks with more realistic inputs.

sjakobi commented 4 years ago

For reference: In #653, there's some performance work on fromList[WithKey] that needs better benchmarks with more realistic inputs.

Also related: https://github.com/haskell/containers/issues/652

sjakobi commented 4 years ago

Regarding the problem of realistic inputs for fromList and friends: How about using e.g. splitmix to generate them randomly? Certainly that's not very realistic for many applications, but it adds another data point.

prettyprinter has a benchmark that is similarly based on randomly generated data: https://github.com/quchen/prettyprinter/blob/ab2c09419cca51fcc37760e71ef6861d26753e94/prettyprinter/bench/LargeOutput.hs#L173-L181