Closed oberblastmeister closed 2 years ago
Benchmarks show that this is 2x 3x faster than the previous implementation! (also faster than union now)
Thanks for working on this! I'll review on monday.
oberblastmeister @.***> schrieb am Sa., 9. Apr. 2022, 17:10:
Resolves #225 https://github.com/haskell-unordered-containers/unordered-containers/issues/225. I also incorporated techniques from #291 https://github.com/haskell-unordered-containers/unordered-containers/issues/291 and #362 https://github.com/haskell-unordered-containers/unordered-containers/issues/362. All tests pass. I need to do some benchmarking and clean up the code, probably reformat it also.
You can view, comment on, or merge this pull request online at:
https://github.com/haskell-unordered-containers/unordered-containers/pull/406 Commit Summary
- 21f238b https://github.com/haskell-unordered-containers/unordered-containers/pull/406/commits/21f238b1e11522262b455ed82ab3562855589b96 fast intersection
File Changes
(4 files https://github.com/haskell-unordered-containers/unordered-containers/pull/406/files )
- M Data/HashMap/Internal.hs https://github.com/haskell-unordered-containers/unordered-containers/pull/406/files#diff-041a696021ee461f2b36d411c1527c9edc788673ab86dd8b333e440e60bac519 (179)
- M Data/HashMap/Internal/Array.hs https://github.com/haskell-unordered-containers/unordered-containers/pull/406/files#diff-b95c39ca3aa6cff994008c0aa367e1219dd7b5d87e5d062e0a23cc357c35a241 (9)
- M Data/HashSet/Internal.hs https://github.com/haskell-unordered-containers/unordered-containers/pull/406/files#diff-9f04dc136a97e0fe68000e7d80596acf6cbbad887bae2584f010a5abfe5a3e07 (2)
- M tests/Properties/HashMapLazy.hs https://github.com/haskell-unordered-containers/unordered-containers/pull/406/files#diff-611cf8fcbd25d0feef21dc69faa11bf49d66de52ce11b6e762aab5e11ddcb882 (32)
Patch Links:
- https://github.com/haskell-unordered-containers/unordered-containers/pull/406.patch
https://github.com/haskell-unordered-containers/unordered-containers/pull/406.diff
— Reply to this email directly, view it on GitHub https://github.com/haskell-unordered-containers/unordered-containers/pull/406, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA36VCY5OD6APOMEJ6FESXTVEGMWTANCNFSM5S7FDMKA . You are receiving this because you are subscribed to this thread.Message ID: @.*** com>
Is stylish Haskell happening on GitHub or something? If so, I guess @sjakobi wants that. Don't ask me; I don't know tools.
The potential performance concern with an unboxed tuple result is that when the passed function doesn't inline, we ended up with an extra function call per element. I don't know if that makes a big enough difference to worry about; it'd be worth experimenting. I imagine it would save a lot of code duplication if it turns out okay. Specifically, it would avoid source duplication between lazy and strict versions, and object code duplication among the variants.
How is that different than with a function that does not return an unboxed tuple? Shouldn't it still have to make an extra function call for each element if the function doesn't inline?
Oh, I was thinking of not inlining the version that produces an unboxed tuple. If we do inline that into the other variants, then yeah, everything should be basically the same but with less source code. But it's worth finding out how much we pay for not inlining it, because object code isn't free either.
What if we just inline the ones with the unboxed tuples. Is there any disadvantage? Because we already inline intersectionWith
, so inline intersectionWith#
shouldn't create extra code? Then later we can experiment and see if we don't have to inline the unboxed version.
I opened a PR against your branch to do that.
There are CI failures on older GHC, because shrinking wasn't available yet. We'll need a fall-back.
The fallback should probably just define the shrinking operation manually in the Array
module. It won't be too efficient, but whatever.
Should I just use something like copy
to implement the fallback?
I dunno. I haven't actually looked at how you're using shrink
. Really, you can do whatever you think is reasonable, but I'd prefer to keep the CPP for it in Array
if possible.
Oh, I just looked. You'll want to use cloneSmallMutableArray#
Something like
#if __GLASGOW_HASKELL__ >= 8.10.7
shrink = ...
#else
shrink = ...
#endif
(never used cpp before)
Yup! It's one of the world's worst macro systems, but it's a lot faster than Template Haskell. Sigh
You probably have an off by one error in your clone; GHC has some funny ways of counting copies.
Non-urgent, non-merge-blocking: we might want to compare the performance of cloning to shrinking when using the non-moving garbage collector. Shrinking is rather more involved in that context, and it looks like it even makes a C call. I don't know if the effect will be magnified by generational issues for large maps, but it wouldn't shock me.
Unfortunately, adding {-# NOINLINE -#}
to intersectionWithKey#
results in a 10 microsecond performance degredation. So it looks like we will have to inline it to eliminate the closure
We don't want to NOINLINE
it; that loses specialization. The question is whether it's okay for it to be INLINABLE
and not explicitly inline
d. We'd want to verify that it doesn't get inlined into the benchmark.
It may well need inlining; just want to double check. And that percentage change is it?
Putting INLINABLE
results in the same performance decrease
Sorry, INLINEABLE
is slower than INLINE
, but faster than NOINLINE
, probably due to the typeclass specialization that you described
(in microseconds) | INLINE |
INLINABLE |
NOINLINE |
---|---|---|---|
40 | 50 | 56 |
Okay, I guess we should go with INLINE
for now. I guess in practice it's not likely to be called with many different function arguments in a given program. We should make sure that GHC sees it as having arity 2—it should inline once applied to a dictionary and a function (and similarly for the user-exposed functions). Do you want to do the strict thing now, or in another PR?
It is pretty late for me now, so I could either wait until tomorrow or do it in another pr.
Any thoughts, @sjakobi ?
I'm AFK. I'll Review tomorrow.
David Feuer @.***> schrieb am So., 10. Apr. 2022, 03:31:
Any thoughts, @sjakobi https://github.com/sjakobi ?
— Reply to this email directly, view it on GitHub https://github.com/haskell-unordered-containers/unordered-containers/pull/406#issuecomment-1094155148, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA36VC662IADWF3G2EXWXMLVEIVNPANCNFSM5S7FDMKA . You are receiving this because you were mentioned.Message ID: <haskell-unordered-containers/unordered-containers/pull/406/c1094155148@ github.com>
Why is unionWith
, unionWithKey
, and union
marked as INLINE
? Shouldn't they be marked as INLINABLE
and then use the inline
function explicitly when we want to implement something in terms of the other. For example union = inline unionWith const
just like I am doing with intersection
. Wouldn't marking them as INLINE
create excessive code bloat?
Why is
unionWith
,unionWithKey
, andunion
marked asINLINE
?
union
is actually marked INLINABLE
.
For unionWith[Key]
, I suspect the motivation is to inline the function argument, so using it doesn't incur function calls.
We can consider using INLINABLE
for these instead. Feel free to open an issue.
When we don't inline the function, it's probably better to pass the function around through the recursion instead of pulling it out. Tradeoffs....
Could you please rebase, so the changes from #407 are removed from the diff?
What should I do about inlining? I understand the need to eliminate the closures, but the functions are truly massive, intersection
has 1,700 terms, while unionWithKey
has 2,200! Wouldn't it be bad to mark these as {-# INLINE #-}
? We could add a comment saying to explicitly inline
if you want to remove the closure.
Also if we pass the function around through recursion, then we wouldn't be able to implement intersection
in terms of intersectionWithKey
right?
What should I do about inlining? I understand the need to eliminate the closures, but the functions are truly massive,
intersection
has 1,700 terms, whileunionWithKey
has 2,200! Wouldn't it be bad to mark these as{-# INLINE #-}
?
I think we should stick with INLINABLE
until we're convinced that INLINE
is better somehow. INLINABLE
is better for compile times for example.
We could add a comment saying to explicitly
inline
if you want to remove the closure.
I think the docs of these functions are the wrong place to teach people about GHC.Exts.inline
. Maybe it should be mentioned in https://github.com/input-output-hk/hs-opt-handbook.github.io?!
@sjakobi Is there anything else that I need to do?
Thank you, @oberblastmeister! :)
Resolves #225. I also incorporated techniques from #291 and #362. All tests pass. I need to do some benchmarking and clean up the code, probably reformat it also.