haskell-unordered-containers / unordered-containers

Efficient hashing-based container types
BSD 3-Clause "New" or "Revised" License
221 stars 97 forks source link

Make intersections much faster #406

Closed oberblastmeister closed 2 years ago

oberblastmeister commented 2 years ago

Resolves #225. I also incorporated techniques from #291 and #362. All tests pass. I need to do some benchmarking and clean up the code, probably reformat it also.

oberblastmeister commented 2 years ago

Benchmarks show that this is 2x 3x faster than the previous implementation! (also faster than union now)

sjakobi commented 2 years ago

Thanks for working on this! I'll review on monday.

oberblastmeister @.***> schrieb am Sa., 9. Apr. 2022, 17:10:

Resolves #225 https://github.com/haskell-unordered-containers/unordered-containers/issues/225. I also incorporated techniques from #291 https://github.com/haskell-unordered-containers/unordered-containers/issues/291 and #362 https://github.com/haskell-unordered-containers/unordered-containers/issues/362. All tests pass. I need to do some benchmarking and clean up the code, probably reformat it also.

You can view, comment on, or merge this pull request online at:

https://github.com/haskell-unordered-containers/unordered-containers/pull/406 Commit Summary

File Changes

(4 files https://github.com/haskell-unordered-containers/unordered-containers/pull/406/files )

Patch Links:

- https://github.com/haskell-unordered-containers/unordered-containers/pull/406.patch

https://github.com/haskell-unordered-containers/unordered-containers/pull/406.diff

— Reply to this email directly, view it on GitHub https://github.com/haskell-unordered-containers/unordered-containers/pull/406, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA36VCY5OD6APOMEJ6FESXTVEGMWTANCNFSM5S7FDMKA . You are receiving this because you are subscribed to this thread.Message ID: @.*** com>

treeowl commented 2 years ago

Is stylish Haskell happening on GitHub or something? If so, I guess @sjakobi wants that. Don't ask me; I don't know tools.

The potential performance concern with an unboxed tuple result is that when the passed function doesn't inline, we ended up with an extra function call per element. I don't know if that makes a big enough difference to worry about; it'd be worth experimenting. I imagine it would save a lot of code duplication if it turns out okay. Specifically, it would avoid source duplication between lazy and strict versions, and object code duplication among the variants.

oberblastmeister commented 2 years ago

How is that different than with a function that does not return an unboxed tuple? Shouldn't it still have to make an extra function call for each element if the function doesn't inline?

treeowl commented 2 years ago

Oh, I was thinking of not inlining the version that produces an unboxed tuple. If we do inline that into the other variants, then yeah, everything should be basically the same but with less source code. But it's worth finding out how much we pay for not inlining it, because object code isn't free either.

oberblastmeister commented 2 years ago

What if we just inline the ones with the unboxed tuples. Is there any disadvantage? Because we already inline intersectionWith, so inline intersectionWith# shouldn't create extra code? Then later we can experiment and see if we don't have to inline the unboxed version.

treeowl commented 2 years ago

I opened a PR against your branch to do that.

treeowl commented 2 years ago

There are CI failures on older GHC, because shrinking wasn't available yet. We'll need a fall-back.

treeowl commented 2 years ago

The fallback should probably just define the shrinking operation manually in the Array module. It won't be too efficient, but whatever.

oberblastmeister commented 2 years ago

Should I just use something like copy to implement the fallback?

treeowl commented 2 years ago

I dunno. I haven't actually looked at how you're using shrink. Really, you can do whatever you think is reasonable, but I'd prefer to keep the CPP for it in Array if possible.

treeowl commented 2 years ago

Oh, I just looked. You'll want to use cloneSmallMutableArray#

oberblastmeister commented 2 years ago

Something like

#if __GLASGOW_HASKELL__ >= 8.10.7
shrink = ...
#else
shrink = ...
#endif

(never used cpp before)

treeowl commented 2 years ago

Yup! It's one of the world's worst macro systems, but it's a lot faster than Template Haskell. Sigh

treeowl commented 2 years ago

You probably have an off by one error in your clone; GHC has some funny ways of counting copies.

treeowl commented 2 years ago

Non-urgent, non-merge-blocking: we might want to compare the performance of cloning to shrinking when using the non-moving garbage collector. Shrinking is rather more involved in that context, and it looks like it even makes a C call. I don't know if the effect will be magnified by generational issues for large maps, but it wouldn't shock me.

oberblastmeister commented 2 years ago

Unfortunately, adding {-# NOINLINE -#} to intersectionWithKey# results in a 10 microsecond performance degredation. So it looks like we will have to inline it to eliminate the closure

treeowl commented 2 years ago

We don't want to NOINLINE it; that loses specialization. The question is whether it's okay for it to be INLINABLE and not explicitly inlined. We'd want to verify that it doesn't get inlined into the benchmark.

treeowl commented 2 years ago

It may well need inlining; just want to double check. And that percentage change is it?

oberblastmeister commented 2 years ago

Putting INLINABLE results in the same performance decrease

oberblastmeister commented 2 years ago

Sorry, INLINEABLE is slower than INLINE, but faster than NOINLINE, probably due to the typeclass specialization that you described

oberblastmeister commented 2 years ago
(in microseconds) INLINE INLINABLE NOINLINE
40 50 56
treeowl commented 2 years ago

Okay, I guess we should go with INLINE for now. I guess in practice it's not likely to be called with many different function arguments in a given program. We should make sure that GHC sees it as having arity 2—it should inline once applied to a dictionary and a function (and similarly for the user-exposed functions). Do you want to do the strict thing now, or in another PR?

oberblastmeister commented 2 years ago

It is pretty late for me now, so I could either wait until tomorrow or do it in another pr.

treeowl commented 2 years ago

Any thoughts, @sjakobi ?

sjakobi commented 2 years ago

I'm AFK. I'll Review tomorrow.

David Feuer @.***> schrieb am So., 10. Apr. 2022, 03:31:

Any thoughts, @sjakobi https://github.com/sjakobi ?

— Reply to this email directly, view it on GitHub https://github.com/haskell-unordered-containers/unordered-containers/pull/406#issuecomment-1094155148, or unsubscribe https://github.com/notifications/unsubscribe-auth/AA36VC662IADWF3G2EXWXMLVEIVNPANCNFSM5S7FDMKA . You are receiving this because you were mentioned.Message ID: <haskell-unordered-containers/unordered-containers/pull/406/c1094155148@ github.com>

oberblastmeister commented 2 years ago

Why is unionWith, unionWithKey, and union marked as INLINE? Shouldn't they be marked as INLINABLE and then use the inline function explicitly when we want to implement something in terms of the other. For example union = inline unionWith const just like I am doing with intersection. Wouldn't marking them as INLINE create excessive code bloat?

sjakobi commented 2 years ago

Why is unionWith, unionWithKey, and union marked as INLINE?

union is actually marked INLINABLE.

For unionWith[Key], I suspect the motivation is to inline the function argument, so using it doesn't incur function calls.

We can consider using INLINABLE for these instead. Feel free to open an issue.

treeowl commented 2 years ago

When we don't inline the function, it's probably better to pass the function around through the recursion instead of pulling it out. Tradeoffs....

sjakobi commented 2 years ago

Could you please rebase, so the changes from #407 are removed from the diff?

oberblastmeister commented 2 years ago

What should I do about inlining? I understand the need to eliminate the closures, but the functions are truly massive, intersection has 1,700 terms, while unionWithKey has 2,200! Wouldn't it be bad to mark these as {-# INLINE #-}? We could add a comment saying to explicitly inline if you want to remove the closure.

oberblastmeister commented 2 years ago

Also if we pass the function around through recursion, then we wouldn't be able to implement intersection in terms of intersectionWithKey right?

sjakobi commented 2 years ago

What should I do about inlining? I understand the need to eliminate the closures, but the functions are truly massive, intersection has 1,700 terms, while unionWithKey has 2,200! Wouldn't it be bad to mark these as {-# INLINE #-}?

I think we should stick with INLINABLE until we're convinced that INLINE is better somehow. INLINABLE is better for compile times for example.

We could add a comment saying to explicitly inline if you want to remove the closure.

I think the docs of these functions are the wrong place to teach people about GHC.Exts.inline. Maybe it should be mentioned in https://github.com/input-output-hk/hs-opt-handbook.github.io?!

oberblastmeister commented 2 years ago

@sjakobi Is there anything else that I need to do?

sjakobi commented 2 years ago

Thank you, @oberblastmeister! :)