Open sergv opened 9 months ago
@treeowl any chance to look at this please?
Yeah, I'll take a look. Sorry for the delay.
I've updated the docs.
If this is added, it should also be added for IntMaps.
I agree that it would be nice to add partitionKeys
for IntMap
too. However thanks to different map structure the implementation's probably going to be more convoluted that for regular map. Given that I didn't find a usecase for this function on IntMap
s I didn't implement it. I'd prefer to get partitionKeys
for regular maps first - maybe someone else will be motivated to add one for int maps, who knows.
If it's a hard requirement to have the same functions in Data.Map
and Data.IntMap
I guess I can add a simple partitionKeys f xs = (withoutKeys f xs, restrictKeys f xs)
. More efficient version could come later, from me in case I'll need this function on IntMap
s myself. Please let me know your perferences whether to do it.
That being said, I feel like this function is too specialized (to be fair, I feel the same about restrictKeys and withoutKeys). There are a lot of operations that could be fused together to be more efficient, but I don't think that alone warrants adding special functions for them. Another alternative is partitionKeys m s = partitionWithKey (\k _ -> k
member
s) m, which is equally clear IMO, albeit maybe a bit slower.
I see the attached benchmark as a clue that it's worthwhile to extend API with partitionKeys
. It does seem that fusion could be of benefit so library seems like a natural place to have it in.
One of the synthetic benchmarks shows 40% speedup - looks like pretty good speedup. Another (arguably equivalent) benchmarks shows 20% speedup - these numbers motivated me to do the PR since I want to get those speedups in my programs.
API growth is unfortunate, but what's the cost of using slower version? It affects all the users and the runtime cost is paid every time their programs run.
Regarding many other operations that can be fused together I don't think it's realistic to foresee them all and add them beforehand. There're many of those and it's not clear whether anyone actually needs it. I'd advocate for reactive approach like this PR - when someone finds a usecase for fusing some operations and is motivated enough to implement it then it could be considered for inclusion.
That being said, I feel like this function is too specialized (to be fair, I feel the same about
restrictKeys
andwithoutKeys
).
It depends on your application, I use restrictKeys
and withoutKeys
a lot, they are crisp and idiomatic. In my line of work partitionKeys
would be helpful as well.
There is a general API pattern take
/ drop
/ spanAt
, takeWhile
/ dropWhile
/ span
, filter
/ partition
, and I think partitionKeys
fits it nicely.
Another alternative is partitionKeys m s = partitionWithKey (\k _ -> k
member
s) m, which is equally clear IMO, albeit maybe a bit slower.
I did benchmark partitionWithKey
. My benchmark is at https://github.com/sergv/containers/commit/416888cdc4e9fab98adbb31852982a20a229aee9. The results are:
$ TERM=dumb cabal run map-benchmarks --builddir /tmp/dist -- -p '/All.even/ || /All.odd/'
Created semaphore called cabal_semaphore_6 with 32 slots.
All
even
restrictKeys+withoutKeys: OK
102 μs ± 6.0 μs
partitionKeys: OK
83.3 μs ± 5.7 μs, 0.82x
partitionWithKey: OK
284 μs ± 25 μs, 2.80x
odd
restrictKeys+withoutKeys: OK
224 μs ± 21 μs
partitionKeys: OK
140 μs ± 11 μs, 0.63x
partitionWithKey: OK
298 μs ± 25 μs, 1.33x
All 6 tests passed (1.11s)
So far it doesn't look like partitionKeys
is competitive with restrictKeys
+ withoutKeys
pair - it is slower. In my particular use case I want faster alternative if it can be achieved with reasonable effort. I could have made a mistake in the benchmark though - please correct me if that's the case.
@treeowl just a gentle reminder to review.
So far it doesn't look like
partitionKeys
is competitive withrestrictKeys
+withoutKeys
pair - it is slower.
I suppose you meant partitionWithKeys
instead of partitionKeys
?..
@treeowl I know that you are exceedingly busy, so I feel bad for being annoying, but I could benefit from a faster partitionKeys
indeed. Could we possibly ask someone else to review (@meooow25?) to speed things up?
So far it doesn't look like partitionKeys is competitive with restrictKeys + withoutKeys pair - it is slower.
Yes, I meant partitionWithKeys
(defined as \ks -> M.partitionWithKey (\k _ -> S.member k ks) m
).
As mentioned in https://github.com/haskell/containers/issues/158, sometimes we'd like to get results from both
restrictKeys
andwithoutKeys
for the same map and set. It can be done more efficiently by fusing traversals.I named new function
partitionKeys
instead ofpartitionSet
because the originals it's fusing end in*Keys
so I believe this is more consistent.Benchmarks show that new version is around 20-40% faster, depending on inputs. Here's a run with locally modified containers benchmarking suite that measures with even and odd keys floating around (for some reason odd keys show more speedup hence that's what I committed):
In the process of checking generated core I noticed that
splitMember
gets called with explicitOrd
dictionary, so I changed it a bit so that it would specialize. I've only checked core on9.6.2
though.