Add partition-after and partition-before

tomdl89 commented 1 year ago

Every few months I see a question on the clojurians slack about how to partition a collection by splitting after an element matches a condition. All solutions I see are either less performant than the solution in this PR or much less readable. This is also something I have written in some form many times before, and I've seen in a couple of code bases I've worked in.

It seems to complement the take-upto and drop-upto functions nicely too. I mostly followed the source code for these, and partition-by in clojure.core, but used volatile vectors to store the partitions in the transducer form rather than java.util.ArrayList so as to make it more cross-platform friendly. All tests pass.

tomdl89 commented 1 year ago

I gave it the name partition-upto for consistency with take-upto and drop-upto but I think partition-after or somesuch may make more sense.

weavejester commented 1 year ago

I think this seems reasonable. I seem to recall I've run into a use-case for this, myself. I do think that partition-after might be more accurate.

I'd also like to get some specific use cases mentioned in this PR as well. Some examples of when this would be useful would help justify it.

tomdl89 commented 1 year ago

Cool, I'll take a look at the comment(s) and listing some use cases. I did have a thought that partition-after might (rightfully) imply the need for a partition-before which would use take-while to generate run in the lazy form. Not sure what you think about having both?

weavejester commented 1 year ago

Having both is reasonable as long as there's no easy existing way to achieve either. I don't think there is, but I'd like to double check to ensure we're not reimplementing something that can be done conventionally.

tomdl89 commented 1 year ago

I've changed the name and fixed the redundant let binding. lmk if you think it needs anything else to be merged 🙏.

tomdl89 commented 1 year ago

Sorry for the noise. I've added partition-before because I don't think it can be easily done conventionally. Ready to review.

tomdl89 commented 1 year ago

In response to

I'd also like to get some specific use cases mentioned in this PR as well.

I grepped for questions on the clojurians slack. There's plenty:

2022-11-17

For example, let’s say I have the sequence: '("BEGIN" "a" "b" "c" "d" "END" "BEGIN" "foo" "bar" "END" "BEGIN" "c" "l" "j" "cool" "END") And I would, ideally, group this into a sequence of three sequences: '(("BEGIN" "a" "b" "c" "d" "END") ("BEGIN" "foo" "bar" "END") ("BEGIN" "c" "l" "j" "cool" "END"))

Would be solved by either (partition-after #{"END"} coll) or (partition-before #{"BEGIN"} coll)

2022-03-11

Dumb question but what would be the Clojure way for transforming a list of words into a list of sentences? E.g. [ “Hello.” “How” “are” “you?” “I” “am” “fine.” ] => [ “Hello.” “How are you?” “I am fine.“] I’m thinking it’ll involve a reduce.

Would be solved by (map #(clojure.string/join " " %) (m/partition-after #(re-find #"[.?!]$" %) coll))

2022-07-09

I’ve tried partition-by but groups items together. I’ve a collection of has-map ordered and I’ve to split them when a value of an item is null... I was looking more for [{:k 1} {:k 2} {:k nil} {:k 3} {:k 4}] =>(({:k 1} {:k 2}) ({:k nil} {:k 3} {:k 4})) something like (split-when f coll)

Would be solved by (m/partition-before #(= [:k nil] (find % :k)) coll)

2022-06-25
Is there a way to do something like partition-by but where it creates a new group whenever the value is true, rather than when it changes? I've written a reduce version, but I assume I'm missing a trick with partition-by or another function
```
(->> [1 3 2 4 1 2 3]
(reduce
(fn [acc n]
(if (odd? n)
(conj acc [n])
(update
acc
(dec (count acc))
#((fnil conj []) % n))))
[]))
;; => [[1] [3 2 4] [1 2] [3]]
```

Would be solved by (m/partition-before odd? coll)

2021-11-03

Trying to figure out a little data transformation puzzle: • I have a list of strings with a start 0 and end 1 marker • Sometimes I get them split up by the API I’m working with, e.g. ["0a1" "0b1" "0c" "c1" "0d1"] • And I want to transform this list so that I get ["0a1" "0b1" "0cc1" "0d1"]

Would be solved by (map clojure.string/join (m/partition-before #(re-find #"^0" %) coll))

I hope these examples are sufficient. There's plenty more but I think they make a decent enough case. FWIW, most answers either suggested using spec, or did an explicitly stateful partition-by (by using an atom to track need to split).

weavejester commented 1 year ago

Thank you for researching up those real-world examples. I think they more than justify the addition of these two functions.

weavejester commented 1 year ago

Thanks for the updates! Does inlining the functions lead to a measurable performance difference when benchmarked with something like Criterium?

tomdl89 commented 1 year ago

At the time, I quickly tested with time and noticed a (probably statistically insignificant) benefit, but having just tested with criterium, the non-inlined version is consistently (but still, probably insignificantly) faster. It's a bit confusing, but anyway, I've removed the inlining commit.

weavejester commented 1 year ago

The JVM's JIT compiler can make performance difficult to predict. What may be happening in this case is that Clojure's inlining may be interfering with the JVM's attempts to optimize, so once the JVM is warmed up its ever so slightly slower. This is why I'm often cautious about performance optimizations done in absence of benchmarking - I've been too often bitten by this in the past.

weavejester commented 1 year ago

Could you squash your commits down into one? Then I can merge.

tomdl89 commented 1 year ago

That's all squashed now. Thanks for all of the really helpful feedback.

tomdl89 commented 1 year ago

Anything outstanding I can help with?

weavejester commented 1 year ago

Thank you for your work on this, and apologies for the delay. This all looks good, so I'll merge it in and look into cutting a new release.

tomdl89 commented 1 year ago

Any plan to announce on clojurians slack #announce ? I can if not.

weavejester commented 1 year ago

You can if you want, I wasn't going to as it didn't seem like a major enough change, but maybe I'm being too cautious.

borkdude commented 1 year ago

weavejester / medley

Add partition-after and partition-before #67