Idiomatic Clojure - code smells

Commentary on Slack https://clojurians.slack.com/archives/C8NUSGWG6/p1591453495326800

;;; Don't

(into {} (map (fn [[k v]] [k (f v)]) m));;; Do
(reduce-kv (fn [m k v] (assoc m k (f v))) {} m)

Why prefer the more verbose version? Or even

(into {} (map #(update % 1 f)) m)

Although, I'm not sure how bad "update the val element of a map entry" is. The docs only say that "a map entry is treated as an ordered collection of key and value", not that it's associative. (edited) Generally, for performance. Its verbosity is why I recommended hiding it behind a function and not writing it out manually every time. I played around with several variations and the more verbose option was about 2x faster.

Well, into will use transient collections for you. (edited) Ben Sless 22 hours ago Still. I ran tests on inputs of various sizes

Take these two for example:

;;; Do
(reduce-kv (fn [m k v] (assoc m k (f v))) {} m)
;;; Or faster* but less pretty
(persistent!
 (reduce-kv (fn [m k v] (assoc! m k (f v))) (transient {}) m))

And I've seen plenty of advice to not worry about synthetic benchmarks on JVM, especially for small collections. :) Which is exactly where you say transient collections might be slower.

The first was faster for small inputs, the latter faster by ~30% for large inputs

How would you define synthetic in this case?

Pretty sure nnirst is not a function. :slightly_smiling_face: oops, thanks, will fix

How would you define synthetic in this case?

It's a question for a more general #clojure community - I was not one of the authors of those advices. My understanding is when it's something that's run in isolation. A benchmark for the sole purpose of benchmarking. (edited)

hm. interesting. Let's take it there

alexmiller if you want to traverse a map, reduce-kv is often the fastest way. It’s self-reducing so ends up as a single loop in the bytecode. And it avoids constructing (and then tearing apart via destructuring) map entries.

map entries are built in traversal time?

alexmiller Yes, reduce or seq will build entry objects

alexmiller 22 hours ago The actual ks and vs are in object arrays in tree nodes

In general, what are your thoughts on the benchmarking question?

alexmiller 22 hours ago I reworked a lot of the map reduction stuff in 1.7 and spent some time optimizing the reduce-kv path (edited) You should generally be wary of only looking at micro benchmarks due to gc and memory effects Which doesn’t mean don’t do them, but do a variety of things.

There's also the question of which code paths the JVM goes through first (as was the case with clj-tuple)

such as? holding lots of garbage in the heap?

alexmiller 22 hours ago There’s a big price to pay moving from monomorphic to binorphic to polymorphic calls and it’s easy to delude yourself by only looking at a single monomorphic case in a benchmark So benchmarking a set of cases can sometimes show you different things. You also have to be wary of the effect of safepoints etc

When you say that reduce-kv is often faster, do you mean it's the case even if used with persistent collections?

That's why I try to run my benchmarks on inputs at logarithmically varied sizes, especially given the switch from PersistentArrayMap to PersistentHashMap Are you also seeing major differences when running in the REPL vs. invoking java -jar and with direct linking?

alexmiller yes, aren’t you typically reducing over persistent colls? Transients are for building not usually a factor

Indeed. But after reading multiple discussions on the topic, it seems to me that transients are supposed to be faster in general, and that's exactly why into uses them.

alexmiller Direct linking may be, but since most people don’t use it, that’s not how I would normally test Into uses transients for constructing the output collection In into, your input is either a persistent coll or seq or produced by a transducer

Yeah this is a part which baffled me a little, although my intuition has learned to expect it by now: why had the non transient version been faster for small collections?

alexmiller There are some tricks played when constructing small persistent collections for perf Making small colls is very common so we’ve spent a lot of time optimizing that There’s some fixed overhead to using transients so it really only pays off when you have enough adds to see the amortization wins

At which size does the tipping point happen?

alexmiller I’m not sure there is a fixed point - prob depends on jvm context

As a wise man named Rich once told me, “why guess when you can measure?”

alexmiller Microbenchmarking with Criterium etc is fine, just also have a healthy skepticism while you do it :). I find just using time loops can actually be just as good, but whenever I can I try to get some real code that covers the case and try to see if I can replicate effects outside the micro case The morphic call site issue is very real and takes some practice to be mindful of - basically the jit is overfitting it’s optimizations and when you move to real world none of that stuff applies anymore Also be mindful of the absolute values - if criterium is reporting times in the single ns range, thats way below the actual sensitivity of Java timing mechanisms so a) I’m skeptical of accuracy and b) are you actually measuring a difference that has any real effect on a program? Something in the single ns range has to happen a billion times before I care :) You’ll get way more payoff usually by thinking about how you structure and process your data in the first place

criterium reporting times in the single ns range

Got bit by this once. Didn't know the function returned a lazy seq.

Also is fnext not isomorphic to second? They are the same in all ways but name and identity

practicalli / clojure

Idiomatic Clojure - code smells #153