Closed ysmiraak closed 7 years ago
Thank you for catching this bug. I found the source, and will fix it in the next release, that is due in a month approximately.
Thanks for letting me know. Looking forward to the update :)
@ysmiraak 0.5.2 is in the clojars. Please let me know whether it works well for you now.
Yes it works well now. Thanks for the early release.
I was reading the changes you made, trying to understand what happened. I noticed that your fold
for one collection calls r/fold
. I'm quite excited because this means I could replace reducers
by fluokitten
. However, when thinking about the differences between reduce
, fold
, and r/fold
, I found this inconsistency:
This example in one of the guides
(fmap + {:a 1 :b 2} {:a 3 :b 4 :c 5} {:a 6 :b 7 :c 8}) => {:a 10, :b 13, :c 13}
made me think that I could replace merge-with
by fmap
.
(def m+ [{:a 1 :b 2} {:a 3 :b 4 :c 5} {:a 6 :b 7 :c 8}])
(apply (partial merge-with +) m+) => {:a 10, :b 13, :c 13}
(reduce (partial merge-with +) m+) => {:a 10, :b 13, :c 13}
(reduce (partial merge-with +) {} m+) => {:a 10, :b 13, :c 13}
(apply (partial fmap +) m+) => {:a 10, :b 13, :c 13}
(reduce (partial fmap +) m+) => {:a 10, :b 13, :c 13}
(reduce (partial fmap +) {} m+) => {:a 10, :b 13, :c 13}
And that replacement works for reduce
, but not for fold
.
(fold (partial merge-with +) m+) => {:a 10, :b 13, :c 13}
(fold (partial merge-with +) {} m+) => {:a 10, :b 13, :c 13}
(fold (partial fmap +) m+) => #function[uncomplicate.fluokitten.algo/function-fmap/fn--25003]
(fold (partial fmap +) {} m+) => IllegalArgumentException
The problem is that the fold
in 0.5.2
calls (r/fold reducef coll)
, which uses (reducef)
for obtaining the identity element. While (merge-with +)
produces nil
which is then treated the same as {}
by merge-with
, (fmap +)
behaves the same as currying.
But I don't think fmap
should be changed.
Instead of this definition in the algo
namespace,
(defn collection-fold
...
([c f init]
(f init (r/fold f c)))
...)
if we simply provide the reducef
as required
(letfn [(collection-fold [c f init]
(r/fold (fn ([] init)
([x y] (f x y)))
c))]
(collection-fold m+ (partial fmap +) {}))
=> {:a 10, :b 13, :c 13}
then (fold (partial fmap +) {} m+)
won't produce IllegalArgumentException
anymore.
However, (fold (partial fmap +) m+)
with no init
value still produces a closure instead of the "expected" result, because (fmap +)
is used here as init
, which is not the identity element for m+
.
But I think this is the right behavior, because anyone who calls (fold f x)
must make sure that f
is a monoid operator with its own identity element, otherwise they should just call (fold f init x)
.
PS: The same problem exists for foldmap
.
PPS: I was worried that since the reducef
is likely to be called many times, wrapping it to provide the init
value can be inefficient. But I benched it with criterium
, and found no difference at all. It's probably JIT optimized away, so it probably doesn't matter for large collections. And for small collections, r/fold
is less efficient than reduce
anyways.
On second thought, maybe (fold f init x)
should not call r/fold
at all?
Because the init
value is not necessarily the identity element of some monoid, and r/fold
is only designed to work with monoids.
When f
is not a monoid operator, maybe the behavior of fold
should fallback to reduce
?
Thank you for investigating this, these are all valid and valuable insights. I optimized this code thoroughly long time ago, so I'd need to think about the details of more exotic uses (such as this one). If you have time, it would be great if you could provide test cases that demonstrate this desired and undesired behavior, so when I get to this, it is clear to me. Also, thank you very much for benchmarking this - highly appreciated!
The tricky point is that all these things also need to work with functions as monads, and currying functions as monads (jvm namespace). Maybe some more tests and examples are needed. Any definitively some more investigation of these edge cases.
The previous solution was a fix for the more immediate issue, which was a bug that was caused by wrong usage of r/fold.
When f is not a monoid operator, maybe the behavior of fold should fallback to reduce?
The problem is: how to know whether f
is a monoid operator for the monoid that is inside the collection? On top of that, I suppose that every function could be a monoid operator for some monoid that we can imagine. The only function that will for sure be a monoid operator for the given monoid is op
(provided that there are no bugs in its implementation :) but even op
needs to know which monoid is the context that we are working with. Additonal hurdle is that the requirement that (f) should give the id element is the implementation convention from r/fold, which I am not sure is required generally.
The tricky point is that all these things also need to work with functions as monads, and currying functions as monads (jvm namespace). Maybe some more tests and examples are needed.
I'm just starting to learn category theory, so it'll take some time for me to understand how these things should work with monads. In fact, I'm trying to learn CT with fluokitten. I really like the API design, and how it makes an effort to blend in with clojure. I'll try to collect some test cases as I learn, but for now, this is what I think of the problem:
I suppose that every function could be a monoid operator for some monoid that we can imagine.
I think tree-like folds cannot be said to work on any monoid: (reduce conj [] (range 9))
.
conj
wouldn't be a monoid operator because it's not associative:
(conj (conj [] 0) 1))
is not the same as (conj [] (conj 0 1))
.
The problem is: how to know whether
f
is a monoid operator for the monoid that is inside the collection?
I guess we simply can't. Clojure has three kinds of folds:
(r/fold f coll)
;(reduce f coll)
;(reduce f init coll)
.Each one is more general than the previous one.
And your fold
is even more general, allowing f
to be any Lisp function instead of a binary operator. I really love this, but I guess it's uncharted territory to integrate the Lisp formalism with CT, so essentially it's about how you wish your fold
to behave:
a. As the canonical monoid-fold. f
has to be a monoid operator, and the arity overloading is simply sugar.
(fold f [x])
is the canonical usage, for monoid <x,f>
(I'm using [x]
to mean a coll of type x
);(fold [x])
works for the default monoid for type x
;(fold f init [x])
works for custom monoids that are not defined for the 0-arity usage (convenient when passing an anonymous fn);(fold f init [x] [y] [z])
is equivalent to (fold f init (interleave [x] [y] [z]))
.b. As a swiss-army-knife fold: combining different kinds of folds through arity overloading.
(fold [x])
is the default monoid-fold;(fold f [x])
is the monoid-fold, same as r/fold
;(fold f init [x])
is the left/tree-fold, same as reduce
;(fold f init [x] [y] ..)
is something new.Clearly plan a is easier to get right, because it's simply the sugared version of Haskell's fold
. Plan b would be experimental but exciting; It seems redundant, considering we already have reduce
and r/fold
, but then if it can be well-defined, it could replace both, and that would be a great success in integrating Lisp with CT. The challenge, however, is, as you said, how to make it work properly with the other parts of your library.
Additonal hurdle is that the requirement that (f) should give the id element is the implementation convention from r/fold, which I am not sure is required generally.
I agree. But I also agree with the implementation of r/fold
, which requires that the identity element is bundled with either reducef
or combinef
, because that (sort of) guarantees the correctness when the computation is parallelized. If r/fold
allows an init
value to be specified, people might not realize that this init
has to be the identity element.
Which is why I said that maybe (fold f init x)
should not call r/fold
, unless you prefer plan a, in which case your fold
requires f
to be a monoid operator and init
to be the identity element.
conj wouldn't be a monoid operator because it's not associative:
If I imagine a monoid that consists of empty vectors, conj is very much associative. That is the catch, it all has to (also) work for trivial edge cases.
init
does not have to be identity element. If it were, it would be unnecessary.
The intended use case is the same as in reduce
: (fold + 4 [1 2 3]) => 10
. The only requirement for init is to be the same monoid as the context of the collection (which is also a monoid :)
Maybe the simplest solution would be to add the [x f] arity to the fold
protocol, and let the monoid in question (in this case, collection) decide how to handle that. Then, we wouldn't have this problem of how to find out what the id element is: every implementation will be able to discover it using the internal knowledge. What do you think about that?
Maybe the simplest solution would be to add the [x f] arity to the fold protocol, and let the monoid in question (in this case, collection) decide how to handle that.
Wouldn't this make (fold * [1 2 3])
return 0
? Because (-> [1 2 3] first id)
is 0
.
Hm, yes, I forgot that :) It seems that I made those decisions with reason back when I wrote that code :)
I think I found an elegant and general solution to this. I'll implement this when I get back.
Changed in the last 0.6.0-SNAPSHOT
With version 0.5.1 on clojure version 1.8.0, I encountered this behavior:
It seems that
fold
breaks for vectors with more than 512 elements?