jduey / protocol-monads

A protocol based monad implementation for clojure
62 stars 6 forks source link

a lazier plus function? #6

Open michaelsbradleyjr opened 12 years ago

michaelsbradleyjr commented 12 years ago

I've been studying the library and its test suite, trying to come to terms with monads.

I became interested in the test which is commented out on line 269 in test/core.clj.

Is the problem with maybe-monad as such? Or is it a "laziness issue" with respect to the plus function defined on line 13 in monads/core.clj?

Here are the relevant bits of a hack which I cooked up today:

https://gist.github.com/3989325

It's ugly, and I realize that bringing eval into the picture isn't a proper solution. Also, with that hack in place, other parts of the library don't work correctly and many of its tests fail.

This is a learning exercise for me, and you may already be aware of the underlying problem, but I wanted to share my "hack" and get your thoughts on how a "lazier" plus / plus-step might be properly implemented.

michaelsbradleyjr commented 12 years ago

I've implemented a variant of plus, which I've named plus*. See line 62 in monads/core.clj on my examples branch.

It involves another method for the Monad protocol, plus-step*, which so far I've implemented for maybe-monad and state-transformer.

The end result is that the two tests you had commented out in test/core.clj now pass with flying colors, i.e. if they're changed to use plus* instead of plus ...... Actually, that's not quite the end of the story. For some reason, after slightly redefining "regular" plus-step for the state transformer, the second test you had commented out will pass (i.e. it won't throw an exception) whether you use plus or plus*. I'm not sure why that's the case, but it's interesting to think about. The first test you had commented out definitely does require usage of plus* in order to avoid throwing an exception.

The idea is that if the developer needs a "lazier" plus, the library provides it, but usage is optional and plus* doesn't get in the way of plus.

By the way, I've got a fair number of changes / experiments in progress on my examples branch, e.g. the optional "return type checking" we discussed briefly in IRC. For that reason, most of the tests in test/core.clj on that branch are commented out as I chip away at making it all work nicely. The feature seems promising, though, as it's a nice way for developers to make sure their monadic functions are actually returning the same types as the bound monadic value and/or the protocol-monad specified via the do macro. If the dynamic vars specifying warn or throw are not set to true, then wrap-check simply dumps out the un-wrapped monadic function, so there should be little performance overhead when you're not checking the return types. In some sense, check-return-types is overly strict.

Consider the following:

(binding [m/*throw-on-mismatch* true]
 (m/do list
       [x (range 5)
        y (range 3)]
       (+ x y)))

With m/*throw-on-mismatch* bound to true, that expression will throw an exception, because (range ...) has the effect of setting up an m/bind call on an instance of clojure.lang.LazySeq (thus involving the lazy-seq protocol-monad), while the monadic function defined over (+ x y) will (owing to the list protocol-monad being specified for m/do) return values of type clojure.lang.PersistentList.

We can "fix" the example like so:

(binding [m/*throw-on-mismatch* true]
 (m/do list
       [x (into '() (range 5))
        y (into '() (range 3))]
       (+ x y)))

Now all the types are aligned, and no exception will be thrown.

As suggested, this is "overly strict" since LazySeq and PersistentList are hardly incongruent for this purpose. BUT, and I think this is the key, being alerted to the fact that there is a mismatch strictly speaking can help a developer to think carefully about the monadic computations he or she is putting together. And such warnings only become more valuable as the computations become more complex. When the developer is satisfied that it all works correctly, the return type checker can be relaxed and mismatches can be deliberately introduced where they won't cause any problems.

Well, this is getting somewhat far afield from the issue at hand ("lazier plus"), but I wanted to give you a heads-up.

I have no idea whether you'll actually end up wanting to merge in these changes and features, but this has been a huge learning experience for me and I want to thank you for all the hard work you've put into this library. It's really well written, and it's been my pleasure to take it apart and figure out how it works (and how I can put it to good use!)

jduey commented 12 years ago

Great work. Some nice ideas that I'm currently going through.

I changed the test here to be more pathological so that slight change here doesn't work. I think you'll be able to see why. :)

jduey commented 12 years ago

In your plus* you might eliminate the need for a let with some destructuring of the arg like this

defmacro plus* [[mv & mvs]] ... )

michaelsbradleyjr commented 12 years ago

Destructuring the argument for plus* is a handy improvement and I've implemented it, thanks for the suggestion. I originally avoided doing so while working out some of the quirkiness of transforming the members of (rest mvs) into a list of thunks.

I fixed my implementation of "regular" plus-step so that it works correctly, but I still found that "regular" plus / plus-step was behaving lazily under certain conditions involving the maybe and state monads. I finally tracked this down to an issue that involves some subtle complexities of laziness, chunked sequences and drop-while (in the plus-step implementation for maybe-monad).

Long story short, in the plus-step implementation for state-transformer I now invoke doall in order to eliminate edge cases of "unexpected laziness" (if that term makes sense). A similar approach may be applicable to the other transformers, but I haven't gotten that far yet.

My thought is that, per convention and documentation (to come at some point), plus can be indicated as non-lazy and plus* as fully lazy; and owing to the implementation detail mentioned above, users of the library won't get any surprises where non-lazy plus sometimes behaves lazily.