recommend "flatmap" - Githubissues

rebcabin commented 12 years ago

to correspond to Haskell's "bind" and LINQ's "SelectMany." This is a fundamental operator that, in one call, applies a transformation to each element of its input collection -- a transformation that produces a collection of a second kind; and then flattens the result exactly one level. Without it, there is no primitive way to expand a collection ("map" lets you transform a collection, "filter" lets you reduce a collection, and "fold" lets you burn down a collection to a result; but "flatmap" is the fundamental one that lets you build up a collection from elementary inputs. It's fundamental in the monadic sense: it makes your collections into a proper monad with all the pursuant composability benefits.

Canonical example, from a list of customers, each of which has 0 or more orders, each of which has zero or more line-items, extract the list of all orders, preserving the underlying line-item structure but giving up the customer info. Compositionally (in another 'dotted' operation), product the list of all line-items

In a programming pattern of alternating arrays and objects, "map" and "flatten" will (somewhat accidentally) get the job done because "flatten" doesn't flatten objects, only arrays. However, if the data were modeled with pure arrays, I see no good way to do this (without the secret, second argument of "flatten") because the first flatten would blow out all substructure.

I implemented a proposal for flatmap in a fork of underscore's source, along with a few unit tests for your consideration. I don't have docco running and I didn't produce a "min" version of the javascript, so it's just the javascript and the (fairly uncomprehensive) unit tests.

https://github.com/rebcabin/underscore/commit/4f90bab75d8f69df8d8d0317354f9f5981e2fedb

michaelficarra commented 12 years ago

Can you explain a little more about how you're not looking for the equivalent to

_.flatMap = _.compose(_.flatten, _.map)

This is just Haskell's concatMap, which is just (concat . map).

rebcabin commented 12 years ago

The issue I found is that underscore's flatten flattens all levels of its input, whereas Haskell's concatMap flattens exactly one level of its input. This difference is critical, because it means that Haskell's concatMap is equivalent to a monadic bind, in the sense that concatMap someFunc someList equals someList >>= someFunc. This may seem like a nit or a feckless generalization, but without a proper flatMap (or concatMap, if you prefer), we have a non-monadic library of list operators and miss the mark of full compositionality. In particular, without it, we must leave the library to get the "list of all line items from customers" from composing an operation that gets "the list of all orders from customers" and an operation that gets "the list of all line items from orders." With a proper monadic flatMap (or concatMap), we just say

customers.flatMap(customer -> customer.Orders).flatMap(order -> order.lineItems)

If flatMap flattens too much, though, we can't get a list of orders out of it -- we've gone too far with the first flatMap(customer->customer.Orders) because it too aggressively flattens the substructure of lineItems.

A little monadic discipline goes a long way. Having a proper flatMap that flattens exactly one level makes the underscore library equivalent to Haskell's list monad or to .NET's LINQ over lists. It means the underscore library as a design is, in principle, extendable to other monads, most importantly to asynchronous streams of data. This ups its ante tremendously, in my opinion. Imagine a library with exactly the same operators as underscore's, but just works over asynchronous streams. This would be equivalent to .NET's reactive framework (detail: the reactive equivalent to a list is an observable that exposes a subscribe method, and observers subscribe their onNext, onError, and onCompletion callbacks to observables. The monadic library will allow us to filter, transform, expand, etc. the observables.)

(my remarks above are predicated on modeling the data as nested arrays -- as I noted before, if the data is modeled as alternating arrays and objects, then the composition of underscores flatten with map accidentally works because flatten won't cut through the next lower object layer -- the data model already presumes a list with one level of structure accessible to flatten. I reckon this to be a lucky accident that pertains only to one conventional way of modeling data.)

michaelficarra commented 12 years ago

The issue I found is that underscore's flatten flattens all levels of its input

Ah, I wasn't aware of that. Now I understand.

I don't like that _.flatten behaviour. I propose that _.flatten is made shallow by default so that _.flatMap can be implemented as I listed above. I know it will be a breaking change, but it was a mistake to implement it that way. For now, _.flatMap can be implemented as:

_.flattenOnce = function(arr){
  return _.reduce(arr, function(memo, b){ return memo.concat(b); }, []);
};
_.compose(_.flattenOnce, _.map);

rebcabin commented 12 years ago

There is a secret "second" argument of the original flatten that I exploited in my implementation of flatMap. I discovered it by looking at underscore's source. Take a peek at my fork if you have a sec :)

https://github.com/rebcabin/underscore/commit/4f90bab75d8f69df8d8d0317354f9f5981e2fedb

michaelficarra commented 12 years ago

Yep, I saw that when I looked at the source of _.flatten after you mentioned it performed a deep flatten. That's why I suggested that we make _.flatten shallow by default. I see that it has the parameter, but it defaults to deep flatten behaviour. I'd also be okay with defining an _.concat that behaves as _.flatten except with that second parameter defaulting to true.

rebcabin commented 12 years ago

Also note that Mathematica has a hyper-generic flatten function that permits full control over flattening at all levels. I do a lot of Mathematica work, and I often use the equivalents of flattenOnce and flattenAll. But it's occasionally nice to be able to reach in and manipulate internal levels. Probably too much for a utility-belt like underscore, but I do think that both flattenOnce and flattenAll are really useful.

http://reference.wolfram.com/mathematica/ref/Flatten.html

rebcabin commented 12 years ago

Ah, ok, I get you Michael. I would concur on changing the default of flatten if the breaking change isn't too heavy on users. That's a judgment call I can't make :)

jashkenas commented 12 years ago

I'm afraid that the breaking change would be too heavy -- Underscore is a fairly pervasive library, and I'd like to put a high premium on backwards compatibility.

... but how about simply documenting the second argument to flatten? Isn't that what we're looking for here?

jashkenas / underscore

recommend "flatmap" #436