haskell-streaming / streaming

An optimized general monad transformer for streaming applications, with a simple prelude of functions
BSD 3-Clause "New" or "Revised" License
156 stars 30 forks source link

Incorrect documentation in unzip #97

Open chowells79 opened 4 years ago

chowells79 commented 4 years ago

https://hackage.haskell.org/package/streaming-0.2.3.0/docs/Streaming-Prelude.html#v:unzip claims

Of course, Data.List unzip doesn't stream either.

But that isn't true.

Prelude Data.Bifunctor> let x = zip [1..] [5..]
Prelude Data.Bifunctor> take 5 x
[(1,5),(2,6),(3,7),(4,8),(5,9)]
Prelude Data.Bifunctor> bimap (take 4) (take 6) $ unzip x
([1,2,3,4],[5,6,7,8,9,10])

That's streaming by any definition I can think of...

treeowl commented 4 years ago

Can you clarify? I believe the issue it refers to is that the two result lists may not be contained at the same rate, and therefore one of them may accumulate in memory.

chowells79 commented 4 years ago

That may be what it wants to say, but it's saying it in comparison to a type (Stream (Of a) m (Stream (Of b) m r)) that requires you to traverse the entirety of one output stream before you can even start traversing the other. Data.List.unzip, by contrast, works just fine with infinite inputs and lets you consume as much of either output as desired.

The fact that Stream (Of a) (Stream (Of b) m) r requires lockstep consumption is interesting, and is worth calling it out. Describing the output of Data.List.unzip as having similar issues to Stream (Of a) m (Stream (Of b) m r) is pretty much wrong, though.

Bonus report:

  unzip = unzips . maps (((a,b) :> x) -> Compose (a :> b :> x))
  unzip = expand $ p ((a,b) :> abs) -> b :> p (a :> abs)

Somehow lost its lambdas in haddock rendering. They're definitely there in the source...

danidiaz commented 4 years ago

Perhaps what was meant is that, if you want to process all the as with a single operation—say S.mapM_ putStrLn—and all the bs with another operation, you can't do it in Stream (Of a) m (Stream (Of b) m r) without accumulating bs in memory, and neither you can do it with the result of a Data.List.unzip.

It's true that with Data.List.unzip you can devise some "intertwined" mode of consumption that doesn't accumulate bs. But you can't do it with the naive method of applying traverse_ putStrLn on one list, then traverse_ putStrLn on the other.