marstr / collection

Generic Golang implementation of a few basic data structures.
MIT License
23 stars 2 forks source link

Use an Interface Instead of Channels for Enumerators #21

Open marstr opened 2 years ago

marstr commented 2 years ago

I wrote this library a few weeks after having written my first line of Go. I was very excited to play around with channels, and to use them to build concurrent pipelines that would allow people to easily write fast, organized, parallelized code. Right away, I used it to make a bunch of build tools for the Azure SDK for Go, and saw huge success with the approach. The time it took to build all of our packages from Open API specs dropped more than 50%, and I knew I was onto something. But almost as quickly, the flaws of this approach started showing themselves. Within a week, I was laying in bed one night and thought, "wait, what happens when I call collection.Any(...)? It starts an enumeration but doesn't finish it. How does the publishing goroutine know to stop?" Spoiler, it didn't. I went into the office, wrote a program that called collection.Any in a loop and it crashed my machine. Remember, these were my first days with Go (Go1.5 timeframe), and the context.Context pattern wasn't really around/popular yet. It was still early days, and I was confident I was the only one using this library, so I fixed the problem by making a breaking change by adding a <-chan struct{} to provide a cancellation signal and kept moving.

A little while later, a new problem presented itself: how do you handle errors? You could add an error to Enumerate(...)'s return signature. But that only has the capacity to indicate errors that happened right away as you are establishing the enumerator, not errors the crop up while the pipeline is processing. One could add a new struct like type EnumerationEntry struct { Payload interface{}, Err error} and use it in place of everywhere interface{} was being used. But then the whole thing is starting to feel sloppy and unwieldy. At the same time, the places where it would matter were becoming clear. What if you were wrapping a Web API that returns results in pages, and wanted to hide that complexity with an Enumerator? Every time you cross a page boundary you really are fairly likely to encounter a terminal error. But none of the places I was actually using enumerators were prone to this problem. And hugely benefited from the parallelism that my approach offered. Building pipelines to process data that's already in memory, or even just spinning off local processes was not prone to failure. Additionally, it was becoming less clear that I was the only user, as I was seeing a lot of clones of this repository (which I still see, thank you.) Without go modules, or even popular adoption of forebearers like godep, glide, etc I didn't feel like I could make a breaking changes. So I just decided I didn't have the tools to make a real change, and kept with what I had.

The world of Go has changed a lot since the initial development of this library. With go modules, I have the ability to occasionally publish breaking changes in a fairly responsible way. People I respect a lot, like Brian Ketelsen, have affirmed the problems above by saying to my face, "never return channels." So, now that Go is adding generics it's felt like a great opportunity to release a v2 and clean up some of the things that have been dogging this library, and me.

So the question is:

In V2, should the current Enumerator type:

https://github.com/marstr/collection/blob/62813a03d62ace37f2ca51247f301ffab3c5860d/query.go#L17

be updated to something like:

type Enumerator[T any] interface{
    Next(ctx context.Context) (T, error)
}

The big pros:

The cons:

Are there ways we could mitigate some of the cons by adding helper functions?

marstr commented 2 years ago

For v2.0.0, this question should at least be answered; knowing that if the current channel based approach is kept, there shouldn't be a v3 published for years.