typelevel / cats-collections

Data structures for pure functional programming in Scala
https://typelevel.org/cats-collections/
Other
556 stars 98 forks source link

A purely functional Set implementation using `Eq` #142

Open nasadorian opened 5 years ago

nasadorian commented 5 years ago

I noticed that the distinct implementations in cats data types are using on Scala's TreeSet and the Order typeclass. But what about a purely functional set based on Eq using only boolean logic?

Upsides:

Downsides:

I wrote up a basic implementation (https://gist.github.com/nasadorian/8f10cc1f89225a932238026cf46e57a7) using just a case class and Eq.

Would like to improve it and use more lawful constructs - I am thinking a better impl could depend on some Contravariant Functor like Predicate since it is "containing" a function, but I'm not totally sure how to make that work.

The Set itself could have Monoid and Semigroup instances easily. I would also like to figure out if it's possible to make it a Functor.

Cheers!

larsrh commented 5 years ago

That should basically be ISet. Making it a functor wouldn't work, however.

nasadorian commented 5 years ago

Thanks @larsrh - looks like I'm a few years late to the party 😅. I do have some improvements I think I could make in a PR.

ISet seems to be based on a generic predicate. Would the special case based on Eq (which I'd assume is most common anyway) belong in the same library or is that too trivial?

nasadorian commented 5 years ago

Also interestingly, it seems neither ISet nor my implementation are stack safe after a certain size. Found that ISet stack overflows when looking up in a set of around 20k elements.

larsrh commented 5 years ago

It depends on how you construct such a set. One possibility to make it stack-safe would be to use a function A => cats.Eval[Boolean] instead of A => Boolean.

LukaJCB commented 5 years ago

I think Eval[A => Boolean] would be even better (similar to what we did for State) as function composition isn't stack safe in general either

nasadorian commented 5 years ago

I did try it with Eval, but ended up with a deferred version of the same SO.

What I'm starting to realize is... maybe this structure actually has O(n) lookup time instead of constant as one would expect. Correct me if I'm wrong here...

Every time I wrap a function in another function like in Option((_:Int) == 1).map(f => (n: Int) => f(n) || n == 2), we create a new Function1 which references the first Function1 in its apply. So there are N objects being created for an N-element functional set, and when we call contains, Scala is pushing up to N frames onto the stack where we really just want to call a single, flat function.

How would we resolve this with a trampoline if the inevitable act of combining the functions is creating this nesting?