typelevel / cats-collections

Data structures for pure functional programming in Scala
https://typelevel.org/cats-collections/
Other
556 stars 99 forks source link

Inconsistent Range behavior when dealing with reverse ranges #564

Open isomarcte opened 1 year ago

isomarcte commented 1 year ago

I've noticed some inconsistencies with how several methods on Range and that use Range are defined when dealing with ranges which go different directions.

Range.contains

contains for Range is not symmetric around reverse.

scala> val a = Range(1, 10)
a: cats.collections.Range[Int] = Range(1,10)

scala> val b = a.reverse
b: cats.collections.Range[Int] = Range(10,1)

scala> a.contains(b)
res0: Boolean = true

scala> b.contains(a)
res1: Boolean = false

Perhaps most troubling,

scala> Range(1, 10).contains(Range(15, 9))
res0: Boolean = true

This is made more confusing when looking at how Range is used with Diet.

scala> val a = Range(1, 10)
a: cats.collections.Range[Int] = Range(1,10)

scala> val b = a.reverse
b: cats.collections.Range[Int] = Range(10,1)

scala> Diet.fromRange(b) === Diet.fromRange(a)
res0: Boolean = false

scala> Diet.empty.addRange(b) === Diet.fromRange(a)
res1: Boolean = true

There are a few ways to remedy this that come to mind, though I think the most simple would be to change Range to be defined as always going in one direction. If a Range is desired that is going in the inverse order, then an newtype (perhaps Inverse) could be used.

satorg commented 1 year ago

I would say it is by design: Range defines boundaries but not specify Order upon creation. Then every single method has to provide the Order implicitly for itself. In particular, both the contains methods take the implicit Order parameter. Therefore, if we reverse Range then the Order used for its methods has also be reversed:

scala> val rng1 = Range(1, 10)
rng1: cats.collections.Range[Int] = Range(1,10)

scala> rng1.contains(5)
res0: Boolean = true

scala> val rng2 = rng1.reverse
rng2: cats.collections.Range[Int] = Range(10,1)

scala> rng2.contains(5) // won't work since it uses a regular Order for Int
res1: Boolean = false

scala> rng2.contains(5)(Order.reverse(Order[Int]))
res2: Boolean = true

That said, I personally find such an approach controversial quite a bit.

armanbilge commented 1 year ago

Range defines boundaries but not specify Order upon creation. Then every single method has to provide the Order implicitly for itself.

Yes, this is one of the things I think we should consider changing for 1.0. See my comment in https://github.com/typelevel/cats-collections/issues/277#issuecomment-1292455578.

Indeed, this design was rejected for the new HashMap and HashSet, on grounds of be a "Haskell cargo-cult". So at the very least, this library now has inconsistent designs.

satorg commented 1 year ago

this is one of the things I think we should consider changing for 1.0

If we could change that, then perhaps it would be a cool change. Would we consider making Range to be not "all-inclusive" only? I mean, if we could create a Range instance having one or both boundaries that would be exclusive, then it would make this type way more useful.

armanbilge commented 1 year ago

See also some discussion in https://github.com/typelevel/cats-collections/issues/315#issuecomment-770266231.

isomarcte commented 1 year ago

Just to clarify, are we saying that we intentionally want the usage of non-coherent instances to be normative? Like, we demand either Order[Int] or Order.reverse(Order[Int]) on Range creation as a design feature to change the ordering of the Range?

Using HashSet as an example, is this an intended usage?

scala> val sillyButValidHash: Hash[Int] = new Hash[Int] { override def hash(a: Int): Int = 1; override def eqv(x: Int, y: Int): Boolean = Eq[Int].eqv(x, y)}
sillyButValidHash: cats.Hash[Int] = $anon$1@1f21f982

scala> HashSet(1) === HashSet(1)(sillyButValidHash)
res0: Boolean = true

scala> HashSet(1).add(2) === HashSet(1)(sillyButValidHash).add(2)
res1: Boolean = false

scala> HashSet(1).add(2).iterator.toList === HashSet(1)(sillyButValidHash).add(2).iterator.toList
res2: Boolean = true

I personally would never want this to be intended behavior, as if I have more than one instance of HashSet[A] I completely lose the ability to reason about how they relate to each other, unless I've defined them in the same local scope.

I was under the impression that binding the instance at the method or at the class was mostly for convenience (and preventing damage to internal state of an already allocated structure) and not as a mechanism for using alternative instances. I would expect to use newtypes (ideally opaque types) to index alternative implementations.

armanbilge commented 1 year ago

The problem is if you take the instance at the call site then you can do this:

HashSet(1).add(2)(sillyButValidHash)

Which is also bad. Oscar just summarized this nicely in https://github.com/typelevel/cats-collections/issues/277#issuecomment-1315664123.

I still believe that in a language like scala where typeclasses are first class values and we don't have coherence, it doesn't make since to consume required typeclasses at callsites if using the wrong one invalidates the datastructure (which it does for hashmaps and trees with Hash and Order respectively).

armanbilge commented 1 year ago

I would expect to use newtypes (ideally opaque types) to index alternative implementations.

👍 yes, this is the only safe way.

isomarcte commented 1 year ago

:+1: You scared me for a minute.

So coming back to Range. If we define ranges as intervals going in a single direction, then I think we can make the interface consistent and relatively easy to use. This would require the use of a newtype for reversing direction. I'm happy to POC that if there are no objections. It would also make the case for moving Inverse to cats-collections more compelling.

My intended strategy would probably be to introduce a new class Interval and deprecate Range.

Thoughts?

armanbilge commented 1 year ago

Sure, maybe Interval is a better name for the concept anyway.

satorg commented 1 year ago

@isomarcte I think there are two strategies possible:

  1. As you suggested, to make a type that always uses a single direction and then create a newtype for the inverse directions.
    • PROS: (as I can see it) the direction is unified and alway apparent.
    • CONS: the range type and its inverse counterpart become two different types and thus cannot be compared to each other directly. I.e. we have to either implement separate methods for correlating those types or make a conversion before each comparison.
  2. Another way could be to keep Range being bi-directional but latching its Order upon a new instance creation. Therefore, when we call .reverse we can also reverse the Order.
    • PROS: a single type that is "up" to be compared with a counterpart regardless of each other's direction.
    • CONS: the Range becomes more complex and has to maintain more complex structure rather than just boundaries.

Also as I mentioned above, I personally would find useful to have Range that could have non-inclusive boundaries, e.g. [0..10) which is similar to Scala's 0 until 10 .

All above is my personal perception of course.

armanbilge commented 1 year ago

Thanks for that distinction, at least for me I think that helped clarify things.

I think (1) is an "interval": it is a set of things, in some natural order. Whereas (2) is a "range" since you are not only specifying the set, but in which order to iterate it. I am not sure if that means we have to reverse the Order instance: the direction is a property of the "range", and not of its element. I think :)

satorg commented 1 year ago

I am not sure if that means we have to reverse the Order instance: the direction is a property of the "range", and not of its element.

In theory yes, it should not be the property of Range. But the way it is currently implemented is that Order is used to correlate the Range's boundaries with other items of the same type (including boundaries of another Range). Therefore, currently Order magically becomes a property of the Range itself. And yeah, I agree it should not be so...

isomarcte commented 1 year ago

@satorg RE 1.CONS

Assuming the inclusion of Inverse from the aforementioned PR, do you think something along these lines would work?

val a: Interval[Int] = Interval(1, 10)
val b: Interval[Inverse[Int]] = a.reverse

def reverse: Interval[Inverse[A]] = 
  Interval(Inverse(end), Inverse(start))
def contains(value: Interval[A]): Boolean = ???
def containsInverse(value: Interval[Inverse[A]]): Boolean = 
  contains(value.reverse)

// Just a rough sketch. I might be able to do something interesting so that
// (a: Interval[Inverse[A]]).reverse => Interval[A], and not
// Interval[Inverse[Inverse[A]]]
:t reverse 
Interval[A] => Interval[Inverse[A]]
:t contains 
Interval[A] => Interval[A] => Boolean
:t containsInverse
Interval[A] => Interval[Inverse[A]] => Boolean

And I'm completely fine making it exclusive on the upper bound. I think that would even be required, since if we require that start <= end, then the only want to define the empty interval is to have the upper bound be exclusive.

isomarcte commented 1 year ago

I can also just make a POC PR as well, if that would be easier for discussion.

satorg commented 1 year ago

And I'm completely fine making it exclusive on the upper bound. I think that would even be required, since if we require that start <= end, then the only want to define the empty interval is to have the upper bound be exclusive.

Yes, I think it makes sense as an option... But on the other hand.... Imagine that we need to make Interval for the entire Short type range. In that case Interval(Short.MinVal, Short.MaxVal) wouldn't work since the last value will be always excluded.

Another (but similar) example:

sealed trait MyNum

object MyNum {
  object One extends MyNum
  object Two extends MyNum
  object Three extends MyNum
  object Four extends MyNum
}

Apparently we can define some Order for such an enum as well as Discrete (or perhaps even BoundedEnumerable from cats) for that. But if Interval can only have an exclusive upper bound, then there's no way to define such an Interval that would contain all the members of an enum like one above.

satorg commented 1 year ago

As a bottomline: there's a controverity: if Interval only allows both lower and upper bounds to be inclusive, then there's no way to define an empty Interval. Otherwise, if it always defined as [inclusive, exclusive), then we cannot describe a full range of some limited set of values.

This is why I think it should allow both options.

satorg commented 1 year ago

On the second thought... Perhaps, the idea is too drastic, but... is there really a necessity for having classes like Range or Interval in principle, taking into account that there are cool typeclasses like UnboundedEnumerable, LowerBoundedEnumerable, BoundedEnumerable, etc already defined in cats?

isomarcte commented 1 year ago

@satorg I would say yes. An Interval type, not only providing some utility itself, is fundamental to a class of data structures called Interval Trees, of which we have a variant in cats-collection already, e.g. Diet.

Having a base type with the correct semantics to implement these is, I think, quite useful.

I am of course heavily biased here. I started down this path because a nice Interval type and an Interval Tree which supports key value pairs would be amazing for my work in idna4s. :wink:

satorg commented 1 year ago

I basically agree. But what I am thinking about (and yeah, perhaps it is too crazy) is to make Diet more generic and don't stick to a particular interval/range type. Therefore if we made implementation of Diet based on typeclasses rather than a particular range implementation, then it should be possible to plug any range implementation into it, including scala.collection.immutable.Range, something like

class Diet[A, R] {}

where A – element type and R – range type