typelevel / algebra

Experimental project to lay out basic algebra type classes
https://typelevel.org/algebra/

"Vector" instances are questionable #148

Closed: johnynek closed this issue 7 years ago

johnynek commented 8 years ago

@non, @tixxit, and I were talking about mapVectorEq and, for instance, the Group on Map[K, V] that assumes a missing key is the same as Group.empty.

This has some issues. Without the correct Eq[Map[K, V]] in scope, you violate the laws. Moreover, the standard Eq should not be the one with this behavior, because that behavior does not make sense for a lot of use cases.
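
Concretely, here is a minimal sketch of the law violation (the mapGroup instance below is illustrative, not the actual instance from #126):

import algebra._

// a Group[Map[String, Int]] that treats a missing key as 0
val mapGroup: Group[Map[String, Int]] = new Group[Map[String, Int]] {
  def empty = Map.empty[String, Int]
  def combine(x: Map[String, Int], y: Map[String, Int]) =
    (x.keySet ++ y.keySet).iterator.map { k =>
      k -> (x.getOrElse(k, 0) + y.getOrElse(k, 0))
    }.toMap
  def inverse(x: Map[String, Int]) = x.map { case (k, v) => k -> -v }
}

val x = Map("a" -> 1)
// combine(x, inverse(x)) is Map("a" -> 0), but empty is Map().
// The inverse law only holds under an Eq that treats 0-valued keys
// as absent, not under the standard Eq on Map.
mapGroup.combine(x, mapGroup.inverse(x)) == mapGroup.empty  // false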

Our solution is to wrap such instances:

// infinite map defined at all keys K
trait SparseMap[K, V] {
  def get(k: K): V
  def updated(k: K, v: V): SparseMap[K, V]
  def toMap: Map[K, V]
}

// an infinite vector that is defined at all points (i: Int)
trait SparseVector[T] {
  def apply(i: Int): T
  def updated(i: Int, v: T): SparseVector[T]
  def toVector: Vector[T]
}

In this way, #126 would become an instance on SparseMap.

TomasMikula commented 8 years ago

The solution of defining a new type for these versions of Map and Vector sounds good. What is the motivation behind naming them Vector*?

adelbertc commented 8 years ago

@TomasMikula I believe the Vector prefix is in reference to vector equality, as in linear algebra.

TomasMikula commented 8 years ago

Oh, I see, but you mean infinite-dimensional vectors, right? Then a more descriptive name would be Infinite{Vector|Map} instead of Vector{Vector|Map}, wouldn't it?

tixxit commented 8 years ago

I think @johnynek mentioned SparseMap as a possibility, which is descriptive too. In the case of Vector it's "really" a vector with Int.MaxValue elements too :\

TomasMikula commented 8 years ago

"Sparse" sounds good!

adelbertc commented 8 years ago

👍 Sparse

rklaehn commented 8 years ago

On the issue of names: I have a library, https://github.com/rklaehn/abc, with implementations of Seq and Map that allow defining a lot of typeclass instances without a strange Eq. I called them TotalArrayMap / TotalArraySeq. Whether they are sparse or not is an implementation detail, IMHO. The important thing is that in both cases the apply method is total, because there is a default value in case no mapping is provided. https://github.com/rklaehn/abc#-totalarraymapk-v

But Sparse is also fine. Just please don't put a strange Eq in the default implicit scope just to make normal Map work...
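
For illustration, a minimal sketch of the total-apply idea (DefaultedMap is a hypothetical name and representation, not rklaehn's actual TotalArrayMap):

// the default lives in the instance, so apply is total
final case class DefaultedMap[K, V](default: V, entries: Map[K, V]) {
  def apply(k: K): V = entries.getOrElse(k, default)
  def updated(k: K, v: V): DefaultedMap[K, V] =
    DefaultedMap(default, entries.updated(k, v))
}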

TomasMikula commented 8 years ago

So, what would be the requirements on the "default" value in the sparse vector/map?

  1. No requirements. It can be any value, and is provided at map/vector creation time.
  2. It is a "zero" of some algebraic structure, such as Monoid zero, or Lattice bottom.

@rklaehn does 1. I see a danger of accidentally getting an instance of, say, Monoid[SparseMap[K, V]] (given Monoid[V]) where the map's default value is not the same as Monoid[V].zero.

The problem with 2. is: which zero should it be? Monoid zero? Lattice bottom? Monoid might seem the most general requirement: lattice bottom and the join operation form a monoid, but then so do lattice top and the meet operation. And neither might be the most natural monoid on V.

rklaehn commented 8 years ago

The empty value of Monoid[TotalArrayMap[K, V]] is a TotalArrayMap with no elements and Monoid[V].empty as the default value. So unless you have different Monoid instances in scope at different places, everything is consistent. See https://github.com/rklaehn/abc/blob/master/core/src/main/scala/com/rklaehn/abc/TotalArrayMap.scala#L147

TomasMikula commented 8 years ago

If you have m1, m2: TotalArrayMap[K, V] with some default value other than Monoid[V].empty, and then combine them using Monoid[TotalArrayMap[K, V]].combine, what is the default value of the result? (I know it is well defined and I can see from the source code that it is the default value of m1.) My point is, I probably never wanted to use a monoid with a different empty value than my map's default value. So I now have a bug in my code that is hard to spot.

rklaehn commented 8 years ago

The default value of the result will be result.default === Monoid[V].combine(m1.default, m2.default). I don't see how it could be any other way, because for every k, result(k) === Monoid[V].combine(m1(k), m2(k)) must be true.

There is a "fast path" for when the default of both m1 and m2 is Monoid[V].empty. In that case you do not have to perform the combine op at all for values that aren't in both maps. But that is just an implementation detail.

import com.rklaehn.abc._
import cats.implicits._
import algebra._

// two maps whose defaults differ from Monoid[Int].empty (which is 0)
val a = TotalArrayMap.fromDefault[Int,Int](1)
val b = TotalArrayMap.fromDefault[Int,Int](2)

// the result's default is the combination of the operands' defaults
scala> Monoid[TotalArrayMap[Int, Int]].combine(a, b).default
res3: Int = 3

TomasMikula commented 8 years ago

I shouldn't embarrass myself by making false statements about code I haven't fully read :) Yeah, how you handle that situation seems the only correct way.

Still, the concern about accidentally combining maps with different default values remains. (Not that I have a better solution. I'm just trying to understand potential dangers of either approach.)

johnynek commented 8 years ago

I think doing something like this would be the way to go. TotalMap / TotalVector seem fine to me.

I agree that Sparse seems to imply the default value is empty, but that ties creation of these things to a Monoid, which might not be ideal for the reasons mentioned.

I don't think you have a problem if each instance of the class carries its own default value (and it won't cost much in storage, since it is just one more value).

TomasMikula commented 8 years ago

The problem can occur precisely when each instance carries its own default value and you accidentally combine maps with different default values (unless you prohibit that at runtime).

Ultimately, the problem with both approaches is kind of the same:

  1. possibility to accidentally use different default values;
  2. possibility to accidentally use different type class instances for the same type (and thus different default values).

Maybe a good compromise would be Monoid[SparseMap[K, V]] throwing a runtime exception when you try to combine maps with different default values, while using SparseMap[K, V]#combine(rhs, f) directly would still allow you to do that. Thoughts?

The SparseMap[K, V]#combine(rhs, f) signature could even be

def combine[U, W](rhs: SparseMap[K, U], f: (V, U) ⇒ W): SparseMap[K, W]

(changing the value type to W), so changing the default value must stay allowed there.

johnynek commented 8 years ago

I'm confused, I am proposing:

trait TotalMap[K, V] {
  def nonDefaultKeys: Set[K] // maybe private[algebra]
  def apply(k: K): V
  def update(k: K, v: V): TotalMap[K, V]
}

object TotalMap {
  // default is embedded in TotalMap, not in any algebra instance.
  def empty[K, V](default: V): TotalMap[K, V] = ...
}

So, what is the risk? When you combine, you do:

def combine(a: TotalMap[K, V], b: TotalMap[K, V]): TotalMap[K, V] = {
  val res = TotalMap.empty[K, V](Monoid[V].empty)
  a.nonDefaultKeys.union(b.nonDefaultKeys).foldLeft(res) { (old, k) =>
    old.update(k, a(k) + b(k))
  }
}

Sorry if I'm being dense, but is there a risk there?

TomasMikula commented 8 years ago

Suppose

a.defaultValue + b.defaultValue != Monoid[V].empty

Pick a key k that is absent from both a and b. One would expect

combine(a, b)(k) == a(k) + b(k)
                 == a.defaultValue + b.defaultValue

but instead it is

combine(a, b)(k) == Monoid[V].empty
                 != a.defaultValue + b.defaultValue

johnynek commented 8 years ago

@TomasMikula indeed, thanks, so fixing it to:

trait TotalMap[K, V] {
  def nonDefaultKeys: Set[K] // maybe private[algebra]
  def defaultValue: V // maybe private[algebra]
  def apply(k: K): V
  def update(k: K, v: V): TotalMap[K, V]
}

object TotalMap {
  // default is embedded in TotalMap, not in any algebra instance.
  def empty[K, V](default: V): TotalMap[K, V] = ...
}

def combine(a: TotalMap[K, V], b: TotalMap[K, V]): TotalMap[K, V] = {
  val res = TotalMap.empty[K, V](a.defaultValue + b.defaultValue)
  a.nonDefaultKeys.union(b.nonDefaultKeys).foldLeft(res) { (old, k) =>
    old.update(k, a(k) + b(k))
  }
}

that works, right?
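
A quick check at a key k absent from both maps (instantiating @TomasMikula's derivation above on the fixed code):

combine(a, b)(k) == res.defaultValue
                 == a.defaultValue + b.defaultValue
                 == a(k) + b(k)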

TomasMikula commented 8 years ago

That's basically what @rklaehn does in his library. I find this correct. Note, however, that no Monoid (or BoundedJoinSemilattice) is involved here yet. If this were the behavior of Monoid[TotalMap[K, V]].combine, it could be a nasty surprise that the default value of the resulting map is not Monoid[V].empty. That's why I suggested that, if we were to provide a Monoid instance for TotalMap, that monoid's combine method should check (at runtime) that the defaultValues of the operands are both Monoid[V].empty.
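
A sketch of that suggestion, reusing the illustrative DefaultedMap from above (the instance name and the require check are hypothetical, not from any library discussed here):

import algebra._

def defaultedMapMonoid[K, V](implicit V: Monoid[V]): Monoid[DefaultedMap[K, V]] =
  new Monoid[DefaultedMap[K, V]] {
    def empty = DefaultedMap(V.empty, Map.empty[K, V])
    def combine(a: DefaultedMap[K, V], b: DefaultedMap[K, V]) = {
      // the proposed runtime check: only defaults equal to
      // Monoid[V].empty are allowed through
      require(a.default == V.empty && b.default == V.empty,
        "combine requires both defaults to equal Monoid[V].empty")
      val keys = a.entries.keySet ++ b.entries.keySet
      DefaultedMap(V.empty,
        keys.iterator.map(k => k -> V.combine(a(k), b(k))).toMap)
    }
  }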

rklaehn commented 8 years ago

@TomasMikula

  2. possibility to accidentally use different type class instances for the same type (and thus different default values).

If you ever do that in a Scala program, you are basically in hell. E.g. you build a RedBlackTree using one order and then try to get values out of it while another order is in the implicit scope. Capturing typeclass instances at creation time is also not a solution, because then you lose most of the advantages of a typeclass-based design.

This is a problem with the lack of typeclass coherence. See https://www.youtube.com/watch?v=hIZxTQP1ifo

Maybe a good compromise would be Monoid[SparseMap[K, V]] throwing a runtime exception when you try to combine maps with different default values, while using SparseMap[K, V]#combine(rhs, f) directly would still allow you to do that. Thoughts?

I want all methods on my data structures to be total, so throwing an exception is never a good solution for me.

Suppose

a.defaultValue + b.defaultValue != Monoid[V].empty

Pick a key k that is absent from both a and b. One would expect

combine(a, b)(k) == a(k) + b(k)
                 == a.defaultValue + b.defaultValue

but instead it is

combine(a, b)(k) == Monoid[V].empty
                 != a.defaultValue + b.defaultValue

No, the result will be whatever the operation yields when applied to the default values. So if you add two maps, the default values will also be added. Everything's fine. combine in my code is not necessarily the monoid combine, but whatever the current op is.

Basically, a total map is just a tabulated function. Once it is created, the fact that it is backed by a finite number of mappings is an implementation detail. It must compose like a function, so

a(k) op b(k) === (a op b)(k) for all k (including those k for which neither a nor b has an entry)
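
In code, against the illustrative DefaultedMap above, that law forces the defaults to be combined as well (pointwise is a hypothetical helper, not rklaehn's actual code):

def pointwise[K, V](a: DefaultedMap[K, V], b: DefaultedMap[K, V])
                   (op: (V, V) => V): DefaultedMap[K, V] = {
  val keys = a.entries.keySet ++ b.entries.keySet
  // combining the defaults too is what makes the law hold at absent keys
  DefaultedMap(op(a.default, b.default),
    keys.iterator.map(k => k -> op(a(k), b(k))).toMap)
}
// pointwise(a, b)(op)(k) == op(a(k), b(k)) for every k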

TomasMikula commented 8 years ago

No, the result will be whatever the operation yields when applied to the default values.

In your library, yes. My comment was about @johnynek's code posted above.

non commented 7 years ago

When we moved to cats-kernel we removed these instances.