scalanlp / breeze

Breeze is a numerical processing library for Scala.
www.scalanlp.org
Apache License 2.0
3.44k stars 692 forks source link

Optionally Encode Dimension in Types #526

Open kailuowang opened 8 years ago

kailuowang commented 8 years ago

One thing I found tedious is the runtime check the dimensionality match, for example, when writing a neural network, if you want to be safe, you need to check if the dimensionality of each layer matches, and when writing a single layer, you need to check if the multiple components of that layer have matching dimensions.

I am wondering if it's possible to encode the dimension in type so that such runtime check can be compiler time check? For example, I imagine it's possible to use Shapeless.Nat something like Matrix with D(_2, _3).

Perhaps this has been considered and deemed infeasible?

kailuowang commented 8 years ago

another example, https://github.com/scalanlp/breeze/issues/515 this could fail at compilation time.

dlwh commented 8 years ago

yeah it'd be nice to do and I've sketched idea for how to do it.

You probably don't want shapeless nat's (since network dimensions are usually dynamic), but probably something that wraps a pair of ints and brokers matrices

On Mon, Apr 18, 2016 at 10:28 AM, Kailuo Wang notifications@github.com wrote:

another example, #515 https://github.com/scalanlp/breeze/issues/515 this could fail at compilation time.

— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/scalanlp/breeze/issues/526#issuecomment-211487054

kailuowang commented 8 years ago

@dlwh do you mind expand it a bit more? I am very interested in contributing on this front, need a bit more instructions though.

fommil commented 8 years ago

:+1: for singleton types (aka refinement)

dlwh commented 8 years ago

sure thing. I sketched this a long time ago, let me know if it makes sense:

  case class Dim(size: Int)

  trait Matrix[X <: Dim with Singleton, Y <: Dim with Singleton] {
    def *[Z <: Dim with Singleton](other: Matrix[Y, Z]):Matrix[X, Z] = ???
  }

  object Matrix {
    def apply[X <: Dim with Singleton, Y <: Dim with Singleton](x: X, y:
Y):Matrix[X,Y] = ???
  }

  val x = Dim(100)
  val y = Dim(readFromFile(...))
  val z = Dim(37)

  val A = Matrix[x.type, y.type](x, y)
  val B = Matrix[y.type, z.type](y, z)

  A * A // compile error
  A * B // ok

On Mon, Apr 18, 2016 at 11:00 AM, Kailuo Wang notifications@github.com wrote:

@dlwh https://github.com/dlwh do you mind expand it a bit more? I am very interested in contributing on this front, need a bit more instructions though.

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/scalanlp/breeze/issues/526#issuecomment-211504238

fommil commented 8 years ago

You might benefit from watching both my ncala exchange talks: first linear algebra, then shapeless for mortals :smile:

dlwh commented 8 years ago

i also have something a bit less sketch-y on my home computer that I can try to dig up. It basically was intended as a cleanup/refactor/repurposing of the breeze.util.Index/Encoder stuff

On Mon, Apr 18, 2016 at 11:03 AM, David Hall david.lw.hall@gmail.com wrote:

sure thing. I sketched this a long time ago, let me know if it makes sense:

  case class Dim(size: Int)

  trait Matrix[X <: Dim with Singleton, Y <: Dim with Singleton] {
    def *[Z <: Dim with Singleton](other: Matrix[Y, Z]):Matrix[X, Z] = ???
  }

  object Matrix {
    def apply[X <: Dim with Singleton, Y <: Dim with Singleton](x: X, y:
Y):Matrix[X,Y] = ???
  }

  val x = Dim(100)
  val y = Dim(readFromFile(...))
  val z = Dim(37)

  val A = Matrix[x.type, y.type](x, y)
  val B = Matrix[y.type, z.type](y, z)

  A * A // compile error
  A * B // ok

On Mon, Apr 18, 2016 at 11:00 AM, Kailuo Wang notifications@github.com wrote:

@dlwh https://github.com/dlwh do you mind expand it a bit more? I am very interested in contributing on this front, need a bit more instructions though.

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/scalanlp/breeze/issues/526#issuecomment-211504238

fommil commented 8 years ago

Shapeless' refined types might be cleaner than having to introduce Dim

fommil commented 8 years ago

Idris has a really neat way of handling this sort of thing.

kailuowang commented 8 years ago

@fommil thanks for the hint, I've watched the shapeless for mortals, will check out the linear algebra one.
@dlwh that makes sense. Looks a great starting point to me. Pinging @milessabin and @fthomas to see if you guys have more ideas to bring to the table.

dlwh commented 8 years ago

(I'm sure miles solved this better than I will. Definitely see what you can scavenge from shapeless)

kailuowang commented 8 years ago

@dlwh I am going to spend sometime experimenting ideas. Right now, i have several hypotheses to test: a) to really take advantage of compiler time dimension check, you will need to do some arithmetics on the type level, which makes support for dynamic values impossible. b) dimension information can be an optional tagged type to the tensor type (rather than required type parameters). This can still be beneficial (I am thinking about refined) and possibly make it easier to work with dynamic dimensions. I imagine we can provide two different typeclasses for matrix operations, one set supporting compilation time dimension checks, the other one doesn't.

That being said, I also think I need to get more extensive experience with Breeze before I can evaluate the different strategies here. To achieve that my plan is to first change my project glaux to switch to use Breeze instead of ND4j ( I thought I need N-Dimension tensor but maybe I don't).

dlwh commented 8 years ago

I strongly disagree with (a), since you aren't going to do that sophisticated of transforms, and you really want to be able to specify dimensions of the nets at runtime while still maintaining static guarantees

kailuowang commented 8 years ago

@dlwh just curious, in regards to specifying dimensions at runtime, is that needed by auto tuning? any other example so that I can better grasp the problem? One thing I am trying to figure out with the Dim(x: Int) design, how to share the same Dim instance across multiple components that use that Dim (since two Dim(2) are not the same in singleton type) . Are you thinking about global DimensionSpec object in which all dimensions are specified?

denisrosset commented 8 years ago

you will need to do some arithmetics on the type level, which makes support for dynamic values impossible.

(replying to this, and in a sense continuing the discussion from typelevel/general)

I guess that's where the limitations of the Scala type system come into play. Otherwise, you would need to keep witnesses around that "d1 + d2 = d3" when concatenating vectors of dimension "d1" and "d2", so that you can add the resulting vector with another vector of dimension "d3".

Let's suppose you do that. Without support from the compiler to remove the boilerplate, writing linear algebra code is going to get pretty hairy.

In a sense, that's like integer division by 0. We could encode instances of non-negative integers using value classes, and enforce type safety. In practice, I've never been bitten by ArithmeticException.

For me, that's one of the hardest design decisions in Scala: when should a property be encoded in the type system? For example, should we encode the fact that a matrix is full rank to restrict the usage of mat.inverse? That the matrix is symmetric, so that eigenvalues can be computed? How do we deal with the resulting exponential explosion of subtypes?

mardukbp commented 4 years ago

This issue is one of the design concerns of Nexus - a prototypical typesafe deep learning system in Scala.