
Linear Algebra Plan #5195

Closed: danking closed this issue 4 years ago

danking commented 5 years ago

--- DRAFT ---

People

@hail-is/tensors

Scope of Document

Python API, types, IR, optimizer, compiler

Notation

e[[x/v]] means substitute occurrences of the variable v with the expression x in the expression e

f[[ e ]] is just application of the function f, written this way so that it doesn't look like any Python or Scala syntax (so it's obviously referring to the meta-language rather than the languages we're building here)

I'm going to consistently use "distributed" to talk about BlockMatrix-y things and "small" to refer to things that live in the "value" IR.
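
To make the substitution notation concrete, here is a toy sketch (hypothetical Python, not Hail code) of e[[x/v]] over a two-node expression tree:

from dataclasses import dataclass

# Toy stand-ins for IR nodes; these are illustrations, not Hail's classes.
@dataclass
class Ref:
    name: str

@dataclass
class Add:
    left: object
    right: object

def substitute(e, v, x):
    # Compute e[[x/v]]: replace occurrences of the variable v with the
    # expression x in the expression e.
    if isinstance(e, Ref):
        return x if e.name == v else e
    if isinstance(e, Add):
        return Add(substitute(e.left, v, x), substitute(e.right, v, x))
    return e  # literals and other leaves pass through unchanged

# (l + r)[[Ref("e")/l]]  ==  e + r
assert substitute(Add(Ref('l'), Ref('r')), 'l', Ref('e')) == Add(Ref('e'), Ref('r'))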

DistributedTensorIR

Some thoughts on TensorIR (fruits of discussion among the group):

TensorIR ::= TensorLiteral()
           | TensorContract(TensorIR, TensorIR, Int, Int, body: IR)
           | TensorMap2(TensorIR, TensorIR, body: IR)
           | TensorMap(TensorIR, body: IR)
           | TensorSelectGlobals(TensorIR, name: String, body: IR)

In the body of TensorContract and TensorMap2, four refs are free: l, r, i, and j. In the body of TensorMap, three refs are free: e, i, j.

In the body of TensorContract, all four refs are aggregable. In TensorMap and TensorMap2, they are scalar values.

No aggregations are allowed in the body of TensorSelectGlobals. It is just SparkContext.broadcast.
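
A minimal sketch of this grammar as Python dataclasses, reusing the toy IR above; the field names are assumptions read off the grammar, not Hail's actual classes:

from dataclasses import dataclass
from typing import Any

IR = Any  # placeholder for the small ("value") IR

class TensorIR:
    pass

@dataclass
class TensorLiteral(TensorIR):
    pass  # payload (shape, data) is left unspecified in this draft

@dataclass
class TensorContract(TensorIR):
    left: TensorIR
    right: TensorIR
    left_axis: int
    right_axis: int
    body: IR  # free refs l, r, i, j; all four are aggregable

@dataclass
class TensorMap2(TensorIR):
    left: TensorIR
    right: TensorIR
    body: IR  # free refs l, r, i, j; scalar values

@dataclass
class TensorMap(TensorIR):
    child: TensorIR
    body: IR  # free refs e, i, j; scalar values

@dataclass
class TensorSelectGlobals(TensorIR):
    child: TensorIR
    name: str  # fresh uuid under which the value is stored in globals
    body: IR   # no aggregations allowed; semantically SparkContext.broadcast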

From Python

C[[ u @ v ]] := TensorContract(
                  C[[ u ]],
                  C[[ v ]],
                  1,
                  0,
                  hl.agg.sum(l * r))
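
For intuition, contracting axis 1 of u against axis 0 of v with a sum-of-products body is exactly matrix multiplication; in numpy terms (illustration only, not the Hail API):

import numpy as np

u = np.arange(6).reshape(2, 3)
v = np.arange(12).reshape(3, 4)

# TensorContract(u, v, 1, 0, hl.agg.sum(l * r)) pairs u's axis 1 with v's
# axis 0 and sums l * r over the contracted axis:
out = np.einsum('ik,kj->ij', u, v)
assert (out == u @ v).all()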

C[[ u + v ]] := TensorMap2(
                  C[[ u ]],
                  C[[ v ]],
                  l + r)

C[[ u + 1 ]] := TensorMap(
                  C[[ u ]],
                  e + I32(1))

C[[ u + hl.ndarray(...) ]] := TensorMap(
                                TensorSelectGlobals(
                                  C[[ u ]],
                                  uuid1,
                                  C[[ hl.ndarray(...) ]]),
                                e + NDArrayIndex(GetField("globals", uuid1), i, j))
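
A sketch of that last lowering as a function, reusing the dataclasses above; make_body is a hypothetical helper that builds the value-IR body e + NDArrayIndex(GetField("globals", name), i, j):

import uuid

def lower_add_small_ndarray(u_tir: TensorIR, nd_ir: IR, make_body) -> TensorIR:
    # Store the small ndarray in globals under a fresh name (so it cannot
    # collide with user globals), then index it pointwise in the map body.
    name = uuid.uuid1().hex
    return TensorMap(
        TensorSelectGlobals(u_tir, name, nd_ir),
        make_body(name))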

Transformations

This representation admits elegant transformations:

TensorMap2(TensorMap(u, x), v, body)
  <=>
TensorMap2(u, v, Let(uuid1, x[[l/e]], body[[Ref(uuid1)/l]]))

TensorMap(TensorMap(u, x), y)
  <=>
TensorMap(u, Let(uuid1, x, y[[Ref(uuid1)/e]]))

TensorContract(TensorMap(u, x), v, body)
  <=>
TensorContract(u, v, Let(uuid1, x[[l/e]], body[[Ref(uuid1)/l]]))

The above rule needs care with respect to aggregations: the Let must be pushed under the aggregation, but no further.
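
As a sketch, the map-map fusion rule might be implemented like this (reusing the toy substitute and dataclasses above; freshly generated names sidestep variable capture, though the toy substitute would need a Let case for nested lets):

import uuid
from dataclasses import dataclass

@dataclass
class Let:
    name: str
    value: IR
    body: IR

def fuse_map_map(tir: TensorIR) -> TensorIR:
    # TensorMap(TensorMap(u, x), y) => TensorMap(u, Let(n, x, y[[Ref(n)/e]]))
    if isinstance(tir, TensorMap) and isinstance(tir.child, TensorMap):
        u, x, y = tir.child.child, tir.child.body, tir.body
        name = uuid.uuid1().hex
        return TensorMap(u, Let(name, x, substitute(y, 'e', Ref(name))))
    return tir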

All of these rules need care because we don't want to lose the ability to send something through BLAS. Perhaps these rewrites should be left entirely to the "pipeline"-level optimizer (see: Arcturus' recent work), run after translation to tables of small tensors is complete, at which point BLAS operations are explicit.

Compilation

At first, we pattern match the expressions that can be represented via BlockMatrix, erroring on unrepresentable expressions.

C2[[ TensorMap(u, ApplyBinaryPrimOp(Plus(), Ref("e"), F64(n))) ]]
  =
u.scalarAdd(n)

C2[[ TensorMap2(u, v, ApplyBinaryPrimOp(Plus(), Ref("l"), Ref("r"))) ]]
  =
u.add(v)

C2[[ TensorContract(u, v, (ApplyAggOp Sum () None ((ApplyBinaryPrimOp `*` (Ref l) (Ref r))))) ]]
  =
u.dot(v)
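
Put together, the C2 pass might look like the following sketch (reusing the toy nodes above; Mul and AggSum stand in for the real value-IR patterns, a bare float stands in for F64(n), and +, @ are the hail.linalg.BlockMatrix operators; TensorLiteral handling is elided since the draft grammar leaves its payload unspecified):

from dataclasses import dataclass

@dataclass
class Mul:
    left: object
    right: object

@dataclass
class AggSum:
    child: object

def compile_to_block_matrix(tir: TensorIR):
    # Pattern match onto BlockMatrix, erroring on anything unrepresentable.
    if isinstance(tir, TensorMap) and isinstance(tir.body, Add) \
            and tir.body.left == Ref('e') and isinstance(tir.body.right, float):
        return compile_to_block_matrix(tir.child) + tir.body.right
    if isinstance(tir, TensorMap2) and tir.body == Add(Ref('l'), Ref('r')):
        return compile_to_block_matrix(tir.left) + compile_to_block_matrix(tir.right)
    if isinstance(tir, TensorContract) \
            and (tir.left_axis, tir.right_axis) == (1, 0) \
            and tir.body == AggSum(Mul(Ref('l'), Ref('r'))):
        return compile_to_block_matrix(tir.left) @ compile_to_block_matrix(tir.right)
    raise ValueError(f'not representable as a BlockMatrix: {tir!r}')
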
danking commented 5 years ago

@hail-is/tensors fyi

danking commented 4 years ago

Out of date at this point. cc @johnc1231: this was some early thinking on this.