[SR-9345] [AD]ventures in differentiating Tensor.sum (IRGen also involved) #51815

Closed: swift-ci closed this issue 5 years ago

swift-ci commented 5 years ago
Previous ID SR-9345
Radar None
Original Reporter jekbradbury (JIRA User)
Type Bug
Status Resolved
Resolution Done
Additional Detail from JIRA:

| Field | Value |
|-------------|----------------------|
| Votes | 0 |
| Component/s | Swift for TensorFlow |
| Labels | Bug |
| Assignee | @rxwei |
| Priority | Medium |

md5: 134b5cc8880169f5168a64c23d4a1b54

Issue Description:

Say I want to differentiate a composite function that includes a sum over a tensor, e.g.:

import TensorFlow
#gradient({ x in x.sum(squeezingAxes: 0) + Tensor(Float(2)) })(Tensor([Float(10)]))

The standard library doesn't have a primitive for sum, so let's add one:

public extension Tensor where Scalar: Numeric {
  @inlinable @inline(__always)
  @differentiable(reverse, wrt: (self), adjoint: _adjointSumToScalar)
  func sum() -> Scalar {
    let axes = Tensor<Int32>(rangeFrom: 0, to: rank, stride: 1)
    return Scalar._getScalarOrDie(Raw.sum(self, reductionIndices: axes).handle)
  }
  @inlinable @inline(__always)
  @differentiable(reverse, wrt: (self), adjoint: _adjointSumSqueezingAxes)
  func sum(squeezingAxes axes: Int32...) -> Tensor {
    return Raw.sum(self, reductionIndices: Tensor<Int32>(axes), keepDims: false)
  }
}
extension Tensor where Scalar: Numeric {
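  // Adjoint for the scalar-returning sum(): broadcast the scalar seed back
  // to the shape of `self`, since every element contributes equally to the sum.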
  @inlinable
  func _adjointSumToScalar(originalValue: Scalar, seed: Scalar) -> Tensor {
    let seedAsTensor = Tensor<Scalar>(seed)
    let ret = seedAsTensor.broadcast(like: self)
    return ret
  }
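  // Adjoint for sum(squeezingAxes:): the seed already has the reduced shape,
  // so broadcast it back to the shape of `self`.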
  @inlinable
  func _adjointSumSqueezingAxes(axes: Int32..., originalValue: Tensor, seed: Tensor) -> Tensor {
    let ret = seed.broadcast(like: self)
    return ret
  }
}

Both with and without tf-dynamic-compilation, this gives:

<unknown>:0: note: differentiating generic functions is not supported yet
<unknown>:0: error: function is not differentiable

This is a little strange: I thought the problem was that AD doesn't work on generic functions yet, and that this case should work because sum has a primitive adjoint. Before trying a concretization, let's go through a few variations (which I actually attempted first). If we use a user-defined function with a custom adjoint rather than +, like this:

@differentiable(reverse, adjoint: adjointPlus)
func plus(_ x: Tensor<Float>, _ y: Tensor<Float>) -> Tensor<Float> {
    print("plus: x=\(x)")
    print("plus: y=\(y)")
    return x + y
}
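// Adjoint of plus: both partial derivatives of x + y are 1, so each input
// simply receives the incoming seed.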
func adjointPlus(
    _ x: Tensor<Float>, _ y: Tensor<Float>,
    originalValue: Tensor<Float>, seed: Tensor<Float>
) -> (Tensor<Float>, Tensor<Float>) {
    print("adjointPlus: x=\(x)")
    print("adjointPlus: y=\(y)")
    return (seed, seed)
}

#gradient({ x in plus(x.sum(squeezingAxes: 0), Tensor(Float(2))) })(Tensor([Float(10)]))

then there's a segfault in the adjoint (again for both dynamic and graph mode):

plus: x=10.0
plus: y=2.0
Segmentation fault

In particular, it looks like the runtime is trying to use an integer as a TensorHandle (see the 0x...20):

frame #1: 0x00007ffff7ed6539 libswiftTensorFlow.so`ShapedArray<A>.init(cTensorHandle=0x0000000000000020) at TensorHandle.swift:204:38
   201      // NOTE: This will not perform a copy if the handle is already on the host.
   202      let context = _ExecutionContext.global
   203      debugLog("Calling TFE_TensorHandleCopyToDevice().")
-> 204      let hostHandle: CTensorHandle! = TFE_TensorHandleCopyToDevice(
   205        cTensorHandle, context.eagerContext, context.cpuDeviceName, status)
   206      checkOk(status)
   207      internalConsistencyCheck(hostHandle != nil,

If we put the same code for sum and its adjoints in the standard library instead of user code, then adjointPlus gets the wrong inputs (it should receive the original arguments, x=10.0 and y=2.0, but gets zeros instead):

plus: x=10.0
plus: y=2.0
adjointPlus: x=0.0
adjointPlus: y=0.0

I haven't been able to reproduce this without involving standard library changes, and it also only occurs without tf-dynamic-compilation (turning that on gives the segfault from above).

If we instead try a concrete sum function:

@differentiable(reverse, adjoint: adjointSum)
func sum(_ x: Tensor<Float>) -> Tensor<Float> {
    let axes = Tensor<Int32>(rangeFrom: 0, to: x.rank, stride: 1)
    return Raw.sum(x, reductionIndices: axes, keepDims: false)
}
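// Adjoint of the full reduction: the gradient of sum(x) with respect to x
// is the seed broadcast back to x's shape.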
func adjointSum(_ x: Tensor<Float>, originalValue: Tensor<Float>, seed: Tensor<Float>) -> Tensor<Float> {
    return seed.broadcast(like: x)
}

#gradient({ x in plus(sum(x), Tensor(Float(2))) })(Tensor([Float(10)]))

then compiling with dynamic compilation gives the segfault, and compiling without it gives:

...
Calling TFE_NewOp(ctx, op, status) with op name tfc.scalarToTensor.                                                                                                                                     
 IRGen graph_op argument  with lowering 0                                                                                                                                                   
  Adding a single input.                                                                                                                                                                               
   Adding input of type $Builtin.Int32.
swift: /usr/local/google/home/jekbradbury/swift-sources/swift/lib/IRGen/IRGenSIL.cpp:2104: auto (anonymous namespace)::IRGenSILFunction::visitGraphOperationInst(swift::GraphOperationInst *)::(anonymous class)::operator()(swift::SILValue) const: Assertion `conformance && "input type does not conform to TensorArrayProtocol"' failed.

I'm less interested in having immediate fixes for all of these issues and more interested in figuring out some way to use .sum with AD—it feels like writing primitives for each of the functions I use should be enough, but in this case it seems like it isn't.
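
For reference, the pattern I expect to be sufficient is just this (a minimal sketch in the same early `@differentiable(reverse, adjoint:)` style as above; `square`/`adjointSquare` are placeholder names, and the expected value is worked by hand rather than actual output):

import TensorFlow

// A hypothetical primitive with a hand-written adjoint, in the same style
// as the sum/plus examples above.
@differentiable(reverse, adjoint: adjointSquare)
func square(_ x: Tensor<Float>) -> Tensor<Float> {
    return x * x
}

func adjointSquare(_ x: Tensor<Float>, originalValue: Tensor<Float>, seed: Tensor<Float>) -> Tensor<Float> {
    // d/dx (x * x) = 2x, scaled by the incoming seed.
    return Tensor(Float(2)) * x * seed
}

// Registering the adjoint is expected to be all that's needed:
// #gradient({ x in square(x) })(Tensor(Float(3)))   // expected: 6.0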

swift-ci commented 5 years ago

Comment by Mingsheng Hong (JIRA)

James asked me to take a look at this bug and see if we can unblock his work. So far I managed to create this simpler reproducer:

@differentiable(reverse, adjoint: adjointSum)
func sum(_ x: Tensor<Float>) -> Tensor<Float> {
    return Raw.identity(x)
}

func adjointSum(_ x: Tensor<Float>, originalValue: Tensor<Float>, seed: Tensor<Float>) -> Tensor<Float> {
  print("adjointSum: x=\(x)")
  return seed
  // return x would also crash
}

let res4 = #gradient({ x in sum(sum(x)) })(Tensor(Float(10)))
print(res4)

Crash info:

time ../build/$rdir/swift-linux-x86_64/bin/swiftc -O -Xllvm -tf-dynamic-compilation test/TensorFlow/tmp.swift -L../build/bazel-bin/tensorflow -ltensorflow -ltensorflow_framework && ./tmp

real    0m0.567s
user    0m0.384s
sys 0m0.142s

Compilation segmentation fault at Mon Nov 26 13:40:54

rxwei commented 5 years ago

Thanks for the reproducer. Investigating.

swift-ci commented 5 years ago

Comment by Mingsheng Hong (JIRA)

np, Richard. We've made some more progress, and I'm working on a fix. I hope to be able to send out a fix tonight, or otherwise hand it back to you. Does that sound reasonable?

Please also feel free to triage in parallel if you prefer though.

rxwei commented 5 years ago

Is your investigation leading to a GPE/IRGen bug?

rxwei commented 5 years ago

I've found the source of the crash: a use-after-free in PrimalGen. I can take this bug from here and send out a simple fix.

swift-ci commented 5 years ago

Comment by Mingsheng Hong (JIRA)

SG. I haven't seen evidence of a GPE/IRGen issue, but can test again once your AD fix is landed.

rxwei commented 5 years ago

Fixed segfault: https://github.com/apple/swift/pull/20788.

rxwei commented 5 years ago

jekbradbury (JIRA User) I closed this bug because the immediate blocker (the segfault) is fixed. You should now be able to make the tensor-returning `sum` differentiable. The issue with differentiating scalar-returning functions is tracked by another SR.
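
For concreteness, here is a minimal sketch of what should now work, reusing the tensor-returning `sum` and `adjointSum` from the issue body (the expected gradient value is worked by hand, not verified output):

import TensorFlow

@differentiable(reverse, adjoint: adjointSum)
func sum(_ x: Tensor<Float>) -> Tensor<Float> {
    let axes = Tensor<Int32>(rangeFrom: 0, to: x.rank, stride: 1)
    return Raw.sum(x, reductionIndices: axes, keepDims: false)
}

func adjointSum(_ x: Tensor<Float>, originalValue: Tensor<Float>, seed: Tensor<Float>) -> Tensor<Float> {
    // The gradient of a full sum is the seed broadcast to the input's shape.
    return seed.broadcast(like: x)
}

let grad = #gradient({ x in sum(x) })(Tensor([Float(10), Float(20)]))
print(grad)  // expected: [1.0, 1.0]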