[SR-9345] [AD]ventures in differentiating Tensor.sum (IRGen also involved) #51815

Closed: swift-ci closed this issue 5 years ago

swift-ci commented 5 years ago
Previous ID SR-9345
Radar None
Original Reporter jekbradbury (JIRA User)
Type Bug
Status Resolved
Resolution Done
Additional Detail from JIRA:

| Field | Value |
|-------------|----------------------|
| Votes | 0 |
| Component/s | Swift for TensorFlow |
| Labels | Bug |
| Assignee | @rxwei |
| Priority | Medium |

md5: 134b5cc8880169f5168a64c23d4a1b54

Issue Description:

Say I want to differentiate a composite function that includes a sum over a tensor, e.g.:

import TensorFlow
#gradient({ x in x.sum(squeezingAxes: 0) + Tensor(Float(2)) })(Tensor([Float(10)]))

The standard library doesn't have a primitive for sum, so let's add one:

public extension Tensor where Scalar: Numeric {
  @inlinable @inline(__always)
  @differentiable(reverse, wrt: (self), adjoint: _adjointSumToScalar)
  func sum() -> Scalar {
    let axes = Tensor<Int32>(rangeFrom: 0, to: rank, stride: 1)
    return Scalar._getScalarOrDie(Raw.sum(self, reductionIndices: axes).handle)
  }
  @inlinable @inline(__always)
  @differentiable(reverse, wrt: (self), adjoint: _adjointSumSqueezingAxes)
  func sum(squeezingAxes axes: Int32...) -> Tensor {
    return Raw.sum(self, reductionIndices: Tensor<Int32>(axes), keepDims: false)
  }
}
extension Tensor where Scalar: Numeric {
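  // Adjoint for the scalar-returning sum(): broadcast the scalar seed back
  // to the shape of `self`, since every element contributes equally to the sum.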
  @inlinable
  func _adjointSumToScalar(originalValue: Scalar, seed: Scalar) -> Tensor {
    let seedAsTensor = Tensor<Scalar>(seed)
    let ret = seedAsTensor.broadcast(like: self)
    return ret
  }
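  // Adjoint for sum(squeezingAxes:): the seed already has the reduced shape,
  // so broadcast it back to the shape of `self`.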
  @inlinable
  func _adjointSumSqueezingAxes(axes: Int32..., originalValue: Tensor, seed: Tensor) -> Tensor {
    let ret = seed.broadcast(like: self)
    return ret
  }
}

Both with and without tf-dynamic-compilation, this gives:

<unknown>:0: note: differentiating generic functions is not supported yet
<unknown>:0: error: function is not differentiable

This is a little strange: I thought the problem was that AD doesn't work on generic functions yet, and that this case should work because sum has a primitive adjoint. Before trying a concretization, let's go through a few variations (which I actually attempted first). If we use a user-defined function with a custom adjoint rather than +, like this:

@differentiable(reverse, adjoint: adjointPlus)
func plus(_ x: Tensor<Float>, _ y: Tensor<Float>) -> Tensor<Float> {
    print("plus: x=\(x)")
    print("plus: y=\(y)")
    return x + y
}
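// Adjoint of plus: both partial derivatives of x + y are 1, so each input
// simply receives the incoming seed.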
func adjointPlus(
    _ x: Tensor<Float>, _ y: Tensor<Float>,
    originalValue: Tensor<Float>, seed: Tensor<Float>
) -> (Tensor<Float>, Tensor<Float>) {
    print("adjointPlus: x=\(x)")
    print("adjointPlus: y=\(y)")
    return (seed, seed)
}

#gradient({ x in plus(x.sum(squeezingAxes: 0), Tensor(Float(2))) })(Tensor([Float(10)]))

then there's a segfault in the adjoint (again for both dynamic and graph mode):

plus: x=10.0
plus: y=2.0
Segmentation fault

In particular, it looks like the runtime is trying to use an integer as a TensorHandle (see the 0x...20):

frame #1: 0x00007ffff7ed6539 libswiftTensorFlow.so`ShapedArray<A>.init(cTensorHandle=0x0000000000000020) at TensorHandle.swift:204:38
   201      // NOTE: This will not perform a copy if the handle is already on the host.
   202      let context = _ExecutionContext.global
   203      debugLog("Calling TFE_TensorHandleCopyToDevice().")
-> 204      let hostHandle: CTensorHandle! = TFE_TensorHandleCopyToDevice(
   205        cTensorHandle, context.eagerContext, context.cpuDeviceName, status)
   206      checkOk(status)
   207      internalConsistencyCheck(hostHandle != nil,

If we put the same code for sum and its adjoints in the standard library instead of user code, then adjointPlus gets the wrong inputs (it should receive the original arguments, x=10.0 and y=2.0, but gets zeros instead):

plus: x=10.0
plus: y=2.0
adjointPlus: x=0.0
adjointPlus: y=0.0

I haven't been able to reproduce this without involving standard library changes, and it also only occurs without tf-dynamic-compilation (turning that on gives the segfault from above).

If we instead try a concrete sum function:

@differentiable(reverse, adjoint: adjointSum)
func sum(_ x: Tensor<Float>) -> Tensor<Float> {
    let axes = Tensor<Int32>(rangeFrom: 0, to: x.rank, stride: 1)
    return Raw.sum(x, reductionIndices: axes, keepDims: false)
}
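// Adjoint of the full reduction: the gradient of sum(x) with respect to x
// is the seed broadcast back to x's shape.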
func adjointSum(_ x: Tensor<Float>, originalValue: Tensor<Float>, seed: Tensor<Float>) -> Tensor<Float> {
    return seed.broadcast(like: x)
}

#gradient({ x in plus(sum(x), Tensor(Float(2))) })(Tensor([Float(10)]))

then compiling with dynamic compilation gives the segfault, and compiling without it gives:

...
Calling TFE_NewOp(ctx, op, status) with op name tfc.scalarToTensor.                                                                                                                                     
 IRGen graph_op argument  with lowering 0                                                                                                                                                   
  Adding a single input.                                                                                                                                                                               
   Adding input of type $Builtin.Int32.
swift: /usr/local/google/home/jekbradbury/swift-sources/swift/lib/IRGen/IRGenSIL.cpp:2104: auto (anonymous namespace)::IRGenSILFunction::visitGraphOperationInst(swift::GraphOperationInst *)::(anonymous class)::operator()(swift::SILValue) const: Assertion `conformance && "input type does not conform to TensorArrayProtocol"' failed.

I'm less interested in having immediate fixes for all of these issues and more interested in figuring out some way to use .sum with AD—it feels like writing primitives for each of the functions I use should be enough, but in this case it seems like it isn't.
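
For reference, the pattern I expect to be sufficient is just this (a minimal sketch in the same early `@differentiable(reverse, adjoint:)` style as above; `square`/`adjointSquare` are placeholder names, and the expected value is worked by hand rather than actual output):

import TensorFlow

// A hypothetical primitive with a hand-written adjoint, in the same style
// as the sum/plus examples above.
@differentiable(reverse, adjoint: adjointSquare)
func square(_ x: Tensor<Float>) -> Tensor<Float> {
    return x * x
}

func adjointSquare(_ x: Tensor<Float>, originalValue: Tensor<Float>, seed: Tensor<Float>) -> Tensor<Float> {
    // d/dx (x * x) = 2x, scaled by the incoming seed.
    return Tensor(Float(2)) * x * seed
}

// Registering the adjoint is expected to be all that's needed:
// #gradient({ x in square(x) })(Tensor(Float(3)))   // expected: 6.0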

swift-ci commented 5 years ago

Comment by Mingsheng Hong (JIRA)

James asked me to take a look at this bug and see if we can unblock his work. So far I managed to create this simpler reproducer:

@differentiable(reverse, adjoint: adjointSum)
func sum(_ x: Tensor<Float>) -> Tensor<Float> {
    return Raw.identity(x)
}

func adjointSum(_ x: Tensor<Float>, originalValue: Tensor<Float>, seed: Tensor<Float>) -> Tensor<Float> {
  print("adjointSum: x=\(x)")
  return seed
  // return x would also crash
}

let res4 = #gradient({ x in sum(sum(x)) })(Tensor(Float(10)))
print(res4)

Crash info:

time ../build/$rdir/swift-linux-x86_64/bin/swiftc -O -Xllvm -tf-dynamic-compilation test/TensorFlow/tmp.swift -L../build/bazel-bin/tensorflow -ltensorflow -ltensorflow_framework && ./tmp

real    0m0.567s
user    0m0.384s
sys 0m0.142s

Compilation segmentation fault at Mon Nov 26 13:40:54

rxwei commented 5 years ago

Thanks for the reproducer. Investigating.

swift-ci commented 5 years ago

Comment by Mingsheng Hong (JIRA)

np, Richard. We've made some more progress, and I'm working on a fix. I hope to be able to send out a fix tonight, or otherwise hand it back to you. Does that sound reasonable?

Please also feel free to triage in parallel if you prefer though.

rxwei commented 5 years ago

Is your investigation leading to a GPE/IRGen bug?

rxwei commented 5 years ago

I've found the source of the crash: a use-after-free in PrimalGen. I can take this bug from here and send out a simple fix.

swift-ci commented 5 years ago

Comment by Mingsheng Hong (JIRA)

SG. I haven't seen evidence of a GPE/IRGen issue, but can test again once your AD fix is landed.

rxwei commented 5 years ago

Fixed segfault: https://github.com/apple/swift/pull/20788.

rxwei commented 5 years ago

jekbradbury (JIRA User) I closed this bug because the immediate blocker (the segfault) is fixed. You should now be able to make the tensor-returning `sum` differentiable. The issue with differentiating scalar-returning functions is tracked by another SR.
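
For concreteness, here is a minimal sketch of what should now work, reusing the tensor-returning `sum` and `adjointSum` from the issue body (the expected gradient value is worked by hand, not verified output):

import TensorFlow

@differentiable(reverse, adjoint: adjointSum)
func sum(_ x: Tensor<Float>) -> Tensor<Float> {
    let axes = Tensor<Int32>(rangeFrom: 0, to: x.rank, stride: 1)
    return Raw.sum(x, reductionIndices: axes, keepDims: false)
}

func adjointSum(_ x: Tensor<Float>, originalValue: Tensor<Float>, seed: Tensor<Float>) -> Tensor<Float> {
    // The gradient of a full sum is the seed broadcast to the input's shape.
    return seed.broadcast(like: x)
}

let grad = #gradient({ x in sum(x) })(Tensor([Float(10), Float(20)]))
print(grad)  // expected: [1.0, 1.0]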