Closed: swift-ci closed this issue 5 years ago
Comment by Mingsheng Hong (JIRA)
James asked me to take a look at this bug and see if we can unblock his work. So far I have managed to create this simpler reproducer:
```swift
@differentiable(reverse, adjoint: adjointSum)
func sum(_ x: Tensor<Float>) -> Tensor<Float> {
  return Raw.identity(x)
}

func adjointSum(_ x: Tensor<Float>, originalValue: Tensor<Float>, seed: Tensor<Float>) -> Tensor<Float> {
  print("adjointSum: x=\(x)")
  return seed
  // `return x` would also crash
}

let res4 = #gradient({ x in sum(sum(x)) })(Tensor(Float(10)))
print(res4)
```
Crash info:
```
$ time ../build/$rdir/swift-linux-x86_64/bin/swiftc -O -Xllvm -tf-dynamic-compilation test/TensorFlow/tmp.swift -L../build/bazel-bin/tensorflow -ltensorflow -ltensorflow_framework && ./tmp

real    0m0.567s
user    0m0.384s
sys     0m0.142s

Compilation segmentation fault at Mon Nov 26 13:40:54
```
Thanks for the reproducer. Investigating.
Comment by Mingsheng Hong (JIRA)
No problem, Richard. We've made some more progress, and I'm working on a fix. I hope to be able to send out a fix tonight, or otherwise hand it back to you. Does that sound reasonable?
Please also feel free to triage in parallel if you prefer though.
Is your investigation pointing to a GPE/IRGen bug?
I have found the source of the crasher. It's because of a use-after-free in PrimalGen. I can take this bug from here and send out a simple fix.
Comment by Mingsheng Hong (JIRA)
Sounds good. I haven't seen evidence of a GPE/IRGen issue, but I can test again once your AD fix has landed.
Fixed segfault: https://github.com/apple/swift/pull/20788.
jekbradbury (JIRA User): I closed this bug because the immediate blocker (the segfault) is fixed. You should now be able to make the tensor-returning `sum` differentiable. The issue with differentiating scalar-returning functions is tracked by a separate SR.
Additional Detail from JIRA

| Field | Value |
|-------------|----------------------|
| Votes | 0 |
| Component/s | Swift for TensorFlow |
| Labels | Bug |
| Assignee | @rxwei |
| Priority | Medium |

md5: 134b5cc8880169f5168a64c23d4a1b54

Issue Description:
Say I want to differentiate a composite function that includes a sum over a tensor, e.g.:
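The original snippet did not survive the JIRA export. A minimal sketch of what such a composite function might have looked like on the Swift for TensorFlow toolchain of that era (the function name and the `.sum()` reduction here are illustrative assumptions, not the author's original code):

```swift
// Hypothetical reconstruction; requires the 2018-era Swift for TensorFlow toolchain.
func composite(_ x: Tensor<Float>) -> Tensor<Float> {
  // A composite computation that ends in a sum over the tensor's elements.
  return (x * x).sum()  // assumes a `Tensor.sum()` reduction on this toolchain
}

let grad = #gradient(composite)(Tensor<Float>([1, 2, 3]))
```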
The standard library doesn't have a primitive for sum, so let's add one:
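The added primitive is also missing from the export. A hedged sketch of what it plausibly looked like, mirroring the `@differentiable(reverse, adjoint:)` pattern used in the reproducer above (the `Raw.sum` call and the broadcasting adjoint are assumptions):

```swift
// Hypothetical sketch of a generic sum primitive with a registered adjoint;
// not the author's original code.
@differentiable(reverse, adjoint: adjointSum)
func sum<Scalar: Numeric>(_ x: Tensor<Scalar>) -> Tensor<Scalar> {
  return Raw.sum(x, reductionIndices: Tensor<Int32>([0]))  // assumed Raw op signature
}

func adjointSum<Scalar: Numeric>(
  _ x: Tensor<Scalar>, originalValue: Tensor<Scalar>, seed: Tensor<Scalar>
) -> Tensor<Scalar> {
  // The gradient of a sum broadcasts the incoming seed back to the input's shape.
  return seed.broadcast(to: x.shape)  // assumed broadcasting API
}
```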
Both with and without tf-dynamic-compilation, this gives:
This is a little strange, since I thought the problem was that AD doesn't work on generic functions yet, and this should work because `sum` has a primitive adjoint. Before we try a concretization, let's go through a few variations (which I actually attempted first). Suppose we use a user-defined function with a custom adjoint rather than `+`, like this:
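The variant's code is missing from the export. A hedged reconstruction of what a user-defined replacement for `+` with a custom adjoint likely looked like, reusing the `adjointPlus` name mentioned later in this report (the `plus` name and exact signatures are assumptions):

```swift
// Hypothetical reconstruction of the "user-defined function with custom adjoint"
// variant; only the adjointPlus name is taken from the report.
@differentiable(reverse, adjoint: adjointPlus)
func plus(_ x: Tensor<Float>, _ y: Tensor<Float>) -> Tensor<Float> {
  return x + y
}

func adjointPlus(
  _ x: Tensor<Float>, _ y: Tensor<Float>,
  originalValue: Tensor<Float>, seed: Tensor<Float>
) -> (Tensor<Float>, Tensor<Float>) {
  // d(x + y)/dx = d(x + y)/dy = 1, so the seed passes through to both inputs.
  return (seed, seed)
}
```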
then there's a segfault in the adjoint (again for both dynamic and graph mode):
In particular, it looks like the runtime is trying to use an integer as a TensorHandle (see the 0x...20):
If we put the same code for sum and its adjoints in the standard library instead of user code, then adjointPlus gets the wrong inputs:
I haven't been able to reproduce this without involving standard library changes, and it also only occurs without tf-dynamic-compilation (turning that on gives the segfault from above).
If we instead try a concrete sum function:
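The concrete variant is also elided in the export. A hedged sketch of a non-generic `sum`, specialized to `Tensor<Float>`, in the same style as the generic version (the `.sum()` call and ones-based adjoint are assumptions):

```swift
// Hypothetical concrete (non-generic) variant of the sum primitive;
// not the author's original code.
@differentiable(reverse, adjoint: adjointSum)
func sum(_ x: Tensor<Float>) -> Tensor<Float> {
  return x.sum()  // assumes a `Tensor.sum()` reduction on this toolchain
}

func adjointSum(_ x: Tensor<Float>, originalValue: Tensor<Float>, seed: Tensor<Float>) -> Tensor<Float> {
  // Broadcast the seed across the input by multiplying with a ones tensor.
  return Tensor<Float>(ones: x.shape) * seed  // assumed Tensor(ones:) initializer
}
```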
then compiling with dynamic compilation gives the segfault, and compiling without it gives:
I'm less interested in immediate fixes for each of these issues, and more interested in finding some way to use `.sum` with AD. It feels like writing primitives for each of the functions I use should be enough, but in this case it seems it isn't.