swiftlang / swift

The Swift Programming Language
https://swift.org
Apache License 2.0
67.28k stars 10.33k forks source link

[SR-15823] [AutoDiff] Runtime segfault from edge case #58097

Open philipturner opened 2 years ago

philipturner commented 2 years ago
Previous ID SR-15823
Radar None
Original Reporter @philipturner
Type Bug

Attachment: Download

Additional Detail from JIRA | | | |------------------|-----------------| |Votes | 0 | |Component/s | | |Labels | Bug | |Assignee | None | |Priority | Medium | md5: 7ac04b0e86fad47aa1f9a8580634f6ab

Issue Description:

The following code crashes at runtime under specific conditions as explained below. As-is, it crashes. To stop it from crashing, follow instructions in its comments.

import _Differentiation

// Remove the @noDerivative attribute here...
typealias MyType = @differentiable(reverse) (inout Float, @noDerivative Int) -> Void

@differentiable(reverse)
// ... and here to stop it from crashing.
func myFunc(_ x: inout Float, _ y: @noDerivative Int) -> Void {}

// Alternatively, un-comment out this line and it no longer crashes.
//print("hi")
print(myFunc as MyType)
philipturner commented 2 years ago

@asl your comment on SR-15818 (https://github.com/apple/swift/issues/58095#issuecomment-1108471934) was right on point! I spent around 4 days investigating the bug, and the cause is something very massive. This bug can actually crash the compiler. Take this reproducer, for example. It crashes during SILGen.

import _Differentiation

func myFunc(x: Float, y: Float) -> Float { x + y }

let item: @differentiable(reverse) (Float, Float) -> Float = myFunc

class Box<T> {
  var value: T
  init(value: T) {
    self.value = value
  }
}

let box = Box(value: item)
let retrieved = box.value

My full investigation is published as a gist, but here is the summary:

// There are multiple symptoms that point to the same problem:
// 1) Suffering from improper reference counting, basically unsafe bit casting
// 2) Type mismanagement in the compiler
// 3) No standardized or documented way to handle differentiable function
//    pointers in memory, at least in the way mentioned in the reproducer
// 
// The problem: there might be no built-in support for reading differentiable
// functions from device RAM.
//
// This is likely a much bigger problem than I can handle alone. I'm giving up
// for now and hoping I can defer it to people with more experience in the
// future.
fibrechannelscsi commented 1 year ago

The code in the 4/25 post is still causing recent toolchains (2023-01-02a, 2023-01-14a and 2023-01-18a) to crash.

We have:

Assertion failed: (isa<X>(Val) && "cast<Ty>() argument of incompatible type!"), function cast, file Casting.h, line 269.
...
1.  Apple Swift version 5.9-dev (LLVM 3f23b4ceaf01213, Swift 0763e4b98c74b5b)
2.  Compiling with the current language version
3.  While evaluating request ASTLoweringRequest(Lowering AST to SIL for module SomethingSmall)
4.  While emitting reabstraction thunk in SIL function "@$sS6fIegnrr_Iegnnro_S6fIegydd_Iegyydo_TR".
 for <<debugloc at "<compiler-generated>":0:0>>5.   While emitting reabstraction thunk in SIL function "@$sS3fIegnrr_S3fIegydd_TR".
asl commented 1 year ago

@fibrechannelscsi Looks like we've failed to create proper reabstraction thunk from @escaping @callee_guaranteed (@in_guaranteed Float) -> (@out Float, @out Float) to (Float) -> (Float, Float). Apparently the inner abstraction pattern for result is Opaque and outer result is obviously a tuple. Therefore the result planner thinks that inner type is not a tuple and just a scalar.

Hence an assertion when address is scalar is casted to address of tuple.

There is some special wrt Opaque in case of differential reabstraction thunks, but I do not recall it offhand. @rxwei Does this ring any bell?

asl commented 1 year ago

Ok, the problem is quite weird. Looks like we're confusing (@in_guaranteed Float) -> (@out Float, @out Float and (@in_guaranteed Float) -> @out (Float, Float) as a pullback type here.

The difference is caused by the context where we calculate the pullback type. In the first case we're inside generic function Box<T>::init so we're seeing tuple as a whole due to substitution, hence @out (Float, Float) (actually we're dealing with @substituted <τ_0_0, τ_0_1> (@in_guaranteed τ_0_0) -> @out τ_0_1 for <Float, (Float, Float) substituted type there). In the second case we're at let retrieved = box.value where the type is known exactly and we're "seeing" tuple as as separate arguments and therefore when we calculate pullback type we're ending with (@out Float, @out Float). Now, when we're creating a reabstraction thunk with Opaque pattern it really expects a single indirect result, but we're providing a tuple of such results.

We can extend thunk generation to handle such case, but I'm not 100% sure we will not end with some other corner cases. @BradLarson @rxwei @dan-zheng any suggestions?

asl commented 1 year ago

@philipturner Why are you closing it?

BradLarson commented 1 year ago

The reproducer from April 25, 2022, still triggers the above assertion failure as of the 2023-04-27 nightly snapshot, so this is still an issue.

jkshtj commented 4 months ago

Reproducer no longer crashes with the 05/24 toolchain.

asl commented 4 months ago

@jkshtj Have you checked that the type of reabstraction thunk is correct now?

jkshtj commented 4 months ago

@jkshtj Have you checked that the type of reabstraction thunk is correct now?

I haven't. I'm doing a quick run-through of the list of issues that we have internally.

If we need deeper investigation besides checking for no crashes, I'll keep this open.

asl commented 4 months ago

Thanks! I'd rather not close issues simply because "does not crash anymore" and instead try to check what changed, especially if some analysis was done before.