BradLarson closed this issue.
I have a reproducer: https://github.com/apple/swift/pull/41437
@BradLarson @philipturner Could you please check whether this issue still reproduces with ToT?
The reason it failed in #41437 (https://github.com/apple/swift/pull/41437) was the following bogus debug info:
%12 = alloc_stack $_Tensor<τ_0_0>, let, (name "self", loc "/Users/philipturner/Documents/building-tensorflow/swift-for-tensorflow/Sources/TensorFlow/Layers/Normalization.swift":70:8, scope 0), argno 2, implicit, type $*BatchNorm<τ_0_0>.TangentVector, expr op_fragment:#BatchNorm.TangentVector.offset // users: %84, %57, %70, %85, %72, %59, %14, %90
It seems that something changed and I do not see this anymore:
%5 = apply %4<τ_0_0>(%3) : $@convention(method) <τ_0_0> (@thin BatchNorm<τ_0_0>.TangentVector.Type) -> @owned BatchNorm<τ_0_0>.TangentVector // users: %51, %69, %33, %47, %28, %78
%6 = alloc_stack $Tensor<τ_0_0>, var, name "offset" // users: %23, %45, %36, %29, %10, %92
%7 = metatype $@thin Tensor<τ_0_0>.Type // users: %90, %83, %38, %31, %17, %14, %11, %9
// function_ref static Tensor.zero.getter
%8 = function_ref @$s4main6TensorV4zeroACyxGvgZ : $@convention(method) <τ_0_0> (@thin Tensor<τ_0_0>.Type) -> @owned Tensor<τ_0_0> // users: %90, %83, %38, %31, %17, %14, %11, %9
%9 = apply %8<τ_0_0>(%7) : $@convention(method) <τ_0_0> (@thin Tensor<τ_0_0>.Type) -> @owned Tensor<τ_0_0> // user: %10
store %9 to %6 : $*Tensor<τ_0_0> // id: %10
%11 = apply %8<τ_0_0>(%7) : $@convention(method) <τ_0_0> (@thin Tensor<τ_0_0>.Type) -> @owned Tensor<τ_0_0> // users: %73, %62, %12
debug_value %11 : $Tensor<τ_0_0>, var, name "offset" // id: %12
%13 = apply %4<τ_0_0>(%3) : $@convention(method) <τ_0_0> (@thin BatchNorm<τ_0_0>.TangentVector.Type) -> @owned BatchNorm<τ_0_0>.TangentVector // users: %72, %61
%14 = apply %8<τ_0_0>(%7) : $@convention(method) <τ_0_0> (@thin Tensor<τ_0_0>.Type) -> @owned Tensor<τ_0_0> // user: %44
%15 = apply %4<τ_0_0>(%3) : $@convention(method) <τ_0_0> (@thin BatchNorm<τ_0_0>.TangentVector.Type) -> @owned BatchNorm<τ_0_0>.TangentVector // user: %43
%16 = alloc_stack $Tensor<τ_0_0>, var, name "offset" // users: %60, %25, %46, %37, %30, %18, %53, %91
%17 = apply %8<τ_0_0>(%7) : $@convention(method) <τ_0_0> (@thin Tensor<τ_0_0>.Type) -> @owned Tensor<τ_0_0> // user: %18
store %17 to %16 : $*Tensor<τ_0_0> // id: %18
%19 = apply %4<τ_0_0>(%3) : $@convention(method) <τ_0_0> (@thin BatchNorm<τ_0_0>.TangentVector.Type) -> @owned BatchNorm<τ_0_0>.TangentVector // user: %24
%20 = struct_extract %2 : $_AD__$s4main9BatchNormV14callAsFunctionyAA6TensorVyxGAGF_bb3__PB__src_0_wrt_0_1_l<τ_0_0>, #_AD__$s4main9BatchNormV14callAsFunctionyAA6TensorVyxGAGF_bb3__PB__src_0_wrt_0_1_l.predecesso
debug_value %1 : $Tensor<τ_0_0> // id: %21
// function_ref specialized static Tensor.+= infix(_:_:)
%22 = function_ref @$s4main6TensorV2peoiyyACyxGz_AEtFZTf4ndd_n : $@convention(thin) <τ_0_0> (@inout Tensor<τ_0_0>) -> () // users: %88, %60, %23
%23 = apply %22<τ_0_0>(%6) : $@convention(thin) <τ_0_0> (@inout Tensor<τ_0_0>) -> ()
release_value %19 : $BatchNorm<τ_0_0>.TangentVector // id: %24
%25 = struct_element_addr %16 : $*Tensor<τ_0_0>, #Tensor.handle // users: %64, %70, %39, %32, %26
%26 = load %25 : $*TensorHandle // user: %27
strong_release %26 : $TensorHandle // id: %27
debug_value %5 : $BatchNorm<τ_0_0>.TangentVector, let, name "self", argno 2, implicit, expr op_deref // id: %28
I have confirmed that the crash disappeared between the February 3 and February 22 toolchains. For reference, I'm using a command like this to test:
swiftc -O -g -debug-info-format=dwarf ../helloworld/file.swift
I reopened #41437.
@asl - A top-of-tree build for Ubuntu 20.04 aarch64 as of March 2 no longer experiences this, and neither do the February 25 nightly snapshots on macOS or other platforms. That's great news, but I am concerned about it regressing without a better understanding of the cause.
I was bisecting to locate the fix for the crash, but something changed recently that prevents me from testing very old commits with my current setup. I tested as far back as 92d014c4018c6ab1502eac0888e5bfddf19d6a62, the very first commit on February 12, before a CMake error caused the toolchain build to fail. If anyone has the capacity to finish the bisection, the range of interest is February 3 to February 11, 2022.
I compiled a list of pull requests from that time frame that could have fixed the crash or that modified relevant code:
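The bisection described above is a binary search for the first commit at which the crash no longer reproduces. The following is a minimal Swift sketch. It assumes that commits are ordered oldest to newest and that every commit after the fix stays fixed. The `firstFixedCommit` helper and the commit labels are hypothetical placeholders. In practice, each `isFixed` check means building a toolchain and compiling the reproducer.

```swift
// Hypothetical helper: returns the first commit (oldest to newest) for which `isFixed`
// is true, assuming the predicate is false up to some point and true from then on.
func firstFixedCommit(_ commits: [String], isFixed: (String) -> Bool) -> String? {
    var low = 0
    var high = commits.count  // search window is [low, high)
    while low < high {
        let mid = low + (high - low) / 2
        if isFixed(commits[mid]) {
            high = mid        // the fix landed at mid or earlier
        } else {
            low = mid + 1     // the fix landed after mid
        }
    }
    return low < commits.count ? commits[low] : nil
}

// Placeholder data: pretend the crash stops reproducing starting at "c3".
let commits = ["c0", "c1", "c2", "c3", "c4", "c5"]
let fixedFrom: Set<String> = ["c3", "c4", "c5"]
if let first = firstFixedCommit(commits, isFixed: { fixedFrom.contains($0) }) {
    print("first fixed commit: \(first)")  // prints "first fixed commit: c3"
}
```

Each probe halves the window, so a range of N commits needs about log2(N) toolchain builds.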
https://github.com/apple/swift/pull/41201 - modification to lib/SILOptimizer/Differentiation/Thunk.cpp
https://github.com/apple/swift/pull/41184 - several modifications in lib/IRGen
https://github.com/apple/swift/pull/40853 - change to lib/IRGen/GenReflection.cpp
https://github.com/apple/swift/pull/41148 - change to lib/IRGen/GenMeta.cpp
https://github.com/apple/swift/pull/41287 - modifies the Serialization directory
https://github.com/apple/swift/pull/41349 - modifies the Serialization directory
If I had to guess among the pull requests merged within those dates, I'd lean toward https://github.com/apple/swift/pull/41294. Its new logic for avoiding inserting the same debug_value twice seems like a good candidate for preventing the problematic debug info from being inserted.
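To illustrate the deduplication idea, here is a toy model. It is not the compiler's SIL API. A debug_value is identified by the value it describes, the variable name, and the argument number, and an emitter skips any record it has already produced. `DebugValueRecord` and `DebugValueEmitter` are hypothetical names, and whether #41294 keys its check this way is an assumption.

```swift
// Toy stand-in for a debug_value instruction (hypothetical, not SILInstruction).
struct DebugValueRecord: Hashable {
    let value: String  // SSA value being described, e.g. "%19"
    let name: String   // source variable name, e.g. "self"
    let argNo: Int?    // argument number, nil for local variables
}

// Emits each distinct record once and silently drops exact duplicates.
struct DebugValueEmitter {
    private(set) var emitted: [DebugValueRecord] = []
    private var seen: Set<DebugValueRecord> = []

    mutating func emit(_ record: DebugValueRecord) {
        // Set.insert reports whether the element was newly added.
        guard seen.insert(record).inserted else { return }
        emitted.append(record)
    }
}

var emitter = DebugValueEmitter()
let selfRecord = DebugValueRecord(value: "%19", name: "self", argNo: 1)
emitter.emit(selfRecord)
emitter.emit(selfRecord)      // identical record: skipped
print(emitter.emitted.count)  // prints 1
```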
@philipturner - Does your lit test in PR #41437 fail as-is with an older toolchain? I've only had this fail locally with optimized builds, and unless I'm reading something wrong that test isn't being built with optimization on. If your invocation there does trigger the crasher on older toolchains, I think it would be useful to add that as a regression test to prevent a recurrence.
The test fails only on macOS, and only when building with full optimization in release mode and explicitly selecting the DWARF debug symbol format. In any other configuration, it does not crash. I'll modify the test in the PR to reflect these requirements for reproducing.
#41294 seems like a perfect explanation of the fix. I only searched files and directories related to AutoDiff and IRGen, so I did not consider #41294.
The underlying issue is a bug in the pullback cloner. I'm testing a fix.
Here is another reproducer (it crashes in release builds but not in debug builds) using the swift-DEVELOPMENT-SNAPSHOT-2022-03-31-a snapshot:
import _Differentiation

public struct Integration: Differentiable {
    var precessionState: PrecessionState = PrecessionState()
    @noDerivative
    var startMyr: Double = -1.0
    var var0: DoubleVectorReal2 = .init(x: 0.0, y: .init(x0: 0.0, x1: 0.0))
    var var1: DoubleVectorReal2 = .init(x: 0.0, y: .init(x0: 0.0, x1: 0.0))
}

extension Integration {
    public struct PrecessionState: Differentiable {
        public var output: [Double] = [Double]()
        public init() { self.output = [Double]() }
    }
}

extension Integration {
    public struct VectorReal2: Differentiable {
        public var x0: Double
        public var x1: Double
    }

    public struct DoubleVectorReal2: Differentiable {
        public var x: Double
        public var y: VectorReal2

        @differentiable(reverse)
        public init(x: Double, y: VectorReal2) { self.x = x; self.y = y }
    }
}

extension Integration {
    @differentiable(reverse)
    public mutating func integrate() {
        self.precessionState = PrecessionState()
        let _: Double = (startMyr < 0 ? -1.0 : 1.0) // required to crash
    }
}

var i = Integration()
i.integrate()
produces:
SIL verification failed: conflicting debug variable type!: DebugVars[argNum].second == DebugVarTy
Verifying instruction:
%19 = load %0 : $*Integration.TangentVector // users: %21, %20, %22
-> debug_value %19 : $Integration.TangentVector, var, name "self", argno 1, implicit, expr op_deref // id: %20
In function:
// specialized pullback of Integration.integrate()
sil private @$s4main11IntegrationV9integrateyyFTJpSpSrTf4nd_n : $@convention(thin) (@inout Integration.TangentVector) -> () {
// %0 // users: %35, %19
bb0(%0 : $*Integration.TangentVector):
%1 = alloc_stack $Integration.TangentVector, var, name "self", argno 1, implicit, expr op_deref // users: %33, %22, %16, %12, %36, %23
%2 = metatype $@thin Array<Double>.DifferentiableView.Type // users: %30, %14, %13, %4
// function_ref static Array<A>.DifferentiableView<>.zero.getter
%3 = function_ref @$sSa16_DifferentiationAA14DifferentiableRzlE0B4ViewVAAs18AdditiveArithmeticRzrlE4zeroADyx_GvgZ : $@convention(method) <τ_0_0 where τ_0_0 : AdditiveArithmetic, τ_0_0 : Differentiable> (@thin Array<τ_0_0>.DifferentiableView.Type) -> @owned Array<τ_0_0>.DifferentiableView // users: %30, %14, %13, %4
%4 = apply %3<Double>(%2) : $@convention(method) <τ_0_0 where τ_0_0 : AdditiveArithmetic, τ_0_0 : Differentiable> (@thin Array<τ_0_0>.DifferentiableView.Type) -> @owned Array<τ_0_0>.DifferentiableView // user: %5
%5 = struct $Integration.PrecessionState.TangentVector (%4 : $Array<Double>.DifferentiableView) // user: %11
%6 = integer_literal $Builtin.Int64, 0 // user: %7
%7 = builtin "sitofp_Int64_FPIEEE64"(%6 : $Builtin.Int64) : $Builtin.FPIEEE64 // user: %8
%8 = struct $Double (%7 : $Builtin.FPIEEE64) // users: %10, %9, %9
%9 = struct $Integration.VectorReal2.TangentVector (%8 : $Double, %8 : $Double) // user: %10
%10 = struct $Integration.DoubleVectorReal2.TangentVector (%8 : $Double, %9 : $Integration.VectorReal2.TangentVector) // users: %11, %11
%11 = struct $Integration.TangentVector (%5 : $Integration.PrecessionState.TangentVector, %10 : $Integration.DoubleVectorReal2.TangentVector, %10 : $Integration.DoubleVectorReal2.TangentVector) // user: %12
store %11 to %1 : $*Integration.TangentVector // id: %12
%13 = apply %3<Double>(%2) : $@convention(method) <τ_0_0 where τ_0_0 : AdditiveArithmetic, τ_0_0 : Differentiable> (@thin Array<τ_0_0>.DifferentiableView.Type) -> @owned Array<τ_0_0>.DifferentiableView // user: %15
%14 = apply %3<Double>(%2) : $@convention(method) <τ_0_0 where τ_0_0 : AdditiveArithmetic, τ_0_0 : Differentiable> (@thin Array<τ_0_0>.DifferentiableView.Type) -> @owned Array<τ_0_0>.DifferentiableView // user: %18
release_value %13 : $Array<Double>.DifferentiableView // id: %15
%16 = load %1 : $*Integration.TangentVector // user: %17
release_value %16 : $Integration.TangentVector // id: %17
release_value %14 : $Array<Double>.DifferentiableView // id: %18
%19 = load %0 : $*Integration.TangentVector // users: %21, %20, %22
debug_value %19 : $Integration.TangentVector, var, name "self", argno 1, implicit, expr op_deref // id: %20
debug_value %19 : $Integration.TangentVector, var, name "self", argno 1, implicit, expr op_deref // id: %21
store %19 to %1 : $*Integration.TangentVector // id: %22
%23 = struct_element_addr %1 : $*Integration.TangentVector, #Integration.TangentVector.precessionState // users: %24, %32
%24 = struct_element_addr %23 : $*Integration.PrecessionState.TangentVector, #Integration.PrecessionState.TangentVector.output // user: %25
%25 = struct_element_addr %24 : $*Array<Double>.DifferentiableView, #Array.DifferentiableView._base // user: %26
%26 = struct_element_addr %25 : $*Array<Double>, #Array._buffer // user: %27
%27 = struct_element_addr %26 : $*_ArrayBuffer<Double>, #_ArrayBuffer._storage // user: %28
%28 = struct_element_addr %27 : $*_BridgeStorage<__ContiguousArrayStorageBase>, #_BridgeStorage.rawValue // user: %29
%29 = load %28 : $*Builtin.BridgeObject // user: %34
%30 = apply %3<Double>(%2) : $@convention(method) <τ_0_0 where τ_0_0 : AdditiveArithmetic, τ_0_0 : Differentiable> (@thin Array<τ_0_0>.DifferentiableView.Type) -> @owned Array<τ_0_0>.DifferentiableView // user: %31
%31 = struct $Integration.PrecessionState.TangentVector (%30 : $Array<Double>.DifferentiableView) // user: %32
store %31 to %23 : $*Integration.PrecessionState.TangentVector // id: %32
%33 = load %1 : $*Integration.TangentVector // user: %35
strong_release %29 : $Builtin.BridgeObject // id: %34
store %33 to %0 : $*Integration.TangentVector // id: %35
dealloc_stack %1 : $*Integration.TangentVector // id: %36
%37 = tuple () // user: %38
return %37 : $() // id: %38
} // end sil function '$s4main11IntegrationV9integrateyyFTJpSpSrTf4nd_n'
Please submit a bug report (https://swift.org/contributing/#reporting-bugs) and include the project and the crash backtrace.
Stack dump:
0. Program arguments: /Users/developer/swift-source/build/Ninja-RelWithDebInfoAssert+swift-DebugAssert/swift-macosx-arm64/bin/swift-frontend -frontend -c -primary-file main.swift -target arm64-apple-macosx12.0 -Xllvm -aarch64-use-tbi -enable-objc-interop -sdk /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk -color-diagnostics -O -new-driver-path /Users/developer/Developer_Disk/nonApple/swift-source/build/Ninja-RelWithDebInfoAssert+swift-DebugAssert/swift-macosx-arm64/bin/swift-driver -empty-abi-descriptor -resource-dir /Users/wiggles/Developer_Disk/nonApple/swift-source/build/Ninja-RelWithDebInfoAssert+swift-DebugAssert/swift-macosx-arm64/lib/swift -module-name main -target-sdk-version 12.3 -o /var/folders/hq/bm2zh17n5gq1xbhtvm30rj7r0000gn/T/TemporaryDirectory.S50QA0/main-1.o
1. Swift version 5.7-dev (LLVM 6e60a3dd7d28494, Swift f77759b87e8ecaf)
2. Compiling with the current language version
3. While verifying SIL function "@$s4main11IntegrationV9integrateyyFTJpSpSrTf4nd_n".
for 'integrate()' (at main.swift:35:19)
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0 swift-frontend 0x000000010a24a5b4 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 56
1 swift-frontend 0x000000010a249784 llvm::sys::RunSignalHandlers() + 112
2 swift-frontend 0x000000010a24ac20 SignalHandler(int) + 304
3 libsystem_platform.dylib 0x00000001ad1154c4 _sigtramp + 56
4 libsystem_pthread.dylib 0x00000001ad0fdee0 pthread_kill + 288
5 libsystem_c.dylib 0x00000001ad038340 abort + 168
6 swift-frontend 0x00000001033afad4 (anonymous namespace)::SILVerifier::_require(bool, llvm::Twine const&, std::__1::function<void ()> const&) + 764
7 swift-frontend 0x00000001033c8560 (anonymous namespace)::SILVerifier::checkDebugVariable(swift::SILInstruction*) + 1212
8 swift-frontend 0x00000001033c560c (anonymous namespace)::SILVerifier::checkInstructionsDebugInfo(swift::SILInstruction*) + 344
9 swift-frontend 0x00000001033c4648 (anonymous namespace)::SILVerifier::visitSILInstruction(swift::SILInstruction*) + 56
10 swift-frontend 0x00000001033c3f9c (anonymous namespace)::SILVerifierBase<(anonymous namespace)::SILVerifier>::visitDebugValueInst(swift::DebugValueInst*) + 36
11 swift-frontend 0x00000001033be84c swift::SILInstructionVisitor<(anonymous namespace)::SILVerifier, void>::visit(swift::SILInstruction*) + 3432
12 swift-frontend 0x00000001033bd3b8 swift::SILVisitorBase<(anonymous namespace)::SILVerifier, void>::visitSILBasicBlock(swift::SILBasicBlock*) + 128
13 swift-frontend 0x00000001033bb3cc (anonymous namespace)::SILVerifier::visitSILBasicBlock(swift::SILBasicBlock*) + 492
14 swift-frontend 0x00000001033b3bec (anonymous namespace)::SILVerifier::visitSILBasicBlocks(swift::SILFunction*) + 160
15 swift-frontend 0x00000001033b22bc (anonymous namespace)::SILVerifier::visitSILFunction(swift::SILFunction*) + 1688
16 swift-frontend 0x00000001033ab6d8 (anonymous namespace)::SILVerifier::verify() + 28
17 swift-frontend 0x00000001033ab5b0 swift::SILFunction::verify(bool) const + 104
18 swift-frontend 0x00000001033ae53c swift::SILModule::verify() const + 320
19 swift-frontend 0x0000000102fa67e0 swift::CompilerInstance::performSILProcessing(swift::SILModule*) + 320
20 swift-frontend 0x0000000102ea5298 performCompileStepsPostSILGen(swift::CompilerInstance&, std::__1::unique_ptr<swift::SILModule, std::__1::default_delete<swift::SILModule> >, llvm::PointerUnion<swift::ModuleDecl*, swift::SourceFile*>, swift::PrimarySpecificPaths const&, int&, swift::FrontendObserver*) + 668
21 swift-frontend 0x0000000102ea4c40 swift::performCompileStepsPostSema(swift::CompilerInstance&, int&, swift::FrontendObserver*) + 584
22 swift-frontend 0x0000000102ed2644 performAction(swift::CompilerInstance&, int&, swift::FrontendObserver*)::$_21::operator()(swift::CompilerInstance&) const + 140
23 swift-frontend 0x0000000102ed25a8 bool llvm::function_ref<bool (swift::CompilerInstance&)>::callback_fn<performAction(swift::CompilerInstance&, int&, swift::FrontendObserver*)::$_21>(long, swift::CompilerInstance&) + 48
24 swift-frontend 0x0000000102ecac80 llvm::function_ref<bool (swift::CompilerInstance&)>::operator()(swift::CompilerInstance&) const + 64
25 swift-frontend 0x0000000102ec9c30 withSemanticAnalysis(swift::CompilerInstance&, swift::FrontendObserver*, llvm::function_ref<bool (swift::CompilerInstance&)>, bool) + 428
26 swift-frontend 0x0000000102ec507c performAction(swift::CompilerInstance&, int&, swift::FrontendObserver*) + 1148
27 swift-frontend 0x0000000102ea70d8 performCompile(swift::CompilerInstance&, int&, swift::FrontendObserver*) + 220
28 swift-frontend 0x0000000102ea63ec swift::performFrontend(llvm::ArrayRef<char const*>, char const*, void*, swift::FrontendObserver*) + 2152
29 swift-frontend 0x0000000102c67bb0 run_driver(llvm::StringRef, llvm::ArrayRef<char const*>, llvm::ArrayRef<char const*>) + 320
30 swift-frontend 0x0000000102c66fbc swift::mainEntry(int, char const**) + 1096
31 swift-frontend 0x0000000102c66948 main + 36
32 dyld 0x000000012503d088 start + 516
This has been resolved by PR #42245.
Unfortunately the PR was reverted here: https://github.com/apple/swift/pull/42561, so the issue is still open.
@astrotuna201 That revert never landed (the issue it was trying to address was elsewhere), so the fix is still in place. It should be present in nightly toolchain snapshots since 2022-04-23, and has been working well in all of our test cases. Are you still experiencing this with the latest nightly snapshots?
Ah, sorry for the noise. Yes, I cherry-picked it locally, and it works.
Additional Detail from JIRA

| Field | Value |
|------------------|-----------------|
| Votes | 2 |
| Component/s | Compiler |
| Labels | Bug, AutoDiff |
| Assignee | None |
| Priority | Medium |

Issue Description:
Certain differentiable mutating functions on a struct that contain control flow in their bodies can trigger the assertion failure `SIL verification failed: conflicting debug variable type!: DebugVars[argNum].second == DebugVarTy`. This appears to happen only in optimized builds with debug symbols enabled.
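The failing assertion enforces a rule: within a function, all debug records for a given argument number must agree on that variable's type. The sketch below is a simplified model of that rule, not the actual SILVerifier code. Types are plain strings, `ArgumentDebugInfoChecker` is a hypothetical name, and the name comparison is an assumption about the neighboring check. The second type in the usage is made up purely to trigger the conflict.

```swift
// Simplified model: the first record for an argno registers (name, type);
// later records for the same argno must match it.
struct ArgumentDebugInfoChecker {
    private var debugVars: [Int: (name: String, type: String)] = [:]

    /// Returns nil when the record is consistent, otherwise a diagnostic message.
    mutating func check(argNo: Int, name: String, type: String) -> String? {
        guard let existing = debugVars[argNo] else {
            debugVars[argNo] = (name: name, type: type)  // first record for this argno
            return nil
        }
        if existing.name != name { return "conflicting debug variable name!" }
        if existing.type != type { return "conflicting debug variable type!" }
        return nil
    }
}

var verifier = ArgumentDebugInfoChecker()
// The first record for argno 1 registers the type.
_ = verifier.check(argNo: 1, name: "self", type: "Integration.TangentVector")
// A later record for argno 1 with a different (hypothetical) type is rejected.
if let message = verifier.check(argNo: 1, name: "self", type: "HypotheticalOtherType") {
    print(message)  // prints "conflicting debug variable type!"
}
```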
It has been difficult to isolate a reproducer for this. The closest public reproducing case is in this pull request when building the BatchNorm struct. The representative failure output seen there is in the attached file.
This regression appeared sometime in September 2021, and I need to isolate it further and produce a better minimized reproducer. I'm creating this issue as a point of coordination while we work to narrow this down.