Closed philipturner closed 2 years ago
CC: @asl hopefully #58965 resolves this bug.
This seems to be unrelated to https://github.com/apple/swift/pull/58965 (and certainly not similar to #55170), as the assertion is in debug info generation. Compiling without debug info will likely work around the issue.
@asl - While it points to debug info, we had a very similar "Failed to reconstruct type" error appear in a local module that did seem to go away after #58965 was applied. This is a much simpler reproducer than our case, but the trace looks close.
I found the reproducer while testing an old branch that was usable only months ago. It was pretty much the same as Fan's branch, which was only usable with the November 12, 2021 toolchain. But I just tested top-of-tree S4TF at s4tf/s4tf, and the crash appears there as well. I'm surprised it slipped under my nose for almost 30 days.
I had been testing only the 5.7 branch snapshots since exactly May 18. I had both the trunk toolchain and 5.7 branch toolchain for May 18 on my computer, and thought the trunk toolchain was running. Apparently it was the 5.7 branch toolchain.
I am unable to reproduce the crash on top-of-tree as a Stdlib regression test. I tried replicating the style of @asl's `test/AutoDiff/Sema/DerivativeRegistrationCrossFile`, but no luck. I also tried running the test on the #58965 branch, and nothing changed. To add this bug as a regression test for #58965, we need to investigate it further.
The base Swift repository doesn't include `swift-package`, so we can't build via the Swift package manager. I have not succeeded in reproducing the crash on the 2022-06-18 toolchain using the `swiftc` command; rather, I have to use SwiftPM. That means we must first discover how the SwiftPM build translates into one massive `swiftc` call. Once that is discovered, we can add this as a test case.
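One way to see how a SwiftPM build maps onto compiler calls is to run the build in verbose mode and filter the log for compiler invocations. The sketch below assumes it is run from the package root; `swiftc_lines` and `build.log` are hypothetical names chosen for illustration, not part of SwiftPM.

```shell
# Hypothetical helper: print the lines of a verbose build log that invoke the compiler.
swiftc_lines() {
  grep -E '(swiftc|swift-frontend) ' "$1"
}

# Only attempt the build when a toolchain and a package manifest are present.
if command -v swift >/dev/null 2>&1 && [ -f Package.swift ]; then
  # -v makes SwiftPM print the commands it runs. The build is expected to fail
  # on an affected toolchain, so keep the log regardless of the exit status.
  swift build -v > build.log 2>&1 || true
  swiftc_lines build.log
fi
```

The printed invocations can then be trimmed flag by flag until the crash stops reproducing.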
Here is the progress I made. Add this file to `test/AutoDiff/compiler_crashers_fixed/Inputs` as `59467-???.swift`:
import _Differentiation

struct Tensor: Differentiable {}

// `Tensor` could be defined in this test case's primary file and the crash
// would still happen. All that matters is that `LayerNorm_callAsFunction` and
// `rsqrt` are defined in separate files.
@differentiable(reverse)
func rsqrt(_ x: Tensor) -> Tensor {
    fatalError()
}

@derivative(of: rsqrt)
func _vjpRsqrt(_ x: Tensor) -> (
    value: Tensor, pullback: (Tensor.TangentVector) -> (Tensor.TangentVector)
) {
    fatalError()
}
And add the following as `59467-???.swift`. You can choose the file name that seems most appropriate, replacing "???" with the chosen name.
// RUN: %target-swift-frontend -emit-ir -primary-file %s %S/Inputs/59467-???.swift -module-name main -o /dev/null

import _Differentiation

@_semantics("autodiff.nonvarying")
func withoutDerivative() -> Tensor {
    fatalError()
}

func BatchNorm_doInference(
    _ input: Tensor
) -> Tensor {
    withoutDerivative()
}

@differentiable(reverse)
func BatchNorm_callAsFunction(_ input: Tensor) -> Tensor {
    BatchNorm_doInference(input)
}

@differentiable(reverse)
func LayerNorm_callAsFunction(_ input: Tensor) -> Tensor {
    rsqrt(input)
}
Would you be able to finish what I started here?
We should not add this as a regression test for https://github.com/apple/swift/pull/58965, as the issue is completely different. It might actually be hidden by https://github.com/apple/swift/pull/58965.
> I also tried running the test on the https://github.com/apple/swift/pull/58965 branch, and nothing changed.
What do you mean? The issue was resolved? Or not resolved?
The compiler failed to crash, just like it failed to crash on the `main` branch. We don't yet know whether #58965 fixed, hid, or did nothing to this bug. We don't yet have a way to test #58965 against the bug. That's what I was getting at.
Can't you simply build a toolchain from branch and build your code with it?
The closest I have to building a toolchain from scratch with my current skillset is to build the `swiftc` binary and call that binary from the command line. I don't know how to compile a full-on toolchain that includes SwiftPM and everything added on to the bare apple/swift repo. I was hoping that you could do so more easily than I could.
Hopefully https://github.com/apple/swift-package-manager/blob/main/CONTRIBUTING.md will help. Check the `Advanced` section.
I'm still struggling to get that working. Overall, I think it's more time-effective if you test this bug with SwiftPM. Plus, it was PassiveLogic that originally encountered the bug (https://github.com/apple/swift/issues/59467#issuecomment-1156974655) without doing the necessary work to fully narrow down and report it. I don't mean to be disrespectful, but I have been narrowing this bug for ~12 hours straight and I think it logically makes sense that you and Brad share some of the responsibility of investigating this bug.
I'll let @BradLarson and the PassiveLogic folks take care of test reduction for you.
I don't have a standalone lit test, but I can say that a SwiftPM package with the original configuration described above does reproduce this error on multiple platforms with toolchains since the original derivative registration fix went in. A toolchain with #58965 applied builds this package without error. That may indeed just be a suppression of the underlying problem, but the error does not reproduce after that patch.
The swiftc invocation that triggers this for that package goes something like the following:
/usr/bin/swiftc -module-name TensorFlow -incremental -emit-dependencies -emit-module -emit-module-path /root/ReconstructedType/.build/aarch64-unknown-linux-gnu/debug/TensorFlow.swiftmodule -parse-as-library -c /root/ReconstructedType/Sources/TensorFlow/Core/Layers/Normalization.swift /root/ReconstructedType/Sources/TensorFlow/Core/Tensor.swift -target aarch64-unknown-linux-gnu -Onone -enable-testing -g -j8 -DSWIFT_PACKAGE -DDEBUG -parse-as-library
When we saw this, it was highly dependent on specific locations and names of functions. A workaround was to simply rename or move the file that was triggering the error, and the rest would build fine. I didn't pursue it further when I saw that #58965 prevented it in our local cases.
Thanks! I will be fine if #58965 is destined to be merged before the release of Swift 5.8. This means a deadline of March 2023, months into the future. Could we add this bug to `test/AutoDiff/compiler_crashers_fixed` so we can track when it reappears ("regresses") in the future? If it reappears, we will then move it into `test/AutoDiff/compiler_crashers`. That is the purpose of the `test/AutoDiff/compiler_crashers` directory: it holds crashers that were once suppressed but never really fixed.
I still need to test your workaround of renaming files and relocating code. Regardless, the bug impacts my project and I would not be happy if it silently reappeared. Adding this crash to regression tests as part of #58965 would instantly notify us if it reappears.
I reproduced the crash without SwiftPM! I should be able to carry on from here and author the regression test for #58965.
Build command narrowed down to:
swiftc ../main.swift ../Inputs/Tensor.swift -g
Note: in the regression tests, `main.swift` must be specified as the primary file. Use `-primary-file %s`.
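For anyone who wants to retry the narrowed-down command, the following sketch lays out the two files and runs the build. It assumes that `main.swift` holds the primary-file code and `Inputs/Tensor.swift` holds the `Inputs` file posted earlier in this thread; the `repro-59467` directory name and the `-gnone` comparison run are illustrative additions, the latter only to check the suggestion that compiling without debug info avoids the crash.

```shell
# Assumed layout: main.swift plus Inputs/Tensor.swift, built from a sibling directory.
REPRO_DIR="${REPRO_DIR:-repro-59467}"
mkdir -p "$REPRO_DIR/Inputs" "$REPRO_DIR/build"

cat > "$REPRO_DIR/Inputs/Tensor.swift" <<'EOF'
import _Differentiation

struct Tensor: Differentiable {}

@differentiable(reverse)
func rsqrt(_ x: Tensor) -> Tensor {
    fatalError()
}

@derivative(of: rsqrt)
func _vjpRsqrt(_ x: Tensor) -> (
    value: Tensor, pullback: (Tensor.TangentVector) -> (Tensor.TangentVector)
) {
    fatalError()
}
EOF

cat > "$REPRO_DIR/main.swift" <<'EOF'
import _Differentiation

@_semantics("autodiff.nonvarying")
func withoutDerivative() -> Tensor {
    fatalError()
}

func BatchNorm_doInference(_ input: Tensor) -> Tensor {
    withoutDerivative()
}

@differentiable(reverse)
func BatchNorm_callAsFunction(_ input: Tensor) -> Tensor {
    BatchNorm_doInference(input)
}

@differentiable(reverse)
func LayerNorm_callAsFunction(_ input: Tensor) -> Tensor {
    rsqrt(input)
}
EOF

# Run the compiler only if a toolchain is on PATH.
if command -v swiftc >/dev/null 2>&1; then
  (
    cd "$REPRO_DIR/build" || exit 1
    # Same command as above; an affected toolchain is expected to crash here.
    swiftc ../main.swift ../Inputs/Tensor.swift -g
    echo "with -g: exit status $?"
    # Comparison run without debug info.
    swiftc ../main.swift ../Inputs/Tensor.swift -gnone
    echo "with -gnone: exit status $?"
  )
fi
```

If the `-g` run fails while the `-gnone` run succeeds, that would support the reading that the assertion lives in debug info generation.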
I just tested S4TF against the latest trunk development snapshot, and it works! July 20, 2022 Swift.org trunk development snapshot, Google Colaboratory (Ubuntu 18.04, x86_64). All of the S4TF tutorials made by the TensorFlow organization work as well.
Describe the bug
This bug appeared between the May 11, 2022 and May 18, 2022 Trunk Development Snapshots. It seems to be caused by changing behavior of cross-file lookup for derivatives, resulting from the merging of #58644. It seems similar to #55170. The suspected source of this bug was not merged into the `release/5.7` branch, so it is not a great concern for now. I will still be able to compile on the Swift 5.7 release toolchain, which I am currently targeting.
To Reproduce
Steps to reproduce the behavior:
`Package.swift`:
`Sources/TensorFlow/Core/Tensor.swift`:
`Sources/TensorFlow/Core/Layers/Normalization.swift`:
Crash stack trace
```
Failed to reconstruct type for $s10TensorFlow09_AD__$s10A58Flow21BatchNorm_doInferenceyAA0A0VADF_bb0__PB__src_0_wrt_0VmD
Original type: (metatype_type (struct_type decl=TensorFlow.(file)._AD__$s10TensorFlow21BatchNorm_doInferenceyAA0A0VADF_bb0__PB__src_0_wrt_0))
Please submit a bug report (https://swift.org/contributing/#reporting-bugs) and include the project and the crash backtrace.
Stack dump:
0. Program arguments: /Library/Developer/Toolchains/swift-DEVELOPMENT-SNAPSHOT-2022-06-02-a.xctoolchain/usr/bin/swift-frontend -frontend -c /Users/philipturner/Desktop/fan/s4tf/Sources/TensorFlow/Core/Tensor.swift -primary-file /Users/philipturner/Desktop/fan/s4tf/Sources/TensorFlow/Layers/Normalization.swift -emit-dependencies-path /Users/philipturner/Desktop/fan/s4tf/.build/arm64-apple-macosx/debug/TensorFlow.build/Layers/Normalization.d -emit-reference-dependencies-path /Users/philipturner/Desktop/fan/s4tf/.build/arm64-apple-macosx/debug/TensorFlow.build/Layers/Normalization.swiftdeps -target arm64-apple-macosx10.10 -Xllvm -aarch64-use-tbi -enable-objc-interop -sdk /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk -I /Users/philipturner/Desktop/fan/s4tf/.build/arm64-apple-macosx/debug -I /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/usr/lib -F /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/Library/Frameworks -color-diagnostics -enable-testing -g -module-cache-path /Users/philipturner/Desktop/fan/s4tf/.build/arm64-apple-macosx/debug/ModuleCache -swift-version 5 -Onone -D SWIFT_PACKAGE -D DEBUG -new-driver-path /Library/Developer/Toolchains/swift-DEVELOPMENT-SNAPSHOT-2022-06-02-a.xctoolchain/usr/bin/swift-driver -empty-abi-descriptor -resource-dir /Library/Developer/Toolchains/swift-DEVELOPMENT-SNAPSHOT-2022-06-02-a.xctoolchain/usr/lib/swift -enable-anonymous-context-mangled-names -module-name TensorFlow -target-sdk-version 12.3 -parse-as-library -o /Users/philipturner/Desktop/fan/s4tf/.build/arm64-apple-macosx/debug/TensorFlow.build/Layers/Normalization.swift.o -index-store-path /Users/philipturner/Desktop/fan/s4tf/.build/arm64-apple-macosx/debug/index/store -index-system-modules
1. Apple Swift version 5.8-dev (LLVM 278d67f38c6a910, Swift ee312bc1e20eb01)
2. Compiling with the current language version
3. While evaluating request IRGenRequest(IR Generation for file "/Users/philipturner/Desktop/fan/s4tf/Sources/TensorFlow/Layers/Normalization.swift")
4. While emitting IR for synthesized file0x11a1d01d8
5. While emitting metadata for '_AD__$s10TensorFlow21BatchNorm_doInferenceyAA0A0VADF_bb0__PB__src_0_wrt_0' (in module 'TensorFlow')
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0 swift-frontend 0x0000000106b823a4 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 56
1 swift-frontend 0x0000000106b81604 llvm::sys::RunSignalHandlers() + 128
2 swift-frontend 0x0000000106b82a08 SignalHandler(int) + 304
3 libsystem_platform.dylib 0x00000001a6a674a4 _sigtramp + 56
4 libsystem_pthread.dylib 0x00000001a6a4fee0 pthread_kill + 288
5 libsystem_c.dylib 0x00000001a698a340 abort + 168
6 swift-frontend 0x0000000106c5eed4 (anonymous namespace)::IRGenDebugInfoImpl::getOrCreateType(swift::irgen::DebugTypeInfo) (.cold.9) + 0
7 swift-frontend 0x0000000102a59950 (anonymous namespace)::IRGenDebugInfoImpl::getOrCreateType(swift::irgen::DebugTypeInfo) + 3788
8 swift-frontend 0x0000000102a54e80 swift::irgen::IRGenDebugInfo::emitGlobalVariableDeclaration(llvm::GlobalVariable*, llvm::StringRef, llvm::StringRef, swift::irgen::DebugTypeInfo, bool, bool, llvm::Optional
```
Environment (please complete the following information):