tensorflow / swift

Swift for TensorFlow
https://tensorflow.org/swift
Apache License 2.0

Autodiff runtime error in an async DispatchQueue #500

Closed by porterchild 4 years ago

porterchild commented 4 years ago

I'm struggling to find a minimal reproducer for a runtime error on the backwards pass of some differentiable code. It's a large bit of code, hence the struggle to find a small reproducer.

The one interesting data point I have is this: when the backward pass runs in a normal scope, it works fine, but when it runs inside a DispatchQueue.async {} block, it crashes at runtime with EXC_BAD_ACCESS.
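
For readers following along, the two call sites being compared look roughly like the sketch below. This is not the code from the issue (none is shown in this thread), and it does not reproduce the crash. `runBackwardPass` and the queue label are hypothetical stand-ins, and the derivative is computed by hand so the snippet builds without a Swift for TensorFlow toolchain.

```swift
import Dispatch

// Hypothetical stand-in for the real backward pass, which is not shown in
// this thread. Returns d/dx of x * x at x = 3, computed by hand.
func runBackwardPass() -> Double {
    let x = 3.0
    return 2 * x
}

// Case 1: call in a normal scope (reported to run fine for the real code).
func runInNormalScope() {
    print("normal scope:", runBackwardPass())
}

// Case 2: call inside DispatchQueue.async (reported to hit EXC_BAD_ACCESS
// for the real code; this placeholder is not expected to crash).
func runOnBackgroundQueue() {
    let queue = DispatchQueue(label: "backward-pass")  // hypothetical label
    let done = DispatchSemaphore(value: 0)
    queue.async {
        print("async block:", runBackwardPass())
        done.signal()
    }
    // Block until the async work finishes so the process does not exit early.
    done.wait()
}

runInNormalScope()
runOnBackgroundQueue()
```

Swapping the body of `runBackwardPass` for the real differentiable call is one way to check whether the failure depends only on which thread the work runs on.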

I'm wondering if anyone has a hypothesis that might help my hunting. Thanks!

porterchild commented 4 years ago

Can you not just use Xcode's Target -> Edit Scheme -> Diagnostics -> AddressSanitizer when on S4TF toolchains?

dan-zheng commented 4 years ago

Have you tried using ThreadSanitizer to check for data races?

I'm not sure anyone is familiar with issues regarding autodiff and Dispatch, so a reproducer would be useful for us to help with debugging.

porterchild commented 4 years ago

I've tried with Target -> Edit Scheme -> Diagnostics -> ThreadSanitizer in Xcode, though I'm not certain that is actually working on the S4TF toolchain. Is there a special way to use the sanitizers with custom toolchains?

dan-zheng commented 4 years ago

I believe Target -> Edit Scheme -> Diagnostics -> ThreadSanitizer in Xcode just works with custom toolchains, nothing special needed. Verified here:

[Screenshot: verification]

Do AddressSanitizer and ThreadSanitizer pass for your code with the runtime autodiff error?

porterchild commented 4 years ago

OK, good to know, thanks! Yeah, they both pass, insofar as the crash happens before either of them reports anything.

porterchild commented 4 years ago

Reproducer provided in https://bugs.swift.org/browse/TF-1311