tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone
https://tensorflow.org
Apache License 2.0
182.91k stars 73.92k forks

Disable numerically unstable cuDNN engine 14 for certain convolutions #67143

Closed copybara-service[bot] closed 1 week ago

copybara-service[bot] commented 1 week ago

A customer reported a numerical issue on V100 with cuDNN 9 that I was able to track down to a single convolution and cuDNN engine.

Using random data with this convolution gives me reasonable results, so it's not yet clear to me if this is a cuDNN issue or just unfortunate numerics.
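For illustration only, a random-data check of this kind can be sketched as follows. This is not the repro used for this change: it assumes a toy 1D convolution, emulates half-precision rounding with the standard library, and compares the result against a float64 reference. All function names, sizes, and the choice of fp16 are assumptions made for the sketch.

```python
import random
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def conv1d_reference(signal, kernel):
    """Valid-mode 1D cross-correlation accumulated in float64."""
    n = len(signal) - len(kernel) + 1
    return [
        sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
        for i in range(n)
    ]


def conv1d_half(signal, kernel):
    """Same computation, rounding every input, product, and partial sum to fp16."""
    n = len(signal) - len(kernel) + 1
    out = []
    for i in range(n):
        acc = 0.0
        for j in range(len(kernel)):
            prod = to_half(to_half(signal[i + j]) * to_half(kernel[j]))
            acc = to_half(acc + prod)
        out.append(acc)
    return out


def normalized_max_error(candidate, reference):
    """Largest absolute difference, scaled by the largest reference magnitude."""
    scale = max(abs(r) for r in reference) or 1.0
    return max(abs(c - r) for c, r in zip(candidate, reference)) / scale


# Hypothetical sizes; the real convolution's shape is not stated in this PR.
random.seed(0)
signal = [random.uniform(-1.0, 1.0) for _ in range(64)]
kernel = [random.uniform(-1.0, 1.0) for _ in range(8)]

reference = conv1d_reference(signal, kernel)
candidate = conv1d_half(signal, kernel)
error = normalized_max_error(candidate, reference)
print(f"normalized max error: {error:.3e}")
```

A small error on well-scaled random inputs, as in this sketch, does not rule out a problem on the customer's data, which may have a very different value distribution.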

For the time being, let's disable the algorithm for this one convolution. The customer confirmed that this fixes their issue.
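As a rough sketch of the idea, and not the code in this change, a per-convolution denylist can be modeled as a lookup from a convolution signature to the engine IDs that should be skipped. `ConvSignature`, `DENYLIST`, `allowed_engines`, the shapes, and the dtype below are all hypothetical placeholders; only the engine number 14 comes from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvSignature:
    """Hypothetical key describing one convolution instance."""
    input_shape: tuple
    filter_shape: tuple
    dtype: str


# Hypothetical denylist: convolution signature -> engine IDs to skip.
# The shapes and dtype are placeholders, not the convolution from this PR.
DENYLIST = {
    ConvSignature((1, 8, 8, 16), (3, 3, 16, 32), "f16"): frozenset({14}),
}


def allowed_engines(sig, candidates):
    """Return the candidate engines minus any denylisted for this signature."""
    blocked = DENYLIST.get(sig, frozenset())
    return [engine for engine in candidates if engine not in blocked]


listed = ConvSignature((1, 8, 8, 16), (3, 3, 16, 32), "f16")
unlisted = ConvSignature((1, 8, 8, 16), (1, 1, 16, 32), "f16")

engines_for_listed = allowed_engines(listed, [0, 14, 28])
engines_for_unlisted = allowed_engines(unlisted, [0, 14, 28])
```

Keying on the full signature keeps the engine available for every other convolution, which matches the narrow scope described above.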