Closed — desertfire closed this issue 1 year ago
Ok, bisecting points to https://github.com/pytorch/pytorch/pull/87492. https://github.com/pytorch/pytorch/pull/90746 reverts it.
To reproduce:

```sh
for i in {1..20}; do python benchmarks/dynamo/huggingface.py --training --accuracy --device cuda --amp --only AlbertForQuestionAnswering --ci --backend aot_inductor_debug; done
```
Note that the problem exists with the aot_inductor_debug backend but not with aot_eager, so it is likely a decomposition issue.
I've narrowed the issue to the following decomps: layernorm, tanh_backwards, tanh, and softmax.
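A minimal pure-Python sketch (not the actual Inductor decomps) of why decompositions like these can cause accuracy flakiness: rewriting an op into mathematically equivalent primitives can change the floating-point result slightly, which a tight accuracy check may then flag.

```python
import math

def tanh_grad_direct(x):
    # "Fused"-style formulation: d/dx tanh(x) = 1 - tanh(x)^2
    t = math.tanh(x)
    return 1.0 - t * t

def tanh_grad_decomposed(x):
    # Mathematically equivalent decomposition via sech^2(x) = 4 / (e^x + e^-x)^2
    denom = math.exp(x) + math.exp(-x)
    return 4.0 / (denom * denom)

# The two formulations agree to within rounding error, but they need not
# be bitwise identical -- in lower precision (e.g. amp) such drift can be
# enough to flip a strict accuracy comparison.
for x in [0.1, 1.0, 3.0]:
    a, b = tanh_grad_direct(x), tanh_grad_decomposed(x)
    assert abs(a - b) < 1e-12, (x, a, b)
```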
Resetting the RNG in the HuggingFace models removes the flakiness. I'm looking into what to do (if anything) about the decomps, since softmax and layernorm are crucial for performance.
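A hedged sketch of why reseeding helps, using Python's `random` as a stand-in for the torch RNG (the `noisy_step` function is hypothetical, not from the benchmark suite): if consecutive runs consume RNG state (e.g. via dropout), their outputs diverge; pinning the seed before each run makes them identical.

```python
import random

def noisy_step():
    # Stand-in for a model step that consumes RNG state (e.g. dropout masks).
    return [random.random() for _ in range(4)]

# Without reseeding, consecutive runs advance the RNG state
# and therefore produce different values.
random.seed(0)
run_a = noisy_step()
run_b = noisy_step()
assert run_a != run_b

# Reseeding before each run pins the RNG state, so every run is identical --
# the same idea as resetting the RNG between benchmark iterations.
random.seed(0)
run_c = noisy_step()
random.seed(0)
run_d = noisy_step()
assert run_c == run_d == run_a
```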
Flakiness fixed by https://github.com/pytorch/pytorch/pull/90936
https://hud.pytorch.org/hud/pytorch/pytorch/master/1?per_page=50&name_filter=inductor