Closed ESI-SYD closed 1 year ago
@EikanWang @jgong5 could you help take a look? Not sure if this issue is due to a change on the mkldnn side, in the inductor backend for CPU, or something else.
Yes, we will look into these issues. They were found in our regular benchmark testing, and @ESI-SYD helped report them here.
@XiaobingSuper The execution flow is quite similar to the issue I found in https://github.com/pytorch/pytorch/issues/90652. `mkldnn._convolution_pointwise.binary` doesn't support meta tensors, so `run_fallback_kernel` is invoked with real tensors on it. It seems the non-tensor args are not initialized properly, which triggers the assertion failure. A straightforward fix is to support meta tensors on these fusion ops. I'm also not sure whether the flow with `run_fallback_kernel` can be improved so that non-tensor args are initialized properly.
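For readers unfamiliar with what "supporting meta tensors" involves for a convolution-style op: a meta implementation only has to derive the output shape from the input shapes and the non-tensor args (stride, padding, dilation), without touching any data. The sketch below illustrates that shape arithmetic in plain Python. The helper name `conv_output_shape` is hypothetical, it ignores groups and transposed convolution, and it is not the code from the PyTorch fix.

```python
from typing import Sequence, Tuple


def conv_output_shape(
    input_shape: Sequence[int],
    weight_shape: Sequence[int],
    stride: Sequence[int],
    padding: Sequence[int],
    dilation: Sequence[int],
) -> Tuple[int, ...]:
    """Hypothetical helper: infer the output shape of a plain convolution.

    input_shape is (N, C_in, *spatial) and weight_shape is
    (C_out, C_in, *kernel). No tensor data is needed, which is the
    property a meta kernel relies on.
    """
    batch = input_shape[0]
    out_channels = weight_shape[0]
    spatial = []
    for size, kernel, s, p, d in zip(
        input_shape[2:], weight_shape[2:], stride, padding, dilation
    ):
        # Standard convolution output-size formula, per spatial dimension.
        spatial.append((size + 2 * p - d * (kernel - 1) - 1) // s + 1)
    return (batch, out_channels, *spatial)


# A 3x3 kernel with stride 2 and padding 1 on an 8x8 input.
shape = conv_output_shape((1, 3, 8, 8), (4, 3, 3, 3), (2, 2), (1, 1), (1, 1))
print(shape)  # (1, 4, 4, 4)
```

Note that the result depends directly on the non-tensor args, so if they arrive uninitialized, as described above, the shape computation has nothing valid to work with.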
This issue can be fixed by https://github.com/pytorch/pytorch/pull/90259.
https://github.com/pytorch/pytorch/pull/90259 has been merged; closing this issue now.
🐛 Describe the bug
This failure was found in the latest TorchInductor CPU Performance Dashboard refresh test, with the error log below. The same crash applies to 3 models: sebotnet33ts_256, eca_halonext26ts, and eca_botnext26ts_256.
SW information
Error logs
Minified repro