pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

[inductor][cpu] detectron2 fasterrcnn accuracy failure #112566

Closed WeizhuoZhang-intel closed 8 months ago

WeizhuoZhang-intel commented 1 year ago

new_failures in 2023-10-29

| name | accuracy | perf | suite | reason (reference only) |
|---|---|---|---|---|
| detectron2_fasterrcnn_r_101_dc5 | X | | torchbench | detectron2_fasterrcnn_r_101_dc5, [2023-10-30 22:38:14 498] torch._dynamo.convert_frame: [WARNING] to diagnose recompilation issues set env variable TORCHDYNAMO_REPORT_GUARD_FAILURES=1 and also see https://pytorch.org/docs/master/compile/troubleshooting.html. |
| detectron2_fasterrcnn_r_101_fpn | X | | torchbench | detectron2_fasterrcnn_r_101_fpn, [2023-10-30 22:38:14 498] torch._dynamo.convert_frame: [WARNING] to diagnose recompilation issues set env variable TORCHDYNAMO_REPORT_GUARD_FAILURES=1 and also see https://pytorch.org/docs/master/compile/troubleshooting.html. |
| detectron2_fasterrcnn_r_50_c4 | X | | torchbench | detectron2_fasterrcnn_r_50_c4, [2023-10-30 22:38:14 498] torch._dynamo.convert_frame: [WARNING] to diagnose recompilation issues set env variable TORCHDYNAMO_REPORT_GUARD_FAILURES=1 and also see https://pytorch.org/docs/master/compile/troubleshooting.html. |
| detectron2_fasterrcnn_r_50_dc5 | X | | torchbench | detectron2_fasterrcnn_r_50_dc5, [2023-10-30 22:38:14 498] torch._dynamo.convert_frame: [WARNING] to diagnose recompilation issues set env variable TORCHDYNAMO_REPORT_GUARD_FAILURES=1 and also see https://pytorch.org/docs/master/compile/troubleshooting.html. |
| detectron2_fasterrcnn_r_50_fpn | X | | torchbench | detectron2_fasterrcnn_r_50_fpn, [2023-10-30 22:38:14 498] torch._dynamo.convert_frame: [WARNING] to diagnose recompilation issues set env variable TORCHDYNAMO_REPORT_GUARD_FAILURES=1 and also see https://pytorch.org/docs/master/compile/troubleshooting.html. |

SW info

| SW | Nightly commit | Main commit |
|---|---|---|
| Pytorch | 0a16ad0 | f5088d2 |
| Torchbench | / | 7617d3f5 |
| torchaudio | 475b6ae | ede4309 |
| torchtext | 142d029 | 45e4b8c |
| torchvision | 8636bf3 | 4ac707a |
| torchdata | eb9bf61 | d76d92c |
| dynamo_benchmarks | e6efc29 | / |

Reference SW info (nightly)

| item | commit |
|---|---|
| torchbench | 7617d3f5 |
| torch | 2.2.0a0+gite6efc29 |
| torchvision | 0.16.0a0+8636bf3 |
| torchtext | 0.16.0a0+142d029 |
| torchaudio | 2.2.0a0+475b6ae |
| torchdata | 0.7.0a0+eb9bf61 |
| dynamo_benchmarks | 0200b11 |

image: docker pull ccr-registry.caas.intel.com/pytorch/pt_inductor:2023_10_30_aws

Repro

inductor_single_run.sh

bash inductor_single_run.sh single inference accuracy torchbench detectron2_fasterrcnn_r_101_dc5 float32 first static default 0
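
For anyone re-running this, here is a minimal sketch of launching the repro with the environment variable that the warning in the failure table suggests (TORCHDYNAMO_REPORT_GUARD_FAILURES=1). The command string is copied from this issue. The labels in ARG_LABELS are hypothetical names guessed from the argument values and are not confirmed anywhere in this thread; the helper functions are illustrative only.

```python
import os
import shlex

# Repro command copied verbatim from this issue.
REPRO = (
    "bash inductor_single_run.sh single inference accuracy torchbench "
    "detectron2_fasterrcnn_r_101_dc5 float32 first static default 0"
)

# ASSUMPTION: hypothetical labels for the positional arguments, guessed from
# their values. The script itself may name or interpret them differently.
ARG_LABELS = [
    "thread", "mode", "scenario", "suite", "model",
    "dtype", "channels", "shape", "wrapper", "batch_size",
]


def build_repro(command=REPRO, report_guard_failures=True):
    """Return (argv, env) for the repro without running anything."""
    argv = shlex.split(command)
    env = dict(os.environ)
    if report_guard_failures:
        # Env variable named in the torch._dynamo warning quoted in the table.
        env["TORCHDYNAMO_REPORT_GUARD_FAILURES"] = "1"
    return argv, env


def label_args(argv):
    """Pair the arguments after 'bash <script>' with the assumed labels."""
    return dict(zip(ARG_LABELS, argv[2:]))


if __name__ == "__main__":
    argv, env = build_repro()
    for label, value in label_args(argv).items():
        print(f"{label:>10}: {value}")
    # To actually launch it, run from the directory that contains the script:
    #   import subprocess
    #   subprocess.run(argv, env=env, check=True)
```

The sketch only builds the command and environment; the launch is left commented out because it needs the benchmark environment from the image above.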

Suspected guilty commit

torchbench-detectron2_fasterrcnn_r_101_dc5-inference-float32-static-default-accuracy-single-crash_guilty_commit.log

cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @wconstab @bdhirsh @anijain2305

davidberard98 commented 1 year ago

@WeizhuoZhang-intel the "suspected guilty commit" link points back to this issue. Can you update it if you know the cause?

leslie-fang-intel commented 1 year ago

Hi @WeizhuoZhang-intel, please provide the guilty commit for this issue. The link you provided currently points to this issue itself. cc @chuanqi129

chuanqi129 commented 11 months ago

Suspected guilty commit: e38347f490ae14bf96913a19e7dab9b5e752c276

torchbench-detectron2_fasterrcnn_r_101_dc5-inference-float32-static-default-accuracy-single-crash_guilty_commit.log

inductor_single_run.sh

bash inductor_single_run.sh single inference accuracy torchbench detectron2_fasterrcnn_r_101_dc5 float32 first static default 0

Le-Zheng commented 11 months ago

@Chillee we found that commit e38347f490ae14bf96913a19e7dab9b5e752c276 causes this issue. Could you please take a look?

leslie-fang-intel commented 10 months ago

@Chillee Could you help take a look at this issue?

leslie-fang-intel commented 10 months ago

@Chillee Could you help take a look at this issue?

penguinwu commented 8 months ago

@Le-Zheng -- Are the tests still failing on the latest trunk?

leslie-fang-intel commented 8 months ago

> @Le-Zheng -- Are the tests still failing on the latest trunk?

@chuanqi129 @zxd1997066 could you help check the latest test report?

chuanqi129 commented 8 months ago

Double-checked the latest test report; the issue has been fixed.