oneapi-src / oneDNN

oneAPI Deep Neural Network Library (oneDNN)
https://uxlfoundation.org
Apache License 2.0

Recent benchdnn matmul tolerance change is failing on aarch64 #2089

Open Sqvid opened 1 week ago

Sqvid commented 1 week ago

Summary

The recent change introduced by a8b478b21f7240caa4d68d2b5aee88b54bbd3092 causes a failure on aarch64.

Version

oneDNN v3.7.0 (commit a8b478b21f7240caa4d68d2b5aee88b54bbd3092)

Environment

Steps to reproduce

The failure only appears in Release builds.

ONEDNN_VERBOSE=all ./build/tests/benchdnn/benchdnn --matmul --skip-impl=ref --dt=s8:s8:f32 --stag=ab --wtag=ab --dtag=ab --bia_dt=u8 --attr-scales=src:common:0.25+dst:common:2.25+wei:common:0.5 --attr-zero-points=src:common:1+dst:common:2+wei:common:-1 --attr-post-ops=sum 1x30:30x20

Observed behavior

onednn_verbose,v1,info,oneDNN v3.7.0 (commit a8b478b21f7240caa4d68d2b5aee88b54bbd3092)
onednn_verbose,v1,info,cpu,runtime:OpenMP,nthr:32
onednn_verbose,v1,info,cpu,isa:AArch64 SVE (256 bits)
onednn_verbose,v1,info,gpu,runtime:none
onednn_verbose,v1,primitive,info,template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
onednn_verbose,v1,primitive,create:dispatch,matmul,cpu,matmul,brg:sve_512,undef,src:s8::blocked:ab::f0 wei:s8::blocked:ab::f0 bia:u8:a:any:any::f0_mask2 dst:f32::blocked:ab::f0,attr-scales:src0:0:f32+dst:0:f32+wei:0:f32 attr-zero-points:src0:0:s32+wei:0:s32+dst:0:s32 attr-post-ops:sum,,1x30:30x20,unsupported isa,src/cpu/aarch64/matmul/brgemm_matmul.cpp:98
onednn_verbose,v1,primitive,create:dispatch,matmul,cpu,matmul,lowp_gemm:acl,undef,src:s8::blocked:ab::f0 wei:s8::blocked:ab::f0 bia:u8:a:blocked:ab::f0_mask2 dst:f32::blocked:ab::f0,attr-scales:src0:0:f32+dst:0:f32+wei:0:f32 attr-zero-points:src0:0:s32+wei:0:s32+dst:0:s32 attr-post-ops:sum,,1x30:30x20,scale and zero-point for f32 dst unsupported,src/cpu/aarch64/matmul/acl_lowp_matmul.cpp:90
onednn_verbose,v1,primitive,create:dispatch,matmul,cpu,matmul,gemm:acl,undef,src:s8::blocked:ab::f0 wei:s8::blocked:ab::f0 bia:u8:a:any:any::f0_mask2 dst:f32::blocked:ab::f0,attr-scales:src0:0:f32+dst:0:f32+wei:0:f32 attr-zero-points:src0:0:s32+wei:0:s32+dst:0:s32 attr-post-ops:sum,,1x30:30x20,unsupported datatype combination,src/cpu/aarch64/matmul/acl_matmul.cpp:84
onednn_verbose,v1,primitive,create:dispatch,brgemm_matmul,datatype configuration not supported on this isa,src/cpu/aarch64/matmul/brgemm_matmul_utils.cpp:735
onednn_verbose,v1,primitive,create:dispatch,matmul,cpu,matmul,gemm:jit:f32,undef,src:s8::blocked:ab::f0 wei:s8::blocked:ab::f0 bia:u8:a:any:any::f0_mask2 dst:f32::blocked:ab::f0,attr-scales:src0:0:f32+dst:0:f32+wei:0:f32 attr-zero-points:src0:0:s32+wei:0:s32+dst:0:s32 attr-post-ops:sum,,1x30:30x20,unsupported datatype combination,src/cpu/matmul/gemm_f32_matmul.cpp:93
onednn_verbose,v1,primitive,create:dispatch,matmul,cpu,matmul,gemm:jit:bf16,undef,src:s8::blocked:ab::f0 wei:s8::blocked:ab::f0 bia:u8:a:any:any::f0_mask2 dst:f32::blocked:ab::f0,attr-scales:src0:0:f32+dst:0:f32+wei:0:f32 attr-zero-points:src0:0:s32+wei:0:s32+dst:0:s32 attr-post-ops:sum,,1x30:30x20,unsupported datatype combination,src/cpu/matmul/gemm_bf16_matmul.cpp:63
onednn_verbose,v1,primitive,create:dispatch,matmul,cpu,matmul,gemm:jit:bf16,undef,src:s8::blocked:ab::f0 wei:s8::blocked:ab::f0 bia:u8:a:any:any::f0_mask2 dst:f32::blocked:ab::f0,attr-scales:src0:0:f32+dst:0:f32+wei:0:f32 attr-zero-points:src0:0:s32+wei:0:s32+dst:0:s32 attr-post-ops:sum,,1x30:30x20,unsupported datatype combination,src/cpu/matmul/gemm_bf16_matmul.cpp:63
onednn_verbose,v1,primitive,create:cache_miss,cpu,matmul,gemm:jit,undef,src:s8::blocked:ab::f0 wei:s8::blocked:ab::f0 bia:u8:a:blocked:ab::f0_mask2 dst:f32::blocked:ab::f0,attr-scales:src0:0:f32+dst:0:f32+wei:0:f32 attr-zero-points:src0:0:s32+wei:0:s32+dst:0:s32 attr-post-ops:sum,,1x30:30x20,0.0268555
onednn_verbose,v1,primitive,create:cache_hit,cpu,matmul,gemm:jit,undef,src:s8::blocked:ab::f0 wei:s8::blocked:ab::f0 bia:u8:a:blocked:ab::f0_mask2 dst:f32::blocked:ab::f0,attr-scales:src0:0:f32+dst:0:f32+wei:0:f32 attr-zero-points:src0:0:s32+wei:0:s32+dst:0:s32 attr-post-ops:sum,,1x30:30x20,0.00195312
onednn_verbose,v1,primitive,create:check,matmul,unsupported attribute,src/common/matmul.cpp:75
onednn_verbose,v1,primitive,create:cache_miss,cpu,reorder,simple:any,undef,src:f32::blocked:ab::f0 dst:f32::blocked:ab::f0,,,1x20,0.0151367
onednn_verbose,v1,primitive,exec,cpu,reorder,simple:any,undef,src:f32::blocked:ab::f0 dst:f32::blocked:ab::f0,,,1x20,0.720947
onednn_verbose,v1,primitive,create:cache_miss,cpu,reorder,jit:uni,undef,src:f32::blocked:ab::f0 dst:u8::blocked:ab::f0,,,1x20,0.0959473
onednn_verbose,v1,primitive,exec,cpu,reorder,jit:uni,undef,src:f32::blocked:ab::f0 dst:u8::blocked:ab::f0,,,1x20,0.00390625
onednn_verbose,v1,primitive,create:cache_miss,cpu,reorder,jit:uni,undef,src:f32::blocked:ab::f0 dst:s8::blocked:ab::f0,,,30x20,0.0600586
onednn_verbose,v1,primitive,exec,cpu,reorder,jit:uni,undef,src:f32::blocked:ab::f0 dst:s8::blocked:ab::f0,,,30x20,0.000976562
onednn_verbose,v1,primitive,create:cache_miss,cpu,reorder,jit:uni,undef,src:f32::blocked:ab::f0 dst:s8::blocked:ab::f0,,,1x30,0.032959
onednn_verbose,v1,primitive,exec,cpu,reorder,jit:uni,undef,src:f32::blocked:ab::f0 dst:s8::blocked:ab::f0,,,1x30,0
onednn_verbose,v1,primitive,exec,cpu,matmul,gemm:jit,undef,src:s8::blocked:ab::f0 wei:s8::blocked:ab::f0 bia:u8:a:blocked:ab::f0_mask2 dst:f32::blocked:ab::f0,attr-scales:src0:0:f32+dst:0:f32+wei:0:f32 attr-zero-points:src0:0:s32+wei:0:s32+dst:0:s32 attr-post-ops:sum,,1x30:30x20,1.21704
onednn_verbose,v1,primitive,create:cache_hit,cpu,reorder,simple:any,undef,src:f32::blocked:ab::f0 dst:f32::blocked:ab::f0,,,1x20,0.00195312
onednn_verbose,v1,primitive,exec,cpu,reorder,simple:any,undef,src:f32::blocked:ab::f0 dst:f32::blocked:ab::f0,,,1x20,0.078125
onednn_verbose,v1,primitive,create:cache_hit,cpu,reorder,simple:any,undef,src:f32::blocked:ab::f0 dst:f32::blocked:ab::f0,,,1x20,0.00195312
onednn_verbose,v1,primitive,exec,cpu,reorder,simple:any,undef,src:f32::blocked:ab::f0 dst:f32::blocked:ab::f0,,,1x20,0.236084
[   6][DST][0:6] exp_f32:-1.49012e-08 exp:-1.49012e-08 got:           0 diff:1.49012e-08 rdiff:       1
[COMPARE_STATS][DST]: trh=0 err_max_diff:1.49012e-08 err_max_rdiff:       1 all_max_diff:4.76837e-07 all_max_rdiff:       1
0:FAILED (errors:1 total:20) __REPRO: --matmul --skip-impl=ref --dt=s8:s8:f32 --stag=ab --wtag=ab --dtag=ab --bia_dt=u8 --attr-scales=src:common:0.25+dst:common:2.25+wei:common:0.5 --attr-zero-points=src:common:1+dst:common:2+wei:common:-1 --attr-post-ops=sum 1x30:30x20
tests:1 passed:0 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:1 listed:0
total: 0.01s; fill: 0.00s (6%); compute_ref: 0.00s (5%); compare: 0.00s (9%);

Expected behavior

The returned value of 0 seems reasonably close to the expected -1.49012e-08. Could you share the rationale behind changing the threshold? Thank you.

@dzarukin

dzarukin commented 1 week ago

The rationale behind changing the threshold is that the matmul op uses integer filling and should produce an exact answer on exact inputs. If a feature breaks that assumption, the threshold is adjusted accordingly.

Based on the output value, it looks like cancellation is happening. The first thing to try would be changing the inexact 2.25 to an exact 2 or 4; that will likely resolve the problem. If it helps, I'll proceed with purging this scale value from the input files. If it doesn't, I'll need your help to figure out where exactly the difference is coming from. Thanks.
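The kind of cancellation described here can be sketched with a toy float32 example (illustrative only; the values are hypothetical, not the library's actual intermediates). Notably, the reported diff of 1.49012e-08 is exactly 2^-26, which is below half an ulp of 1.0 in float32, so a term of that size vanishes when added to a value near 1.0:

```python
import numpy as np

# Toy float32 cancellation sketch (hypothetical values, not taken from
# the library's actual intermediates).
a = np.float32(2.0 ** -26)  # ~1.49012e-08, matching the diff in the log
b = np.float32(1.0)

# Mathematically (a + b) - b == a, but in float32 the addition absorbs a,
# because 2^-26 is below half an ulp of 1.0 (which is 2^-24):
lhs = (a + b) - b
rhs = a
print(lhs)  # 0.0            -> what a rounded intermediate path produces
print(rhs)  # 1.4901161e-08  -> what an exact reference computes
```

With a zero comparison threshold, two such paths that agree mathematically but round differently will be flagged as a failure, which matches the `got: 0` vs `exp: -1.49012e-08` pair in the log above.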

Sqvid commented 1 week ago

Thanks for replying so quickly. Your suggestion seems spot-on: changing the parameter to --attr-scales=src:common:0.25+dst:common:2+wei:common:0.5 does pass the test. I'd appreciate it if you went ahead and purged the scale value, if you're happy with that. Thanks.
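For what it's worth, a quick sketch of why power-of-two scales are safe here (assumed accumulator value for illustration): multiplying a float32 by an exact power of two only changes the exponent, so an exactly-represented integer accumulator stays exact and both the reference and the implementation land on the same bits.

```python
import numpy as np

# Hypothetical int32 accumulator from an s8 x s8 dot product; any integer
# with magnitude below 2^24 is exactly representable in float32.
acc = 12345

# The passing configuration's scales are all powers of two: scaling only
# shifts the float32 exponent, so no rounding error is introduced.
for s in (0.25, 0.5, 2.0):
    scaled = np.float32(acc) * np.float32(s)
    # Compare against exact double-precision arithmetic.
    assert float(scaled) == acc * s
```

This is consistent with the zero-tolerance policy dzarukin describes: with power-of-two scales the expected result is exact, so the comparison can demand bitwise agreement.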

Sqvid commented 11 hours ago

@dzarukin Were you able to look into purging the test cases? Thanks.

dzarukin commented 9 hours ago

> @dzarukin Were you able to look into purging the test cases? Thanks.

Hi @Sqvid, the change is under review, should land some time this week. Thank you.

Sqvid commented 8 hours ago

I see, thanks for the update.