google / XNNPACK

High-efficiency floating-point neural network inference operators for mobile, server, and Web

Load-from-misaligned-address failures on Hexagon simulator #6306

Open steven-johnson opened 5 months ago

steven-johnson commented 5 months ago

Several tests fail when built for Hexagon and run under the simulator; the QuRT exit code (0x2001) indicates the failures are loads from misaligned addresses:

test/qs8_dwconv_minmax_multipass_fp32_test
test/qs8_qc8w_dwconv_minmax_multipass_fp32_test
test/qu8_dwconv_minmax_multipass_fp32_test

(Haven't tested on hardware yet, will update this bug once I've done so.)

steven-johnson commented 5 months ago

Update: these fail in similar ways on Samsung S22.

fbarchard commented 5 months ago

8-bit (or 4-bit) weights can cause an alignment issue for the bias and scale values, which are 32-bit elements and usually vectors. dwconv is an igemm, and an igemm is a gemm that also has MR pointers embedded in the weights. The packed weights look roughly like this:

- a gemm or igemm has NR int32 bias values
- an igemm then has MR pointers
- a gemm/igemm has NR*KC weights
- a gemm/igemm has NR float scales and an optional float bias

If the number of weights is odd, the bias, indirect pointers, and scale can end up unaligned.
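
To make the arithmetic concrete, here is a small standalone sketch (not XNNPACK's packing code; the tile sizes are made up) showing how a weight count that is not a multiple of 4 pushes the 32-bit fields that follow the 8-bit weights onto a misaligned offset:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical packed layout: [NR int32 bias][MR pointers][NR*KC int8 weights][NR float scales] */
static size_t scale_offset(size_t nr, size_t kc, size_t mr) {
  size_t offset = 0;
  offset += nr * sizeof(int32_t);      /* NR int32 biases */
  offset += mr * sizeof(void*);        /* MR indirection pointers (igemm only) */
  offset += nr * kc * sizeof(int8_t);  /* NR*KC 8-bit weights */
  return offset;                       /* where the NR float scales would start */
}

int main(void) {
  /* nr*kc = 72 is a multiple of 4, so the scales stay 4-byte aligned. */
  printf("nr=8, kc=9: scale offset %% 4 = %zu\n", scale_offset(8, 9, 4) % 4);
  /* nr*kc = 9 is odd, so the float scales start at a misaligned address. */
  printf("nr=1, kc=9: scale offset %% 4 = %zu\n", scale_offset(1, 9, 4) % 4);
  return 0;
}
```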

For the packw kernel, Hexagon crashed on the int32 bias when the kernel size is smaller than 4 bytes. The workaround was https://github.com/google/XNNPACK/pull/6303, which added an attribute to the pointers to allow unaligned stores. But that is slow.
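
For reference, the common idioms for telling the compiler an access may be misaligned look roughly like this; this is only a sketch of the general technique, not necessarily what that PR does, and it relies on the GCC/Clang `packed` attribute:

```c
#include <stdint.h>
#include <string.h>

/* Packed one-field struct: accesses through it are treated as unaligned. */
struct unaligned_s32 { int32_t value; } __attribute__((packed));

static inline int32_t load_s32_unaligned(const void* address) {
  int32_t value;
  memcpy(&value, address, sizeof(value));  /* compiler lowers this to safe loads */
  return value;
}

static inline void store_s32_unaligned(void* address, int32_t value) {
  ((struct unaligned_s32*) address)->value = value;  /* byte-wise (memb) store */
}
```

Both forms are correct on misaligned addresses, but on Hexagon they cost extra instructions compared to a single aligned memw access.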

I think if Hexagon kernels carefully use sizes that are at least a multiple of 4 bytes, we can use float and int values with memw instead of memb. For vectors it won't always be possible. An IGEMM (or dwconv) has MR pointers before the weights, so MR would need to be large or padded to ensure vector-aligned values. For gemm it should be possible, if NR is a multiple of the vector size.
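
A minimal sketch of that padding idea, assuming the packing code is free to insert padding between fields; `round_up_po2` and `padded_weights_bytes` are local helpers made up for illustration, not XNNPACK APIs:

```c
#include <stddef.h>
#include <stdint.h>

/* alignment must be a power of two. */
static inline size_t round_up_po2(size_t n, size_t alignment) {
  return (n + alignment - 1) & ~(alignment - 1);
}

/* Pad the 8-bit weight block so whatever follows it (float scales, int32
 * bias, indirection pointers) starts on a 4-byte or vector boundary. */
static size_t padded_weights_bytes(size_t nr, size_t kc, size_t alignment) {
  return round_up_po2(nr * kc * sizeof(int8_t), alignment);
}
```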

fbarchard commented 5 months ago

If the multipass specifically has the issue but single-pass works, it's likely the temporary accumulation buffer is not int32-aligned.
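
If that turns out to be the cause, one sketch of a fix is to allocate the scratch buffer with an explicit alignment; the function name and alignment below are made up for illustration:

```c
#include <stdint.h>
#include <stdlib.h>

int32_t* allocate_accumulator_buffer(size_t num_accumulators) {
  const size_t alignment = 64;  /* vector-friendly; 4 bytes already fixes int32 loads */
  size_t bytes = num_accumulators * sizeof(int32_t);
  /* aligned_alloc (C11) requires the size to be a multiple of the alignment. */
  bytes = (bytes + alignment - 1) & ~(alignment - 1);
  return (int32_t*) aligned_alloc(alignment, bytes);  /* release with free() */
}
```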