aiusepsi opened 3 months ago
Thanks for the report. Looking at the original version, it used to be floats but was recently changed to int. I found the change that did it, but it's not documented why. I'm inclined to go ahead and change it back. If you like, you can try it on your end to ensure it works, and check whether any other changes have the same issue.
```c
// Copyright 2019 Google LLC
//
// This source code is licensed under the BSD-style license found in the
// LICENSE file in the root directory of this source tree.

// Table of exp2(k / 64) values, k = 0..63
XNN_INTERNAL const float xnn_table_exp2_k_over_64[64] = {
  0x1.000000p+0f, 0x1.02C9A4p+0f, 0x1.059B0Ep+0f, 0x1.087452p+0f,
  0x1.0B5586p+0f, 0x1.0E3EC4p+0f, 0x1.11301Ep+0f, 0x1.1429AAp+0f,
  0x1.172B84p+0f, 0x1.1A35BEp+0f, 0x1.1D4874p+0f, 0x1.2063B8p+0f,
  0x1.2387A6p+0f, 0x1.26B456p+0f, 0x1.29E9E0p+0f, 0x1.2D285Ap+0f,
  0x1.306FE0p+0f, 0x1.33C08Cp+0f, 0x1.371A74p+0f, 0x1.3A7DB4p+0f,
  0x1.3DEA64p+0f, 0x1.4160A2p+0f, 0x1.44E086p+0f, 0x1.486A2Cp+0f,
  0x1.4BFDAEp+0f, 0x1.4F9B28p+0f, 0x1.5342B6p+0f, 0x1.56F474p+0f,
  0x1.5AB07Ep+0f, 0x1.5E76F2p+0f, 0x1.6247ECp+0f, 0x1.662388p+0f,
  0x1.6A09E6p+0f, 0x1.6DFB24p+0f, 0x1.71F75Ep+0f, 0x1.75FEB6p+0f,
  0x1.7A1148p+0f, 0x1.7E2F34p+0f, 0x1.82589Ap+0f, 0x1.868D9Ap+0f,
  0x1.8ACE54p+0f, 0x1.8F1AEAp+0f, 0x1.93737Cp+0f, 0x1.97D82Ap+0f,
  0x1.9C4918p+0f, 0x1.A0C668p+0f, 0x1.A5503Cp+0f, 0x1.A9E6B6p+0f,
  0x1.AE89FAp+0f, 0x1.B33A2Cp+0f, 0x1.B7F770p+0f, 0x1.BCC1EAp+0f,
  0x1.C199BEp+0f, 0x1.C67F12p+0f, 0x1.CB720Ep+0f, 0x1.D072D4p+0f,
  0x1.D5818Ep+0f, 0x1.DA9E60p+0f, 0x1.DFC974p+0f, 0x1.E502EEp+0f,
  0x1.EA4AFAp+0f, 0x1.EFA1BEp+0f, 0x1.F50766p+0f, 0x1.FA7C18p+0f,
};
```
I found int and int32_t variations of this and fixed those. Now it's just the float extern that is sometimes used.
```
grep extern.*xnn_table_exp2minus_kover . -r -h | sort | uniq
```

These 3 cause a type mismatch:

```c
extern XNN_INTERNAL const float xnn_table_exp2minus_k_over_16[16];
extern XNN_INTERNAL const float xnn_table_exp2minus_k_over_2048[2048];
extern XNN_INTERNAL const float xnn_table_exp2minus_k_over_64[64];
```

These are extern'ed consistently:

```c
extern XNN_INTERNAL const uint32_t xnn_table_exp2minus_kover${LUT}[${LUT}];
extern XNN_INTERNAL const uint32_t xnn_table_exp2minus_k_over_16[16];
extern XNN_INTERNAL const uint32_t xnn_table_exp2minus_k_over_2048[2048];
extern XNN_INTERNAL const uint32_t xnn_table_exp2minus_k_over_32[32];
extern XNN_INTERNAL const uint32_t xnn_table_exp2minus_k_over_4[4];
extern XNN_INTERNAL const uint32_t xnn_table_exp2minus_k_over_64[64];
extern XNN_INTERNAL const uint32_t xnn_table_exp2minus_k_over_8[8];
```
@fbarchard did you ever commit those changes? I don't see any links to commits that fixed it, for example.
I tried to test a recent snapshot. The previous one where I saw the inconsistent types was from 2024.02.29. Trying the latest version, I actually cannot even get far enough to hit the LTO errors, as I first get tons of -Werror=strict-aliasing errors in src/xnnpack/simd/f32-scalar.h. For example:
```
In file included from /var/tmp/portage/sci-libs/XNNPACK-2024.09.04/work/XNNPACK-35af48543c1666feb840f74560eb5b9bfdf9b348/src/f32-vunary/gen/f32-vneg-scalar.c:14:
/var/tmp/portage/sci-libs/XNNPACK-2024.09.04/work/XNNPACK-35af48543c1666feb840f74560eb5b9bfdf9b348/src/xnnpack/simd/f32-generic-functions.h: In function ‘xnn_generic_getexp_f32’:
/var/tmp/portage/sci-libs/XNNPACK-2024.09.04/work/XNNPACK-35af48543c1666feb840f74560eb5b9bfdf9b348/src/xnnpack/simd/f32-scalar.h:28:31: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
   28 |   const xnn_simd_f32_t var = *(const float *)&_##var##_int_value;
      |                               ^~~~~~~~~~~~~~~~~
/var/tmp/portage/sci-libs/XNNPACK-2024.09.04/work/XNNPACK-35af48543c1666feb840f74560eb5b9bfdf9b348/src/xnnpack/simd/f32-generic-functions.h:34:3: note: in expansion of macro ‘XNN_SIMD_CONST_U32’
   34 |   XNN_SIMD_CONST_U32(exp_mask, 0x7f800000);
      |   ^~~~~~~~~~~~~~~~~~
/var/tmp/portage/sci-libs/XNNPACK-2024.09.04/work/XNNPACK-35af48543c1666feb840f74560eb5b9bfdf9b348/src/xnnpack/simd/f32-scalar.h: In function ‘xnn_and_f32’:
/var/tmp/portage/sci-libs/XNNPACK-2024.09.04/work/XNNPACK-35af48543c1666feb840f74560eb5b9bfdf9b348/src/xnnpack/simd/f32-scalar.h:87:25: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
   87 |   const uint32_t res = *(const uint32_t *)&a & *(const uint32_t *)&b;
      |                         ^~~~~~~~~~~~~~~~~~~~
```
Attached is a full log of the entire failure: build.log
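As a sketch (not XNNPACK's actual patch; the helper names here are illustrative), the type-punned dereferences flagged above can be rewritten with `memcpy`, which is the sanctioned way to reinterpret bits in C and which compilers lower to a plain register move for these fixed-size copies:

```c
#include <stdint.h>
#include <string.h>

/* Strict-aliasing-safe bit reinterpretation: memcpy between a float and
   a uint32_t of the same size is well-defined, unlike pointer casts. */
static inline float u32_as_f32(uint32_t bits) {
  float f;
  memcpy(&f, &bits, sizeof f);  /* instead of *(const float *)&bits */
  return f;
}

static inline uint32_t f32_as_u32(float f) {
  uint32_t bits;
  memcpy(&bits, &f, sizeof bits);  /* instead of *(const uint32_t *)&f */
  return bits;
}

/* e.g. the bitwise AND from xnn_and_f32 in the error above could become: */
static inline float and_f32(float a, float b) {
  return u32_as_f32(f32_as_u32(a) & f32_as_u32(b));
}
```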
https://github.com/pytorch/pytorch/pull/137866 has some runs with full stack traces and repros if you need more reproducers. Here is the PyTorch HUD: https://hud.pytorch.org/pr/pytorch/pytorch/137866#31470457008
@eli-schwartz Those errors still exist on master so they were not committed.
@fbarchard I posted the full build logs and error logs in my comment above, and I'm happy to try master once they are fixed. A simple search suggests the problem still appears on master.
@fbarchard Found the PRs you made, i.e. https://github.com/google/XNNPACK/pull/6832; unfortunately, converting the tables to uint32 still causes issues with LTO under GCC, even though the types are the same width.
The type of xnn_table_exp2minus_k_over_16 varies between declarations. It's defined as `uint32_t[64]` here, but other declarations have various different types. For example, here it's declared as `float[64]`. This is illegal in C, as uint32_t and float are not compatible types (just being the same size isn't sufficient to be compatible), so strictly speaking this is undefined behaviour. I hit an error with this while compiling with LTO, so the work-around is probably "don't do that", but it is still incorrect even if it happens to work.
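A minimal sketch of the problem (file and symbol names here are illustrative, not XNNPACK's real layout): a traditional linker resolves the mismatched declarations because the object sizes agree, but GCC's LTO sees both translation units at once and can diagnose the incompatibility. Same width does not mean compatible type:

```c
#include <stdint.h>
#include <string.h>

/* Illustrative sketch of the mismatch across two translation units:
 *
 *   table.c:  const uint32_t xnn_table[64] = { 0x3F800000, ... };
 *   kernel.c: extern const float xnn_table[64];  // incompatible: UB
 *
 * uint32_t and float have the same size, but C (C11 6.2.7) requires all
 * declarations of an object to use compatible types; size is not enough. */
_Static_assert(sizeof(uint32_t) == sizeof(float),
               "same width, still not compatible types");

/* The bits 0x3F800000 encode the float 1.0f, not the integer 1, so
   reading a uint32_t table through a float declaration "works" only by
   accident of representation. */
static float decode(uint32_t bits) {
  float f;
  memcpy(&f, &bits, sizeof f);
  return f;
}
```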
One possible solution is to declare the array as `float[64]` and use hexadecimal float literals, e.g. `0x1p+0` instead of `0x3F800000`, `0x1.fec9a4p-1` instead of `0x3F7F64D2`, etc., which should still be bit-for-bit identical.
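For example, a sketch checking the two encodings quoted above (`bits_of` is a helper for the check, not part of XNNPACK):

```c
#include <stdint.h>
#include <string.h>

/* Two table entries written as hexadecimal float literals instead of
   raw uint32_t bit patterns. */
static const float table_head[] = {
  0x1p+0f,        /* same bits as the uint32_t literal 0x3F800000 */
  0x1.fec9a4p-1f, /* same bits as the uint32_t literal 0x3F7F64D2 */
};

/* Helper for the bit-for-bit comparison. */
static uint32_t bits_of(float f) {
  uint32_t u;
  memcpy(&u, &f, sizeof u);
  return u;
}
```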