Open xctan opened 8 months ago
We enable everything by default if we can't detect availability of specific instructions... This is to encourage people to upgrade to at least oldest usable kernel and C library versions... Flipping the default to not enable everything would require the change for all architectures. We still allow disabling every optimisation to make it possible to compile for older kernels.
We are using this workaround based on the fix from highway
diff --git a/arch/riscv/riscv_features.c b/arch/riscv/riscv_features.c
index b066f42..259a63a 100644
--- a/arch/riscv/riscv_features.c
+++ b/arch/riscv/riscv_features.c
@@ -42,4 +42,20 @@ void Z_INTERNAL riscv_check_features(struct riscv_cpu_features *features) {
riscv_check_features_runtime(features);
else
riscv_check_features_compile_time(features);
+ if (features->has_rvv) {
+ size_t e8m1_vec_len;
+ int64_t vtype_reg_val;
+ // Check that a vuint8m1_t vector is at least 16 bytes and that tail
+ // agnostic and mask agnostic mode are supported
+ //
+ __asm__ volatile(
+ "vsetvli %0, zero, e8, m1, ta, ma\n\t"
+ "csrr %1, vtype"
+ : "=r"(e8m1_vec_len), "=r"(vtype_reg_val));
+
+ // The RVV target is supported if the VILL bit of VTYPE (the MSB bit of
+ // VTYPE) is not set and the length of a vuint8m1_t vector is at least 16
+ // bytes
+ features->has_rvv = (vtype_reg_val >= 0 && e8m1_vec_len >= 16);
+ }
}
For 32 bit the vtype_reg_val
needs to be int32_t
, but I was lazy since Alpine only supports 64 bit riscv.
We are using this workaround based on the fix from highway
diff --git a/arch/riscv/riscv_features.c b/arch/riscv/riscv_features.c index b066f42..259a63a 100644 --- a/arch/riscv/riscv_features.c +++ b/arch/riscv/riscv_features.c @@ -42,4 +42,20 @@ void Z_INTERNAL riscv_check_features(struct riscv_cpu_features *features) { riscv_check_features_runtime(features); else riscv_check_features_compile_time(features); + if (features->has_rvv) { + size_t e8m1_vec_len; + int64_t vtype_reg_val; + // Check that a vuint8m1_t vector is at least 16 bytes and that tail + // agnostic and mask agnostic mode are supported + // + __asm__ volatile( + "vsetvli %0, zero, e8, m1, ta, ma\n\t" + "csrr %1, vtype" + : "=r"(e8m1_vec_len), "=r"(vtype_reg_val)); + + // The RVV target is supported if the VILL bit of VTYPE (the MSB bit of + // VTYPE) is not set and the length of a vuint8m1_t vector is at least 16 + // bytes + features->has_rvv = (vtype_reg_val >= 0 && e8m1_vec_len >= 16); + } }
For 32 bit the
vtype_reg_val
needs to beint32_t
, but I was lazy since Alpine only supports 64 bit riscv.
When the type depends on if the target is 32-bit or 64-bit, the variable type should be intptr_t
, which maps on modern compilers to correct type depending on the target.
You should make a pull request with the patch, so it can be reviewed by contributors who use RISC-V.
@alexsifivetw Can you check the patch above, if it can be cleaned up for zlib-ng?
Hi @xctan
If you know that your env does not support RVV, you can try setting WITH_RVV=OFF
.
Please also check the RAEDME.
diff --git a/arch/riscv/riscv_features.c b/arch/riscv/riscv_features.c index b066f42..259a63a 100644 --- a/arch/riscv/riscv_features.c +++ b/arch/riscv/riscv_features.c @@ -42,4 +42,20 @@ void Z_INTERNAL riscv_check_features(struct riscv_cpu_features *features) { riscv_check_features_runtime(features); else riscv_check_features_compile_time(features); + if (features->has_rvv) { + size_t e8m1_vec_len; + int64_t vtype_reg_val; + // Check that a vuint8m1_t vector is at least 16 bytes and that tail + // agnostic and mask agnostic mode are supported + // + __asm__ volatile( + "vsetvli %0, zero, e8, m1, ta, ma\n\t" + "csrr %1, vtype" + : "=r"(e8m1_vec_len), "=r"(vtype_reg_val)); + + // The RVV target is supported if the VILL bit of VTYPE (the MSB bit of + // VTYPE) is not set and the length of a vuint8m1_t vector is at least 16 + // bytes + features->has_rvv = (vtype_reg_val >= 0 && e8m1_vec_len >= 16); + } }
@ncopa This patch looks useful. Could you make a PR with the patch? It's more convenient for discussion and allows RISC-V users to focus on it.
I'm not familiar with Rust and Cargo. It will install a prebuilt version, right? Could you specify the no-RVV version in Cargo? https://github.com/starship/starship/blob/37fba4cfb706533988404255e6226f37cdab13f9/Cargo.toml#L39
I'd rather see this fixed sooner than later, so if @ncopa doesn't want or have time to make a PR, I can try to make it even though I don't have RISC-V hardware to test it on...
It looks like the change in 2.2.2 from #1770, meant to fix RVV auto-detection on systems with older kernels (and thus possibly to fix this issue altogether?), is ineffective on at least some systems, because SIGILL now occurs during detection itself.
In 2.2.1:
Thread 1 "gix" received signal SIGILL, Illegal instruction.
0x0000002aab92ccb0 in adler32_rvv_impl (adler=4194302,
dst=0x2aab92e7860000 <error: Cannot access memory at address 0x2aab92e7860000>,
src=0x3ffffec0e00000 <error: Cannot access memory at address 0x3ffffec0e00000>, len=18014393758908416,
COPY=-1059061760)
at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libz-ng-sys-1.1.16/src/zlib-ng/arch/riscv/adler32_rvv.c:15
15 static inline uint32_t adler32_rvv_impl(uint32_t adler, uint8_t* restrict dst, const uint8_t *src, size_t len, int COPY) {
In 2.2.2:
Thread 1 "gix" received signal SIGILL, Illegal instruction.
riscv_check_features (features=0x3fffff55b8)
at /home/ubuntu/repos/libz-ng-sys/src/zlib-ng/arch/riscv/riscv_features.c:61
61 __asm__ volatile(
Full backtraces and context are available at https://github.com/rust-lang/libz-sys/issues/200#issuecomment-2365480281. The system is a 64-bit RISC-V machine without RVV support, hosted by Scaleway with a "system type" of EM-RV1-C4M16S128-A (with these specifications), and running a 5.10.113-scw1 kernel:
ubuntu@riscv:~$ lsb_release -d
No LSB modules are available.
Description: Ubuntu 24.04.1 LTS
ubuntu@riscv:~$ uname -a
Linux riscv 5.10.113-scw1 #1 SMP PREEMPT Fri Jul 12 15:31:22 UTC 2024 riscv64 riscv64 riscv64 GNU/Linux
ubuntu@riscv:~$ cat /sys/firmware/devicetree/base/cpus/cpu@0/riscv,isa
rv64imafdcsu
My guess is that the continued problem, in this new form, is not specific to one particular chip and also not specific to Scaleway builds of the kernel, but I am not certain. Furthermore, even if I am correct, it may be that this old of a kernel version is uncommonly used elsewhere; I am not sure about that either. I can attempt to emulate a separate RISC-V system without RVV support and produce the problem with a generic kernel, but since that would involve some delays and might not be necessary, I figured I'd report this first. Please let me know if there is further information I should provide.
I encountered this when using this library through the libz-sys Rust crate (itself appearing as a dependency of gitoxide). But I have also verified by running the zlib-ng test suite that a number of this project's own tests fail on the same system due to SIGILL (directly or indirectly).
All the tests that failed in 2.2.1 also failed in 2.2.2, but several more tests failed in 2.2.2. I presume this is because detection sometimes occurs when the results of the detection would not be used. Running the tests on the tip of the develop
branch had the same results as 2.2.2, as expected.
That is to say that, without passing any -DWITH_RVV
option, I always got many test failures. In contrast, when I used -DWITH_RVV=OFF
, I never got any test failures in any version. The procedure used was the one in the readme.
-cmake .
+cmake -DWITH_RVV=OFF .
cmake --build . --config Release
ctest --verbose -C Release
PR https://github.com/zlib-ng/zlib-ng/pull/1585 has added runtime detection of RISC-V vector support for kernels newer than or equal to 6.5, and if the kernel is too old, zlib-ng would use compile time compiler support as fallback. However, this behavior is not safe for older kernels in conjunction with new compilers. In my case, I was using Linux 6.1.61 and GCC 13.2.1 when trying to compile starship 1.18.1 with default features enabled, including zlib-ng. Some git test cases of starship would fail because of the SIGILL signal coming from
adler32_rvv
.I suggest using a more conservative assumption about runtime support of RVV when hwcap is not available. When the kernel is too old, or hwcap is disabled by the sandbox mechanism, RVV support should be disabled.