Open EliahKagan opened 3 months ago
Thanks for the detailed report and for providing enough details to reproduce the issue.
To me it seems that libz-ng might be the place to actually get a solution for the issue if it gets prioritized and is indeed the same.
This crate probably doesn't have the means to affect it, except for changing the version of the libz-ng source code.
Lastly, gitoxide probably has a fix already by using a different zlib implementation, at least on the platforms that are likely to fail, if known in advance.
> To me it seems that libz-ng might be the place to actually get a solution for the issue if it gets prioritized and is indeed the same.
Yes. Assuming this is https://github.com/zlib-ng/zlib-ng/issues/1705, which I believe to be the case, there is a patch that has been reported to work, though as far as I know it has not yet been offered as a PR. This is as detailed in https://github.com/zlib-ng/zlib-ng/issues/1705#issuecomment-2177455106 and preceding comments.
> This crate probably doesn't have the means to affect it, except for changing the version of the libz-ng source code.
When building this crate, is there a way to pass configuration variables to cmake for libz-ng, as if by passing -D ... to cmake? As noted in https://github.com/zlib-ng/zlib-ng/issues/1705#issuecomment-2177453721, -D WITH_RVV=OFF can be passed when cross-compiling zlib-ng for RISC-V to prevent vector instructions requiring RVV support from being emitted, and this should be effective as a workaround for incorrect detection when not cross-compiling.
This would provide a way to make this crate work without either changing the source code of libz-ng or using a different zlib implementation. It would also allow me to confirm to an even greater degree of certainty that this issue really is a straightforward case of https://github.com/zlib-ng/zlib-ng/issues/1705 and nothing more.
If this is not feasible, or if it is feasible but not easy, then should a feature be added here for it, or something? (But maybe this really is quite easy and I am just not aware of it.)
> Lastly, gitoxide probably has a fix already by using a different zlib implementation, at least on the platforms that are likely to fail, if known in advance.
Achieving the effect of -D WITH_RVV=OFF, if there is a way to do it when cmake is run during a build of this crate, should likewise allow gitoxide to work. Furthermore, even once https://github.com/zlib-ng/zlib-ng/issues/1705 is fixed, this would be helpful for actual cross compilation, such as if we make binary releases for RISC-V.
> When building this crate, is there a way to pass configuration variables to cmake for libz-ng, as if by passing -D ... to cmake? As noted in zlib-ng/zlib-ng#1705 (comment), -D WITH_RVV=OFF can be passed when cross-compiling zlib-ng for RISC-V to prevent vector instructions requiring RVV support from being emitted, and this should be effective as a workaround for incorrect detection when not cross-compiling.
A great idea, this is absolutely possible!
> If this is not feasible, or if it is feasible but not easy, then should a feature be added here for it, or something? (But maybe this really is quite easy and I am just not aware of it.)
The build.rs script for zlib-ng should make it possible to detect that case and conditionally pass a flag to the cmake invocation. It should be easy enough if the right test system is available.
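As a rough illustration of the idea, a build script could compute the extra CMake definitions from the target architecture and an opt-out switch. This is only a sketch under assumptions: the function name `zng_cmake_defines` and the `force_rvv_off` parameter are hypothetical and not taken from this crate, and the commented-out usage assumes the build goes through the `cmake` crate's `Config::define`.

```rust
/// Returns the extra `-D` definitions to hand to CMake when building zlib-ng.
///
/// `target_arch` is the architecture being compiled for (in a build script it
/// would typically come from the `CARGO_CFG_TARGET_ARCH` environment variable).
/// `force_rvv_off` is a hypothetical switch; how it is exposed is left open.
fn zng_cmake_defines(
    target_arch: &str,
    force_rvv_off: bool,
) -> Vec<(&'static str, &'static str)> {
    let mut defines = Vec::new();
    // RVV only exists on RISC-V, so the switch does nothing elsewhere.
    if force_rvv_off && target_arch.starts_with("riscv") {
        defines.push(("WITH_RVV", "OFF"));
    }
    defines
}

// Possible use inside build.rs (assumes the `cmake` crate and the
// `src/zlib-ng` source location; both are assumptions of this sketch):
//
//     let arch = std::env::var("CARGO_CFG_TARGET_ARCH").unwrap_or_default();
//     let mut config = cmake::Config::new("src/zlib-ng");
//     for (key, value) in zng_cmake_defines(&arch, true) {
//         config.define(key, value);
//     }
```

Keeping the decision in a small pure function like this means it can be checked on any machine, without needing the RISC-V test system to exercise the logic itself.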
> Achieving the effect of -D WITH_RVV=OFF, if there is a way to do it when cmake is run during a build of this crate, should likewise allow gitoxide to work. Furthermore, even once zlib-ng/zlib-ng#1705 is fixed, this would be helpful for actual cross compilation, such as if we make binary releases for RISC-V.
Now I have hopes that this can be fixed here, and if you say it will remain useful even if zlib-ng ships their fix, that's even better.
Thanks again for all your great work!
> The build.rs script for zlib-ng should allow to detect that case and conditionally pass a flag to the cmake invocation. It should be easy enough if there is the right test-system available.
When compiling binaries to be distributed or otherwise to be run on another system, one may want to insist that RVV instructions be emitted, or insist that they not be emitted. Especially since an upstream fix should eventually take care of the auto-detection case, I think it would be best for a change here to allow it to be manually overridden.
This would not necessarily preclude implementing auto-detection here as well. But the appraisal of "easy enough" might be an underestimate, considering that the difficulty and subtleties involved in checking this are the cause of the upstream bug. (Detection is attempted, but the result is not always correct.) Furthermore, manually overriding it is really the capability that would remain useful here even after the upstream bug is fixed.
Since arbitrary logic, including logic specific to building for particular architectures, could go in build.rs or in the modules in zng that it uses, the manual override could be by an environment variable or other external mechanism. But I wonder if doing it by feature would still be better.
Is it acceptable if I add rvv-off and rvv-on features, or something like that?
(Auto-detection done here could then be by an rvv-auto feature, if later added. This would remain distinct from when no rvv-* feature is enabled, which would use the upstream detection.)
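To make the three cases concrete, here is a hedged sketch of how a build script might resolve them. Cargo exposes enabled features to build scripts as `CARGO_FEATURE_<NAME>` environment variables and the target architecture as `CARGO_CFG_TARGET_ARCH`; the `Rvv` enum, the `rvv_mode` function, and the choice to let `rvv-off` win when both features are set are assumptions made for illustration, not something settled in this discussion.

```rust
/// How the build should treat RVV support in zlib-ng.
#[derive(Debug, PartialEq)]
enum Rvv {
    /// Pass nothing and let zlib-ng's own detection decide.
    Upstream,
    /// Ask for RVV code not to be emitted (e.g. `-D WITH_RVV=OFF`).
    Off,
    /// Ask for RVV code to be emitted (the exact CMake setting is left open).
    On,
}

/// Resolves the mode from build-script environment variables.
/// `lookup` stands in for `|k| std::env::var(k).ok()` so the logic is testable.
fn rvv_mode(lookup: impl Fn(&str) -> Option<String>) -> Rvv {
    let is_riscv = lookup("CARGO_CFG_TARGET_ARCH")
        .map_or(false, |arch| arch.starts_with("riscv"));
    if !is_riscv {
        // The features are no-ops when not targeting RISC-V.
        return Rvv::Upstream;
    }
    // Arbitrary choice for this sketch: `rvv-off` takes precedence.
    if lookup("CARGO_FEATURE_RVV_OFF").is_some() {
        Rvv::Off
    } else if lookup("CARGO_FEATURE_RVV_ON").is_some() {
        Rvv::On
    } else {
        Rvv::Upstream
    }
}
```

In a real build script the call would be `rvv_mode(|key| std::env::var(key).ok())`.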
Thanks for the update.
> Is it acceptable if I add rvv-off and rvv-on features, or something like that?
Yes, I'd also do it with a cargo feature, which would also have to be additive. Of course it's possible to add logic to allow --all-features builds that don't clash, but ideally this could be so minimal that it fixes the most anticipated use case.
I've noticed that this project introduces the nonstandard configuration name zng, for which warnings are sometimes issued. (To be clear, that doesn't show an effective test run, it's just an example of the warnings. The warnings can also be seen on CI.) That is not specific to RISC-V.
Why was this done for that, rather than using a feature? Does it mean I should consider introducing other such nonstandard configuration names for overriding RVV detection, rather than using features? That seems like a wrong thing to do, but since I am not clear on why it was done for zng, I am not certain. Or should it not have been done for zng either, and should that be changed?
> Of course it's possible to add logic to allow --all-features builds that don't clash, but ideally this could be so minimal that it fixes the most anticipated usecase.
Do you mean that the added features should be minimal and thus not try to support --all-features on RISC-V, or that they should try to support --all-features even on RISC-V where it would feel contradictory if the features are rvv-off and rvv-on but that this should be done in a minimal way?
I think that, outside of architectures where it makes a difference, the new features could be no-ops and supplying them together could be permitted, while prohibiting it on RISC-V. Whether that's the best way to do it, I am not sure.
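One way to picture that policy is a small check that rejects the contradictory combination only when targeting RISC-V. This is a sketch of the option being floated here, not a decision: the function name `check_rvv_features` and the error message are hypothetical.

```rust
/// Accepts any combination of the two features when not targeting RISC-V,
/// but rejects enabling both `rvv-off` and `rvv-on` when targeting RISC-V.
fn check_rvv_features(target_arch: &str, rvv_off: bool, rvv_on: bool) -> Result<(), String> {
    let is_riscv = target_arch.starts_with("riscv");
    if is_riscv && rvv_off && rvv_on {
        return Err(
            "features `rvv-off` and `rvv-on` cannot both be enabled when targeting RISC-V"
                .to_string(),
        );
    }
    Ok(())
}
```

The cost of this approach is that --all-features builds would fail on RISC-V, which is the tension with additive features raised in this thread.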
> Why was this done for that, rather than using a feature? Does it mean I should consider introducing other such nonstandard configuration names for overriding RVV detection, rather than using features? That seems like a wrong thing to do, but since I am not clear on why it was done for zng, I am not certain. Or should it not have been done for zng either, and should that be changed?
I'd hope Git has information on this, as I myself joined late enough to not know anything on how things came to be, unfortunately.
> Do you mean that the added features should be minimal and thus not try to support --all-features on RISC-V, or that they should try to support --all-features even on RISC-V where it would feel contradictory if the features are rvv-off and rvv-on but that this should be done in a minimal way?
All Cargo features should be additive so that --all-features will work and have predictable results. In practice, that's not always possible, but by using just a single RVV-related flag it should naturally be additive.
I also think it would do nothing outside of its applicable platform.
Also, I'd approach this as a band-aid to fix one specific problem, and not try to exhaustively solve every conceivable use-case, to keep it simple.
> I'd hope Git has information on this, as I myself joined late enough to not know anything on how things came to be, unfortunately.
I'll look into it and let you know what I find.
> [...] Also, I'd approach this as a band-aid to fix one specific problem, and not try to exhaustively solve every conceivable use-case, to keep it simple.
I'll make just a feature to force RVV to be turned off, since that's the specific problem right now.
I've opened #218 to add the rvv-off feature discussed above, though further changes may be needed, for the reasons presented there.
zlib-ng 2.2.2, which includes the changes from https://github.com/zlib-ng/zlib-ng/pull/1770 that attempt to fix or mitigate https://github.com/zlib-ng/zlib-ng/issues/1705, has been released.
2.2.2 also carries other improvements, including https://github.com/zlib-ng/zlib-ng/pull/1773 (for https://github.com/zlib-ng/zlib-ng/issues/1772). So I'll open a PR to bump the submodule version shortly. [Edit: I have opened #219 for this.] Because the zlib-ng submodule of this project is currently at 2.2.1 (since #211), and going from 2.2.1 to 2.2.2 is a patch change, I expect upgrading to be straightforward, and building locally suggests that no further changes may be needed. To be clear, I am not sure whether new versions of this project's crates should be released at this time, which I think could be decided separately. It seems to me that bumping the submodule to the 2.2.2 tag may be useful either way.
But it doesn't look like that fully fixes https://github.com/zlib-ng/zlib-ng/issues/1705, and in particular it does not appear to fix this issue on the RISC-V test machine I am using. It does change the problem though. With zlib-ng 2.2.1, the crash occurs in adler32_rvv_impl, in arch/riscv/adler32_rvv.c. With zlib-ng 2.2.2, the crash instead occurs in riscv_check_features, in arch/riscv/riscv_features.c. The change to how the check is performed, which I presume may improve it on many or most systems, actually causes the check itself to fail with SIGILL on at least some systems, including the system I am testing on.
With zlib-ng 2.2.1:
Thread 1 "gix" received signal SIGILL, Illegal instruction.
0x0000002aab92ccb0 in adler32_rvv_impl (adler=4194302,
dst=0x2aab92e7860000 <error: Cannot access memory at address 0x2aab92e7860000>,
src=0x3ffffec0e00000 <error: Cannot access memory at address 0x3ffffec0e00000>, len=18014393758908416,
COPY=-1059061760)
at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libz-ng-sys-1.1.16/src/zlib-ng/arch/riscv/adler32_rvv.c:15
15 static inline uint32_t adler32_rvv_impl(uint32_t adler, uint8_t* restrict dst, const uint8_t *src, size_t len, int COPY) {
With zlib-ng 2.2.2:
Thread 1 "gix" received signal SIGILL, Illegal instruction.
riscv_check_features (features=0x3fffff55b8)
at /home/ubuntu/repos/libz-ng-sys/src/zlib-ng/arch/riscv/riscv_features.c:61
61 __asm__ volatile(
When I wrote this issue description, I showed a gdb backtrace for gix --trace clone git@github.com:EliahKagan/gitoxide.git. But a simpler way to produce the problem is with gix status. The SIGILL crashes and associated backtraces in gdb are shown below with that command, both for 2.2.1 (to allow comparison and verify the similarity to the more complicated command) and 2.2.2.
To test 2.2.2, I used the same procedure as in #218 (in particular, I used the modified cargo-zng script as described there), but with the submodule bumped to the commit tagged 2.2.2 (and otherwise the same code as on the main branch), and without setting RVV_OFF or any other environment variables (which would not be recognized on this branch anyway, but could be confusing).
Full backtrace with zlib-ng 2.2.1:
(gdb) run status
Starting program: /home/ubuntu/repos/gitoxide/target/debug/gix status
This GDB supports auto-downloading debuginfo from the following URLs:
<https://debuginfod.ubuntu.com>
Enable debuginfod for this session? (y or [n]) y
Debuginfod has been enabled.
To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
Downloading separate debug info for system-supplied DSO at 0x3ff7fd8000
Downloading separate debug info for /lib/riscv64-linux-gnu/libssl.so.3
Downloading separate debug info for /lib/riscv64-linux-gnu/libcrypto.so.3
Downloading separate debug info for /lib/riscv64-linux-gnu/libz.so.1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/riscv64-linux-gnu/libthread_db.so.1".
[New Thread 0x3ff793df00 (LWP 31410)]
[New Thread 0x3ff7691f00 (LWP 31411)]
[New Thread 0x3ff7490f00 (LWP 31412)]
Thread 1 "gix" received signal SIGILL, Illegal instruction.
0x0000002aab92ccb0 in adler32_rvv_impl (adler=4194302,
dst=0x2aab92e7860000 <error: Cannot access memory at address 0x2aab92e7860000>,
src=0x3ffffec0e00000 <error: Cannot access memory at address 0x3ffffec0e00000>, len=18014393758908416,
COPY=-1059061760)
at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libz-ng-sys-1.1.16/src/zlib-ng/arch/riscv/adler32_rvv.c:15
15 static inline uint32_t adler32_rvv_impl(uint32_t adler, uint8_t* restrict dst, const uint8_t *src, size_t len, int COPY) {
(gdb) bt
#0 0x0000002aab92ccb0 in adler32_rvv_impl (adler=4194302,
dst=0x2aab92e7860000 <error: Cannot access memory at address 0x2aab92e7860000>,
src=0x3ffffec0e00000 <error: Cannot access memory at address 0x3ffffec0e00000>, len=18014393758908416,
COPY=-1059061760)
at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libz-ng-sys-1.1.16/src/zlib-ng/arch/riscv/adler32_rvv.c:15
#1 0x0000002aab92dfd0 in adler32_rvv (adler=1,
buf=0x2aac471680 "tree dd8e4f518faadfc897932819f30f3b8f7c7f2aab\nparent 5ef4d5de3733648f5376a6f53fad378847eead53\nparent 67536a08e50b5a37625e5c5521a8cae7ccbd0d81\nauthor Sebastian Thiel <sebastian.thiel@icloud.com> 172690"...,
len=1210)
at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libz-ng-sys-1.1.16/src/zlib-ng/arch/riscv/adler32_rvv.c:129
#2 0x0000002aab924640 in inf_chksum (strm=0x2aac4974b0,
src=0x2aac471680 "tree dd8e4f518faadfc897932819f30f3b8f7c7f2aab\nparent 5ef4d5de3733648f5376a6f53fad378847eead53\nparent 67536a08e50b5a37625e5c5521a8cae7ccbd0d81\nauthor Sebastian Thiel <sebastian.thiel@icloud.com> 172690"...,
len=1210)
at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libz-ng-sys-1.1.16/src/zlib-ng/inflate.c:47
#3 0x0000002aab927592 in zng_inflate (strm=0x2aac4974b0, flush=0)
at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libz-ng-sys-1.1.16/src/zlib-ng/inflate.c:1055
#4 0x0000002aab91b0c6 in flate2::ffi::c::{impl#10}::decompress (self=0x3fffff1020, input=..., output=...,
flush=flate2::mem::FlushDecompress::None) at src/ffi/c.rs:252
#5 0x0000002aab91c0d8 in flate2::mem::Decompress::decompress (self=0x3fffff1020, input=..., output=...,
flush=flate2::mem::FlushDecompress::None) at src/mem.rs:452
#6 0x0000002aab910c54 in gix_features::zlib::Inflate::once (self=0x3fffff1020, input=..., out=...)
at gix-features/src/zlib/mod.rs:37
#7 0x0000002aab657f7e in gix_pack::data::File::decompress_entry_from_data_offset (self=0x2aac498b80, data_offset=14,
inflate=0x3fffff1020, out=...) at gix-pack/src/data/file/decode/entry.rs:129
#8 0x0000002aab657dd6 in gix_pack::data::File::decompress_entry (self=0x2aac498b80, entry=0x3ffffecdf0,
inflate=0x3fffff1020, out=...) at gix-pack/src/data/file/decode/entry.rs:99
#9 0x0000002aab6581a0 in gix_pack::data::File::decode_entry (self=0x2aac498b80, entry=..., out=0x3ffffedeb0,
inflate=0x3fffff1020, resolve=..., delta_cache=...) at gix-pack/src/data/file/decode/entry.rs:177
#10 0x0000002aab2b23d8 in gix_odb::store_impls::dynamic::Handle<alloc::sync::Arc<gix_odb::Store, alloc::alloc::Global>>::try_find_cached_inner<alloc::sync::Arc<gix_odb::Store, alloc::alloc::Global>> (self=0x3fffff1018, id=...,
buffer=0x3ffffedeb0, inflate=0x3fffff1020, pack_cache=..., snapshot=0x3fffff1040, recursion=...)
at gix-odb/src/store_impls/dynamic/find.rs:151
#11 0x0000002aab2b166e in gix_odb::store_impls::dynamic::find::{impl#1}::try_find_cached<alloc::sync::Arc<gix_odb::Store, alloc::alloc::Global>> (self=0x3fffff1018, id=..., buffer=0x3ffffedeb0, pack_cache=...)
at gix-odb/src/store_impls/dynamic/find.rs:356
#12 0x0000002aab28a2b2 in gix_odb::cache::impls::{impl#5}::try_find_cached<gix_odb::store_impls::dynamic::Handle<alloc::sync::Arc<gix_odb::Store, alloc::alloc::Global>>> (self=0x3fffff0fd8, id=..., buffer=0x3ffffedeb0, pack_cache=...)
at gix-odb/src/cache.rs:219
#13 0x0000002aab28a554 in gix_odb::cache::impls::{impl#5}::try_find<gix_odb::store_impls::dynamic::Handle<alloc::sync::Arc<gix_odb::Store, alloc::alloc::Global>>> (self=0x3fffff0fd8, id=..., buffer=0x3ffffedeb0) at gix-odb/src/cache.rs:203
#14 0x0000002aab28a58e in gix_odb::cache::impls::{impl#1}::try_find<gix_odb::store_impls::dynamic::Handle<alloc::sync::Arc<gix_odb::Store, alloc::alloc::Global>>> (self=0x3fffff0fd8, id=..., buffer=0x3ffffedeb0) at gix-odb/src/cache.rs:158
#15 0x0000002aab350e98 in gix_odb::memory::{impl#6}::try_find<gix_odb::Cache<gix_odb::store_impls::dynamic::Handle<alloc::sync::Arc<gix_odb::Store, alloc::alloc::Global>>>> (self=0x3fffff0fd8, id=..., buffer=0x3ffffedeb0)
at gix-odb/src/memory.rs:156
#16 0x0000002aab82c83e in gix_ref::store_impl::file::raw_ext::{impl#1}::peel_to_id_in_place_packed (self=0x3ffffee4d8,
store=<optimized out>, objects=..., packed=...) at gix-ref/src/store/file/raw_ext.rs:111
#17 0x0000002aab82c5d0 in gix_ref::store_impl::file::raw_ext::{impl#1}::peel_to_id_in_place (self=0x3ffffee4d8,
store=0x3fffff12f0, objects=...) at gix-ref/src/store/file/raw_ext.rs:92
#18 0x0000002aab372258 in gix::types::Reference::peel_to_id_in_place (self=0x3ffffee4d8) at gix/src/reference/mod.rs:73
#19 0x0000002aab36f78a in gix::types::Head::try_peel_to_id_in_place (self=0x3ffffeed28) at gix/src/head/peel.rs:131
#20 0x0000002aab36fe30 in gix::types::Head::peel_to_object_in_place (self=0x3ffffeed28) at gix/src/head/peel.rs:143
#21 0x0000002aab36ff56 in gix::types::Head::peel_to_commit_in_place (self=0x3ffffeed28) at gix/src/head/peel.rs:157
#22 0x0000002aab377cd4 in gix::types::Repository::head_commit (self=0x3fffff0fd8)
at gix/src/repository/reference.rs:205
#23 0x0000002aab378998 in gix::types::Repository::modules (self=0x3fffff0fd8) at gix/src/repository/submodule.rs:56
#24 0x0000002aab378f80 in gix::types::Repository::submodules (self=0x3fffff0fd8) at gix/src/repository/submodule.rs:78
#25 0x0000002aab303140 in gix::status::index_worktree::BuiltinSubmoduleStatus::new (repo=..., mode=...)
at gix/src/status/index_worktree.rs:199
#26 0x0000002aaacdc826 in gix::status::Platform<prodash::progress::utils::DoOrDiscard<prodash::tree::Item>>::into_index_worktree_iter<prodash::progress::utils::DoOrDiscard<prodash::tree::Item>, alloc::vec::Vec<bstr::bstring::BString, alloc::alloc::Global>> (self=..., patterns=...) at gix/src/status/index_worktree.rs:641
#27 0x0000002aaacd063e in gitoxide_core::repository::status::show<&mut dyn std::io::Write, &mut dyn std::io::Write, prodash::progress::utils::DoOrDiscard<prodash::tree::Item>> (repo=..., pathspecs=..., out=..., err=..., progress=...)
at gitoxide-core/src/repository/status.rs:70
#28 0x0000002aaac0fe64 in gitoxide::plumbing::main::main::{closure#6} (progress=..., out=..., err=...)
at src/plumbing/main.rs:242
#29 0x0000002aaad6581a in gitoxide::shared::pretty::prepare_and_run::{closure#0}<(), core::option::Option<core::ops::range::RangeInclusive<u8>>, gitoxide::plumbing::main::main::{closure_env#6}> () at src/shared.rs:170
#30 0x0000002aaace889a in gix_trace::enabled::Span::into_scope<core::result::Result<(), anyhow::Error>, gitoxide::shared::pretty::prepare_and_run::{closure_env#0}<(), core::option::Option<core::ops::range::RangeInclusive<u8>>, gitoxide::plumbing::main::main::{closure_env#6}>> (self=<error reading variable: Cannot access memory at address 0x0>,
f=<error reading variable: Cannot access memory at address 0x6c>) at gix-trace/src/lib.rs:43
#31 0x0000002aaad5240e in gitoxide::shared::pretty::prepare_and_run<(), core::option::Option<core::ops::range::RangeInclusive<u8>>, gitoxide::plumbing::main::main::{closure_env#6}> (name=..., trace=false, verbose=true, progress=false,
progress_keep_open=false, range=..., run=...) at src/shared.rs:169
#32 0x0000002aaac0a6f8 in gitoxide::plumbing::main::main () at src/plumbing/main.rs:233
#33 0x0000002aaac090a4 in gix::main () at src/gix.rs:5
Full backtrace with zlib-ng 2.2.2:
(gdb) run status
Starting program: /home/ubuntu/repos/gitoxide/target/debug/gix status
This GDB supports auto-downloading debuginfo from the following URLs:
<https://debuginfod.ubuntu.com>
Enable debuginfod for this session? (y or [n]) y
Debuginfod has been enabled.
To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
Downloading separate debug info for system-supplied DSO at 0x3ff7fd8000
Downloading separate debug info for /lib/riscv64-linux-gnu/libssl.so.3
Downloading separate debug info for /lib/riscv64-linux-gnu/libcrypto.so.3
Downloading separate debug info for /lib/riscv64-linux-gnu/libz.so.1
Downloading separate debug info for /lib/riscv64-linux-gnu/libgcc_s.so.1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/riscv64-linux-gnu/libthread_db.so.1".
[New Thread 0x3ff793df00 (LWP 15341)]
Thread 1 "gix" received signal SIGILL, Illegal instruction.
riscv_check_features (features=0x3fffff55b8)
at /home/ubuntu/repos/libz-ng-sys/src/zlib-ng/arch/riscv/riscv_features.c:61
61 __asm__ volatile(
(gdb) bt
#0 riscv_check_features (features=0x3fffff55b8)
at /home/ubuntu/repos/libz-ng-sys/src/zlib-ng/arch/riscv/riscv_features.c:61
#1 0x0000002aab92dd44 in cpu_check_features (features=0x3fffff55b8)
at /home/ubuntu/repos/libz-ng-sys/src/zlib-ng/cpu_features.c:21
#2 0x0000002aab924e16 in init_functable () at /home/ubuntu/repos/libz-ng-sys/src/zlib-ng/functable.c:49
#3 0x0000002aab9250da in force_init_stub () at /home/ubuntu/repos/libz-ng-sys/src/zlib-ng/functable.c:263
#4 0x0000002aab925ccc in zng_inflateInit2 (strm=0x2aac49d4b0, windowBits=15)
at /home/ubuntu/repos/libz-ng-sys/src/zlib-ng/inflate.c:220
#5 0x0000002aab91b2b0 in libz_ng_sys::inflateInit2_ (strm=0x2aac49d4b0, windowBits=15, _version=0x2aabd8ae34,
_stream_size=104) at /home/ubuntu/repos/libz-ng-sys/src/lib.rs:280
#6 flate2::ffi::c::c_backend::mz_inflateInit2 (stream=0x2aac49d4b0, window_bits=15) at src/ffi/c.rs:474
#7 0x0000002aab91bfb0 in flate2::ffi::c::{impl#10}::make (zlib_header=true, window_bits=15) at src/ffi/c.rs:215
#8 0x0000002aab91d0f6 in flate2::mem::Decompress::new (zlib_header=true) at src/mem.rs:367
#9 0x0000002aab9103a2 in gix_features::zlib::{impl#0}::default () at gix-features/src/zlib/mod.rs:27
#10 0x0000002aab61591e in gix_odb::Store::to_handle (self=0x3fffff6370)
at gix-odb/src/store_impls/dynamic/handle.rs:259
#11 0x0000002aab34f5ac in gix::repository::impls::{impl#4}::from (repo=...) at gix/src/repository/impls.rs:59
#12 0x0000002aaace944c in core::ops::function::FnOnce::call_once<fn(gix::types::ThreadSafeRepository) -> gix::types::Repository, (gix::types::ThreadSafeRepository)> ()
at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/ops/function.rs:250
#13 0x0000002aaac5c2fe in core::result::Result<gix::types::ThreadSafeRepository, gix::discover::Error>::map<gix::types::ThreadSafeRepository, gix::discover::Error, gix::types::Repository, fn(gix::types::ThreadSafeRepository) -> gix::types::Repository> (self=..., op=0x2aaad1890001)
at /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/result.rs:771
#14 0x0000002aaad18900 in gitoxide::plumbing::main::main::{closure#0} (
mode=gitoxide::plumbing::main::main::Mode::Lenient) at src/plumbing/main.rs:108
#15 0x0000002aaad19532 in gitoxide::plumbing::main::main::{closure#6} (progress=..., out=..., err=...)
at src/plumbing/main.rs:243
#16 0x0000002aaac41cc2 in gitoxide::shared::pretty::prepare_and_run::{closure#0}<(), core::option::Option<core::ops::range::RangeInclusive<u8>>, gitoxide::plumbing::main::main::{closure_env#6}> () at src/shared.rs:170
#17 0x0000002aaacb7ac8 in gix_trace::enabled::Span::into_scope<core::result::Result<(), anyhow::Error>, gitoxide::shared::pretty::prepare_and_run::{closure_env#0}<(), core::option::Option<core::ops::range::RangeInclusive<u8>>, gitoxide::plumbing::main::main::{closure_env#6}>> (self=..., f=...) at gix-trace/src/lib.rs:43
#18 0x0000002aaac0f742 in gitoxide::shared::pretty::prepare_and_run<(), core::option::Option<core::ops::range::RangeInclusive<u8>>, gitoxide::plumbing::main::main::{closure_env#6}> (name=..., trace=false, verbose=true, progress=false,
progress_keep_open=false, range=..., run=...) at src/shared.rs:169
#19 0x0000002aaac0a898 in gitoxide::plumbing::main::main () at src/plumbing/main.rs:233
#20 0x0000002aaac09262 in gix::main () at src/gix.rs:5
I've posted https://github.com/zlib-ng/zlib-ng/issues/1705#issuecomment-2366412936 to inform the upstream project of the SIGILL during auto-detection (which can also be observed when running its test suite on the test machine I have been using).
Note that, while I wrote the description in #218 in such a way as to cause this to be closed as completed when it was merged, that is really a workaround. The problem described here is really the upstream bug https://github.com/zlib-ng/zlib-ng/issues/1705, which remains open.
This strongly resembles #148 (https://github.com/Byron/gitoxide/issues/955) but affects riscv64 rather than x86_64, seems to happen every time rather than sporadically, and may be a case of a known zlib-ng bug, https://github.com/zlib-ng/zlib-ng/issues/1705.
What happens
Both dev and release builds are affected. On a 64-bit RISC-V machine running Ubuntu 24.04 LTS, running gix clone begins the download but is always terminated with SIGILL (as indicated by the message text and as can be inferred from the exit status):
Passing --trace does not help, since it doesn't get far enough to print the traced output, and since the program is immediately being terminated by the system, rather than hitting a panic and being able to unwind. The prompt printed afterwards is interleaved with the displayed progress due to the sudden way the program was terminated. (The displayed exit status, here 132, is part of the prompt.)
Getting a backtrace using gdb
Running it in a debugger and printing a backtrace after the crash provides far more information:
This shows that the problem happens in zlib-ng and more specifically in libz-ng-sys-1.1.15/src/zlib-ng/arch/riscv/adler32_rvv.c, which through the zlib-ng repository resolves through the git submodule to this file.
Likely explanation
This looks a lot like a case of https://github.com/zlib-ng/zlib-ng/issues/1705. My kernel is old enough to trigger that bug:
Furthermore, that issue mentions the problem happening in starship, and I wonder if starship might even be triggering the crash in the same way through one of its gix-* dependencies.
Perhaps this issue is a victim of its own success and should be closed, to become (and be referenced by) a new comment on https://github.com/zlib-ng/zlib-ng/issues/1705. But I figured I'd start by opening this, in case more is known or can be discerned here, in case it might somehow help with #148, and in case there's anything to be done about it anywhere else (such as gitoxide).
Simplified reproduction
To check if the problem is occurring, and to check that it happens in ein as well as gix, the gix status and ein t h commands can be used: