rust-lang / libz-sys

Rust crate package to link to a system libz (zlib)
Apache License 2.0

zlib-ng: Illegal instruction on riscv64gc-unknown-linux-gnu #200

Open EliahKagan opened 1 month ago

EliahKagan commented 1 month ago

This strongly resembles #148 (https://github.com/Byron/gitoxide/issues/955) but affects riscv64 rather than x86_64, seems to happen every time rather than sporadically, and may be a case of a known zlib-ng bug, https://github.com/zlib-ng/zlib-ng/issues/1705.

What happens

Both dev and release builds are affected. On a 64-bit RISC-V machine running Ubuntu 24.04 LTS, running gix clone begins the download but is always terminated with SIGILL (as indicated by the message text and as can be inferred from the exit status):

ubuntu@riscv:~/src$ gix clone git@github.com:Byron/gitoxide.git
Illegal instruction (core dumped)              1 steps [===   ===   ===   ===   ===   ===   ===   ===   ===   ===   ===]
ubuntu@riscv:~/src[132]$ ects 39.1K/48.3K objects [81%] [=================================================>------------]
ubuntu@riscv:~/src[132]$                    0B  |0B/s| [===   ===   ===   ===   ===   ===   ===   ===   ===   ===   ===]

Passing --trace does not help, since it doesn't get far enough to print the traced output, and since the program is immediately being terminated by the system, rather than hitting a panic and being able to unwind. The prompt printed afterwards is interleaved with the displayed progress due to the sudden way the program was terminated. (The displayed exit status, here 132, is part of the prompt.)
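
The exit status of 132 fits the common shell convention of reporting 128 plus the signal number when a process is killed by a signal, and SIGILL is signal 4 on Linux. A minimal sketch of that arithmetic, where both function names are hypothetical and only a few signal numbers are listed (a status above 128 can also be an ordinary exit code, so this is a heuristic):

```rust
/// Interpret a shell-reported exit status as "killed by signal N".
/// Shells such as bash report `128 + N` in that case.
fn signal_from_shell_status(status: i32) -> Option<i32> {
    if status > 128 && status < 256 {
        Some(status - 128)
    } else {
        None
    }
}

/// Name a few common signals by their Linux numbers.
fn signal_name(signal: i32) -> &'static str {
    match signal {
        4 => "SIGILL",
        6 => "SIGABRT",
        11 => "SIGSEGV",
        _ => "unknown",
    }
}
```

For the status in the prompt above, `signal_from_shell_status(132)` gives `Some(4)`, and `signal_name(4)` gives `"SIGILL"`.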

Getting a backtrace using gdb

Running it in a debugger and printing a backtrace after the crash provides far more information:

ubuntu@riscv:~/repos/gitoxide (main =)[1]$ file target/debug/gix
target/debug/gix: ELF 64-bit LSB pie executable, UCB RISC-V, RVC, double-float ABI, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-riscv64-lp64d.so.1, BuildID[sha1]=d4b06d7d749d472b1a023c6526c5102b1c885ce8, for GNU/Linux 4.15.0, with debug_info, not stripped
ubuntu@riscv:~/repos/gitoxide (main =)$ gdb target/debug/gix
GNU gdb (Ubuntu 15.0.50.20240403-0ubuntu1) 15.0.50.20240403-git
Copyright (C) 2024 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "riscv64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from target/debug/gix...
warning: Missing auto-load script at offset 0 in section .debug_gdb_scripts
of file /home/ubuntu/repos/gitoxide/target/debug/gix.
Use `info auto-load python-scripts [REGEXP]' to list them.
(gdb) run --trace clone git@github.com:EliahKagan/gitoxide.git
Starting program: /home/ubuntu/repos/gitoxide/target/debug/gix --trace clone git@github.com:EliahKagan/gitoxide.git
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/riscv64-linux-gnu/libthread_db.so.1".
[New Thread 0x3ff78a9f20 (LWP 32276)]
[Detaching after vfork from child process 32278]
[New Thread 0x3ff76a5f20 (LWP 32277)]
[New Thread 0x3ff749bf20 (LWP 32279)]
[Detaching after vfork from child process 32280]
[New Thread 0x3ff7294f20 (LWP 32281)]
[Thread 0x3ff76a5f20 (LWP 32277) exited]
 receiving pack                               1 steps [ ===   ===   ===   ===   ===   ===   ===   ===   ===   ===   ===]
Thread 1 "gix" received signal SIGILL, Illegal instruction.=======================>------------------------------------]
0x0000002aab8abd18 in adler32_rvv_impl (adler=42, /s| [ ===   ===   ===   ===   ===   ===   ===   ===   ===   ===   ===]
    dst=0x500000008 <error: Cannot access memory at address 0x500000008>,
    src=0x2aab8ade4e <load_64_bits+54> "\2037\004\376\003'D\375\263\227", <incomplete sequence \347>,
    len=274877754864, COPY=-1405147728)
    at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libz-ng-sys-1.1.15/src/zlib-ng/arch/riscv/adler32_rvv.c:15
15      static inline uint32_t adler32_rvv_impl(uint32_t adler, uint8_t* restrict dst, const uint8_t *src, size_t len, int COPY) {
(gdb)
(gdb)
(gdb) bt
#0  0x0000002aab8abd18 in adler32_rvv_impl (adler=42,
    dst=0x500000008 <error: Cannot access memory at address 0x500000008>,
    src=0x2aab8ade4e <load_64_bits+54> "\2037\004\376\003'D\375\263\227", <incomplete sequence \347>,
    len=274877754864, COPY=-1405147728)
    at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libz-ng-sys-1.1.15/src/zlib-ng/arch/riscv/adler32_rvv.c:15
#1  0x0000002aab8ad038 in adler32_rvv (adler=1, buf=0x3ffffdb4e7 "\n", len=1)
    at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libz-ng-sys-1.1.15/src/zlib-ng/arch/riscv/adler32_rvv.c:129
#2  0x0000002aab8a36a8 in inf_chksum (strm=0x2aac41b000, src=0x3ffffdb4e7 "\n", len=1)
    at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libz-ng-sys-1.1.15/src/zlib-ng/inflate.c:47
#3  0x0000002aab8a65fa in zng_inflate (strm=0x2aac41b000, flush=0)
    at /home/ubuntu/.cargo/registry/src/index.crates.io-6f17d22bba15001f/libz-ng-sys-1.1.15/src/zlib-ng/inflate.c:1055
#4  0x0000002aab899052 in flate2::ffi::c::{impl#10}::decompress (self=0x2aac3f4b18, input=..., output=...,
    flush=flate2::mem::FlushDecompress::None) at src/ffi/c.rs:252
#5  0x0000002aab89b124 in flate2::mem::Decompress::decompress (self=0x2aac3f4b18, input=..., output=...,
    flush=flate2::mem::FlushDecompress::None) at src/mem.rs:452
#6  0x0000002aab1f7120 in gix_features::zlib::stream::inflate::read<gix_pack::data::input::bytes_to_entries::PassThrough<&mut std::io::buffered::bufreader::BufReader<gix_features::interrupt::Read<gix_features::progress::Read<&mut dyn std::io::BufRead, prodash::progress::utils::ThroughputOnDrop<prodash::traits::BoxedDynNestedProgress>>>>, alloc::vec::Vec<u8, alloc::alloc::Global>>> (rd=0x3ffffdd9b0, state=0x2aac3f4b18, dst=...) at gix-features/src/zlib/stream/inflate.rs:20
#7  0x0000002aab2ca7e0 in gix_pack::data::input::bytes_to_entries::{impl#7}::read<gix_pack::data::input::bytes_to_entries::PassThrough<&mut std::io::buffered::bufreader::BufReader<gix_features::interrupt::Read<gix_features::progress::Read<&mut dyn std::io::BufRead, prodash::progress::utils::ThroughputOnDrop<prodash::traits::BoxedDynNestedProgress>>>>, alloc::vec::Vec<u8, alloc::alloc::Global>>> (self=0x3ffffdd9b0, into=...) at gix-pack/src/data/input/bytes_to_entries.rs:299
#8  0x0000002aab216c2a in std::io::Read::read_buf::{closure#0}<gix_pack::data::input::bytes_to_entries::DecompressRead<gix_pack::data::input::bytes_to_entries::PassThrough<&mut std::io::buffered::bufreader::BufReader<gix_features::interrupt::Read<gix_features::progress::Read<&mut dyn std::io::BufRead, prodash::progress::utils::ThroughputOnDrop<prodash::traits::BoxedDynNestedProgress>>>>, alloc::vec::Vec<u8, alloc::alloc::Global>>>> (b=...)
--Type <RET> for more, q to quit, c to continue without paging--c
    at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/io/mod.rs:973
#9  0x0000002aab210806 in std::io::default_read_buf<std::io::Read::read_buf::{closure_env#0}<gix_pack::data::input::bytes_to_entries::DecompressRead<gix_pack::data::input::bytes_to_entries::PassThrough<&mut std::io::buffered::bufreader::BufReader<gix_features::interrupt::Read<gix_features::progress::Read<&mut dyn std::io::BufRead, prodash::progress::utils::ThroughputOnDrop<prodash::traits::BoxedDynNestedProgress>>>>, alloc::vec::Vec<u8, alloc::alloc::Global>>>>> (read=...,
    cursor=...) at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/io/mod.rs:574
#10 0x0000002aab2c833e in std::io::Read::read_buf<gix_pack::data::input::bytes_to_entries::DecompressRead<gix_pack::data::input::bytes_to_entries::PassThrough<&mut std::io::buffered::bufreader::BufReader<gix_features::interrupt::Read<gix_features::progress::Read<&mut dyn std::io::BufRead, prodash::progress::utils::ThroughputOnDrop<prodash::traits::BoxedDynNestedProgress>>>>, alloc::vec::Vec<u8, alloc::alloc::Global>>>> (self=0x3ffffdd9b0, buf=...)
    at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/io/mod.rs:973
#11 0x0000002aab2b61b6 in std::io::copy::stack_buffer_copy<gix_pack::data::input::bytes_to_entries::DecompressRead<gix_pack::data::input::bytes_to_entries::PassThrough<&mut std::io::buffered::bufreader::BufReader<gix_features::interrupt::Read<gix_features::progress::Read<&mut dyn std::io::BufRead, prodash::progress::utils::ThroughputOnDrop<prodash::traits::BoxedDynNestedProgress>>>>, alloc::vec::Vec<u8, alloc::alloc::Global>>>, std::io::util::Sink> (reader=0x3ffffdd9b0,
    writer=0x3ffffdda27) at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/io/copy.rs:278
#12 0x0000002aab3ee0b4 in std::io::copy::{impl#4}::copy_from<std::io::util::Sink, gix_pack::data::input::bytes_to_entries::DecompressRead<gix_pack::data::input::bytes_to_entries::PassThrough<&mut std::io::buffered::bufreader::BufReader<gix_features::interrupt::Read<gix_features::progress::Read<&mut dyn std::io::BufRead, prodash::progress::utils::ThroughputOnDrop<prodash::traits::BoxedDynNestedProgress>>>>, alloc::vec::Vec<u8, alloc::alloc::Global>>>> (self=0x3ffffdda27,
    reader=0x3ffffdd9b0) at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/io/copy.rs:202
#13 0x0000002aab2b51d2 in std::io::copy::generic_copy<gix_pack::data::input::bytes_to_entries::DecompressRead<gix_pack::data::input::bytes_to_entries::PassThrough<&mut std::io::buffered::bufreader::BufReader<gix_features::interrupt::Read<gix_features::progress::Read<&mut dyn std::io::BufRead, prodash::progress::utils::ThroughputOnDrop<prodash::traits::BoxedDynNestedProgress>>>>, alloc::vec::Vec<u8, alloc::alloc::Global>>>, std::io::util::Sink> (reader=0x3ffffdd9b0,
    writer=0x3ffffdda27) at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/io/copy.rs:89
#14 0x0000002aab359dee in std::sys::pal::unix::kernel_copy::{impl#1}::copy<gix_pack::data::input::bytes_to_entries::DecompressRead<gix_pack::data::input::bytes_to_entries::PassThrough<&mut std::io::buffered::bufreader::BufReader<gix_features::interrupt::Read<gix_features::progress::Read<&mut dyn std::io::BufRead, prodash::progress::utils::ThroughputOnDrop<prodash::traits::BoxedDynNestedProgress>>>>, alloc::vec::Vec<u8, alloc::alloc::Global>>>, std::io::util::Sink> (self=...)
    at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/sys/pal/unix/kernel_copy.rs:182
#15 0x0000002aab2b6418 in std::sys::pal::unix::kernel_copy::copy_spec<gix_pack::data::input::bytes_to_entries::DecompressRead<gix_pack::data::input::bytes_to_entries::PassThrough<&mut std::io::buffered::bufreader::BufReader<gix_features::interrupt::Read<gix_features::progress::Read<&mut dyn std::io::BufRead, prodash::progress::utils::ThroughputOnDrop<prodash::traits::BoxedDynNestedProgress>>>>, alloc::vec::Vec<u8, alloc::alloc::Global>>>, std::io::util::Sink> (
    read=<optimized out>, write=<optimized out>)
    at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/sys/pal/unix/kernel_copy.rs:76
#16 std::io::copy::copy<gix_pack::data::input::bytes_to_entries::DecompressRead<gix_pack::data::input::bytes_to_entries::PassThrough<&mut std::io::buffered::bufreader::BufReader<gix_features::interrupt::Read<gix_features::progress::Read<&mut dyn std::io::BufRead, prodash::progress::utils::ThroughputOnDrop<prodash::traits::BoxedDynNestedProgress>>>>, alloc::vec::Vec<u8, alloc::alloc::Global>>>, std::io::util::Sink> (reader=0x3ffffdd9b0, writer=0x3ffffdda27)
    at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/std/src/io/copy.rs:68
#17 0x0000002aab2c9258 in gix_pack::data::input::bytes_to_entries::BytesToEntriesIter<std::io::buffered::bufreader::BufReader<gix_features::interrupt::Read<gix_features::progress::Read<&mut dyn std::io::BufRead, prodash::progress::utils::ThroughputOnDrop<prodash::traits::BoxedDynNestedProgress>>>>>::next_inner<std::io::buffered::bufreader::BufReader<gix_features::interrupt::Read<gix_features::progress::Read<&mut dyn std::io::BufRead, prodash::progress::utils::ThroughputOnDrop<prodash::traits::BoxedDynNestedProgress>>>>> (self=0x2aac3f4a30) at gix-pack/src/data/input/bytes_to_entries.rs:117
#18 0x0000002aab2c7f6c in gix_pack::data::input::bytes_to_entries::{impl#2}::next<std::io::buffered::bufreader::BufReader<gix_features::interrupt::Read<gix_features::progress::Read<&mut dyn std::io::BufRead, prodash::progress::utils::ThroughputOnDrop<prodash::traits::BoxedDynNestedProgress>>>>> (self=0x2aac3f4a30)
    at gix-pack/src/data/input/bytes_to_entries.rs:215
#19 0x0000002aab3ba938 in gix_pack::data::input::lookup_ref_delta_objects::{impl#1}::next<gix_pack::data::input::bytes_to_entries::BytesToEntriesIter<std::io::buffered::bufreader::BufReader<gix_features::interrupt::Read<gix_features::progress::Read<&mut dyn std::io::BufRead, prodash::progress::utils::ThroughputOnDrop<prodash::traits::BoxedDynNestedProgress>>>>>, alloc::boxed::Box<gix_odb::Cache<gix_odb::store_impls::dynamic::Handle<alloc::sync::Arc<gix_odb::Store, alloc::alloc::Global>>>, alloc::alloc::Global>> (self=0x2aac3f4a30) at gix-pack/src/data/input/lookup_ref_delta_objects.rs:90
#20 0x0000002aab3af36e in core::iter::adapters::peekable::{impl#1}::next<gix_pack::data::input::lookup_ref_delta_objects::LookupRefDeltaObjectsIter<gix_pack::data::input::bytes_to_entries::BytesToEntriesIter<std::io::buffered::bufreader::BufReader<gix_features::interrupt::Read<gix_features::progress::Read<&mut dyn std::io::BufRead, prodash::progress::utils::ThroughputOnDrop<prodash::traits::BoxedDynNestedProgress>>>>>, alloc::boxed::Box<gix_odb::Cache<gix_odb::store_impls::dynamic::Handle<alloc::sync::Arc<gix_odb::Store, alloc::alloc::Global>>>, alloc::alloc::Global>>> (self=0x2aac3f4a30)
    at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/iter/adapters/peekable.rs:40
#21 0x0000002aab3edca2 in gix_pack::data::input::entries_to_bytes::{impl#1}::next<gix_pack::data::input::lookup_ref_delta_objects::LookupRefDeltaObjectsIter<gix_pack::data::input::bytes_to_entries::BytesToEntriesIter<std::io::buffered::bufreader::BufReader<gix_features::interrupt::Read<gix_features::progress::Read<&mut dyn std::io::BufRead, prodash::progress::utils::ThroughputOnDrop<prodash::traits::BoxedDynNestedProgress>>>>>, alloc::boxed::Box<gix_odb::Cache<gix_odb::store_impls::dynamic::Handle<alloc::sync::Arc<gix_odb::Store, alloc::alloc::Global>>>, alloc::alloc::Global>>, gix_pack::bundle::write::types::LockWriter> (self=0x2aac3f4a30) at gix-pack/src/data/input/entries_to_bytes.rs:130
#22 0x0000002aab23ea00 in alloc::boxed::{impl#42}::next<dyn core::iter::traits::iterator::Iterator<Item=core::result::Result<gix_pack::data::input::Entry, gix_pack::data::input::types::Error>>, alloc::alloc::Global> (self=0x3ffffe0118)
    at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/alloc/src/boxed.rs:1997
#23 0x0000002aab20d706 in core::iter::traits::iterator::{impl#0}::next<dyn core::iter::traits::iterator::Iterator<Item=core::result::Result<gix_pack::data::input::Entry, gix_pack::data::input::types::Error>>> (self=0x3ffffdf498)
    at /rustc/051478957371ee0084a7c0913941d2a8c4757bb9/library/core/src/iter/traits/iterator.rs:4109
#24 0x0000002aab3cb3f8 in gix_pack::index::File::write_data_iter_to_stream<gix_pack::bundle::write::{impl#1}::inner_write::{closure_env#0}<&std::path::Path>, fn(core::ops::range::Range<u64>, &memmap2::Mmap) -> core::option::Option<&[u8]>, memmap2::Mmap> (version=gix_pack::index::Version::V2, make_resolver=..., entries=..., thread_limit=...,
    root_progress=..., out=..., should_interrupt=0x2aac3b6e10 <gix::interrupt::IS_INTERRUPTED>,
    pack_version=gix_pack::data::Version::V2) at gix-pack/src/index/write/mod.rs:119
#25 0x0000002aab398a4e in gix_pack::Bundle::inner_write<&std::path::Path> (directory=..., progress=..., data_file=...,
    pack_entries_iter=..., should_interrupt=0x2aac3b6e10 <gix::interrupt::IS_INTERRUPTED>,
    pack_version=gix_pack::data::Version::V2) at gix-pack/src/bundle/write/mod.rs:287
#26 0x0000002aab399a20 in gix_pack::Bundle::write_to_directory<alloc::boxed::Box<gix_odb::Cache<gix_odb::store_impls::dynamic::Handle<alloc::sync::Arc<gix_odb::Store, alloc::alloc::Global>>>, alloc::alloc::Global>> (pack=...,
    directory=..., progress=..., should_interrupt=0x2aac3b6e10 <gix::interrupt::IS_INTERRUPTED>,
    thin_pack_base_object_lookup=..., options=...) at gix-pack/src/bundle/write/mod.rs:144
#27 0x0000002aab1ea34e in gix::remote::connection::fetch::Prepare<alloc::boxed::Box<(dyn gix_transport::client::blocking_io::traits::Transport + core::marker::Send), alloc::alloc::Global>>::receive_inner<alloc::boxed::Box<(dyn gix_transport::client::blocking_io::traits::Transport + core::marker::Send), alloc::alloc::Global>> (self=..., progress=...,
    should_interrupt=0x2aac3b6e10 <gix::interrupt::IS_INTERRUPTED>)
    at gix/src/remote/connection/fetch/receive_pack.rs:282
#28 0x0000002aab1ff04a in gix::clone::PrepareFetch::fetch_only_inner (self=0x3fffff42f8, progress=...,
    should_interrupt=0x2aac3b6e10 <gix::interrupt::IS_INTERRUPTED>) at gix/src/clone/fetch/mod.rs:199
#29 0x0000002aaad32568 in gix::clone::PrepareFetch::fetch_only<&mut prodash::progress::utils::DoOrDiscard<prodash::tree::Item>> (self=0x3fffff42f8, progress=0x3fffffa610, should_interrupt=0x2aac3b6e10 <gix::interrupt::IS_INTERRUPTED>)
    at gix/src/clone/fetch/mod.rs:75
#30 0x0000002aaad325ca in gix::clone::PrepareFetch::fetch_then_checkout<&mut prodash::progress::utils::DoOrDiscard<prodash::tree::Item>> (self=0x3fffff42f8, progress=0x3fffffa610,
    should_interrupt=0x2aac3b6e10 <gix::interrupt::IS_INTERRUPTED>) at gix/src/clone/fetch/mod.rs:231
#31 0x0000002aaae9269a in gitoxide_core::repository::clone::function::clone<prodash::progress::utils::DoOrDiscard<prodash::tree::Item>, std::ffi::os_str::OsString, std::path::PathBuf, &mut dyn std::io::Write, &mut dyn std::io::Write> (
    url=..., directory=..., overrides=..., progress=..., out=..., err=...) at gitoxide-core/src/repository/clone.rs:78
#32 0x0000002aaae36de6 in gitoxide::plumbing::main::main::{closure#13} (
    progress=<error reading variable: Cannot access memory at address 0x1>, out=..., err=...)
    at src/plumbing/main.rs:433
#33 0x0000002aaac43c3a in gitoxide::shared::pretty::prepare_and_run::{closure#0}<(), core::ops::range::RangeInclusive<u8>, gitoxide::plumbing::main::main::{closure_env#13}> () at src/shared.rs:170
#34 0x0000002aaac5e59c in gix_trace::enabled::Span::into_scope<core::result::Result<(), anyhow::Error>, gitoxide::shared::pretty::prepare_and_run::{closure_env#0}<(), core::ops::range::RangeInclusive<u8>, gitoxide::plumbing::main::main::{closure_env#13}>> (self=<error reading variable: Cannot access memory at address 0x0>,
    f=<error reading variable: Cannot access memory at address 0x31>) at gix-trace/src/lib.rs:43
#35 0x0000002aaac23680 in gitoxide::shared::pretty::prepare_and_run<(), core::ops::range::RangeInclusive<u8>, gitoxide::plumbing::main::main::{closure_env#13}> (name=..., trace=true, verbose=true, progress=false, progress_keep_open=false,
    range=..., run=...) at src/shared.rs:169
#36 0x0000002aaac0c38e in gitoxide::plumbing::main::main () at src/plumbing/main.rs:426
#37 0x0000002aaac0966c in gix::main () at src/gix.rs:5
(gdb)

This shows that the crash happens in zlib-ng, specifically in libz-ng-sys-1.1.15/src/zlib-ng/arch/riscv/adler32_rvv.c, which comes from the zlib-ng repository by way of a git submodule.

Likely explanation

This looks a lot like a case of https://github.com/zlib-ng/zlib-ng/issues/1705. My kernel is old enough to trigger that bug:

ubuntu@riscv:~$ uname -a
Linux riscv 5.10.113-scw1 #1 SMP PREEMPT Fri Jul 12 15:31:22 UTC 2024 riscv64 riscv64 riscv64 GNU/Linux

Furthermore, that issue mentions the problem happening in starship, and I wonder if starship might even be triggering the crash in the same way through one of its gix-* dependencies.

Perhaps this issue is a victim of its own success and should be closed, to become (and be referenced by) a new comment on https://github.com/zlib-ng/zlib-ng/issues/1705. But I figured I'd start by opening this, in case more is known or can be discerned here, in case it might somehow help with #148, and in case there's anything to be done about it anywhere else (such as gitoxide).
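
To illustrate the kernel-version condition mentioned above, the following sketch parses a release string such as the one printed by uname and compares it with a threshold. The function names are hypothetical, and the (6, 5) threshold in the usage note is an assumption based on a reading of the upstream issue, not something established in this thread.

```rust
/// Parse the leading `major.minor` of a kernel release string such as
/// "5.10.113-scw1" (as printed by `uname -r`).
fn kernel_major_minor(release: &str) -> Option<(u32, u32)> {
    // Split on every non-digit character; the first two pieces are
    // expected to be the major and minor version numbers.
    let mut parts = release.split(|c: char| !c.is_ascii_digit());
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// Return whether the release is older than `threshold`, or `None` if the
/// release string could not be parsed.
fn kernel_is_older_than(release: &str, threshold: (u32, u32)) -> Option<bool> {
    // Tuples compare lexicographically: major first, then minor.
    kernel_major_minor(release).map(|version| version < threshold)
}
```

Under that assumption, `kernel_is_older_than("5.10.113-scw1", (6, 5))` yields `Some(true)` for the kernel shown above.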

Simplified reproduction

To check if the problem is occurring, and to check that it happens in ein as well as gix, the gix status and ein t h commands can be used:

ubuntu@riscv:~/repos/gitoxide (main =)$ cargo run --bin=gix -- status
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.44s
     Running `target/debug/gix status`
Illegal instruction (core dumped)
ubuntu@riscv:~/repos/gitoxide (main =)[132]$ cargo run --bin=gix -r -- status
    Finished `release` profile [optimized] target(s) in 1.39s
     Running `target/release/gix status`
Illegal instruction (core dumped)
ubuntu@riscv:~/repos/gitoxide (main =)[132]$ cargo run --bin=ein -- t h
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.37s
     Running `target/debug/ein t h`
Illegal instruction (core dumped)
ubuntu@riscv:~/repos/gitoxide (main =)[132]$ cargo run --bin=ein -r -- t h
    Finished `release` profile [optimized] target(s) in 1.37s
     Running `target/release/ein t h`
Illegal instruction (core dumped)

Byron commented 1 month ago

Thanks for the detailed report and for providing enough details to reproduce the issue.

To me it seems that libz-ng might be the place to actually get a solution for the issue if it gets prioritized and is indeed the same. This crate probably doesn't have the means to affect it, except for changing the version of the libz-ng source code.

Lastly, gitoxide probably has a fix already by using a different zlib implementation, at least on the platforms that are likely to fail, if known in advance.

EliahKagan commented 1 month ago

To me it seems that libz-ng might be the place to actually get a solution for the issue if it gets prioritized and is indeed the same.

Yes. Assuming this is https://github.com/zlib-ng/zlib-ng/issues/1705, which I believe to be the case, there is a patch that has been reported to work, though as far as I know it has not yet been offered as a PR. This is as detailed in https://github.com/zlib-ng/zlib-ng/issues/1705#issuecomment-2177455106 and preceding comments.

This crate probably doesn't have the means to affect it, except for changing the version of the libz-ng source code.

When building this crate, is there a way to pass configuration variables to cmake for libz-ng, as if by passing -D ... to cmake? As noted in https://github.com/zlib-ng/zlib-ng/issues/1705#issuecomment-2177453721, -D WITH_RVV=OFF can be passed when cross-compiling zlib-ng for RISC-V to prevent vector instructions requiring RVV support from being emitted, and this should be effective as a workaround for incorrect detection when not cross-compiling.

This would provide a way to make this crate work while neither changing the source code of libz-ng nor using a different zlib implementation. It would also allow me to confirm to an even greater degree of certainty that this issue really is a straightforward case of https://github.com/zlib-ng/zlib-ng/issues/1705 and nothing more.

If this is not feasible, or if it is feasible but not easy, then should a feature be added here for it, or something? (But maybe this really is quite easy and I am just not aware of it.)
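
For illustration only, passing the define on a cmake command line could look like the sketch below. It builds the command without running it. The function name and the directory layout are hypothetical, and how this crate's build script actually invokes cmake is not shown in this thread.

```rust
use std::path::Path;
use std::process::Command;

/// Build (but do not run) a cmake configure command for a zlib-ng checkout.
/// When `disable_rvv` is true, `-DWITH_RVV=OFF` is appended, which is the
/// option named in the upstream issue.
fn zlib_ng_configure_command(src: &Path, build: &Path, disable_rvv: bool) -> Command {
    let mut cmd = Command::new("cmake");
    cmd.arg("-S").arg(src).arg("-B").arg(build);
    if disable_rvv {
        cmd.arg("-DWITH_RVV=OFF");
    }
    cmd
}
```

If the build goes through the `cmake` crate instead, the equivalent would presumably be a `Config::define("WITH_RVV", "OFF")` call before building.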

Lastly, gitoxide probably has a fix already by using a different zlib implementation, at least on the platforms that are likely to fail, if known in advance.

Achieving the effect of -D WITH_RVV=OFF, if there is a way to do it when cmake is run during a build of this crate, should likewise allow gitoxide to work. Furthermore, even once https://github.com/zlib-ng/zlib-ng/issues/1705 is fixed, this would be helpful for actual cross compilation, such as if we make binary releases for RISC-V.

Byron commented 1 month ago

When building this crate, is there a way to pass configuration variables to cmake for libz-ng, as if by passing -D ... to cmake? As noted in zlib-ng/zlib-ng#1705 (comment), -D WITH_RVV=OFF can be passed when cross-compiling zlib-ng for RISC-V to prevent vector instructions requiring RVV support from being emitted, and this should be effective as a workaround for incorrect detection when not cross-compiling.

A great idea, this is absolutely possible!

If this is not feasible, or if it is feasible but not easy, then should a feature be added here for it, or something? (But maybe this really is quite easy and I am just not aware of it.)

The build.rs script for zlib-ng should make it possible to detect that case and conditionally pass a flag to the cmake invocation. It should be easy enough if the right test system is available.

Achieving the effect of -D WITH_RVV=OFF, if there is a way to do it when cmake is run during a build of this crate, should likewise allow gitoxide to work. Furthermore, even once zlib-ng/zlib-ng#1705 is fixed, this would be helpful for actual cross compilation, such as if we make binary releases for RISC-V.

Now I have hopes that this can be fixed here, and if you say it will remain useful even if zlib-ng ships their fix, that's even better.

Thanks again for all your great work!

EliahKagan commented 1 month ago

The build.rs script for zlib-ng should make it possible to detect that case and conditionally pass a flag to the cmake invocation. It should be easy enough if the right test system is available.

When compiling binaries to be distributed or otherwise to be run on another system, one may want to insist that RVV instructions be emitted, or insist that they not be emitted. Especially since an upstream fix should eventually take care of the auto-detection case, I think it would be best for a change here to allow it to be manually overridden.

This would not necessarily preclude implementing auto-detection here as well. But the appraisal of "easy enough" might be an underestimate, considering that the difficulty and subtleties involved in checking this are the cause of the upstream bug. (Detection is attempted, but the result is not always correct.) Furthermore, manually overriding it is really the capability that would remain useful here even after the upstream bug is fixed.

Since arbitrary logic, including logic specific to building for particular architectures, could go in build.rs or modules in zng that it uses, the manual override could be by an environment variable or other external mechanism. But I wonder if doing it by feature would still be better.
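
As a rough sketch of the environment-variable alternative, a build script could map a variable's value to one of three modes. Everything named here is hypothetical: the `RvvOverride` type, the `parse_rvv_override` function, the accepted values, and the `LIBZ_NG_SYS_RVV` variable mentioned in the comment. None of them exist in this crate.

```rust
/// How RVV code generation should be decided for the zlib-ng build.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum RvvOverride {
    ForceOn,
    ForceOff,
    /// No override: leave the decision to zlib-ng's own detection.
    UpstreamDetection,
}

/// Interpret the value of an override variable, if it is set.
/// In a build script the input might come from something like
/// `std::env::var("LIBZ_NG_SYS_RVV").ok().as_deref()` (hypothetical name),
/// paired with a `cargo:rerun-if-env-changed=...` line for that variable.
fn parse_rvv_override(value: Option<&str>) -> Result<RvvOverride, String> {
    match value.map(str::trim) {
        None | Some("") => Ok(RvvOverride::UpstreamDetection),
        Some(v) if v.eq_ignore_ascii_case("on") => Ok(RvvOverride::ForceOn),
        Some(v) if v.eq_ignore_ascii_case("off") => Ok(RvvOverride::ForceOff),
        Some(other) => Err(format!("unrecognized RVV override: {other:?}")),
    }
}
```

An unset or empty variable keeps the upstream behavior, so existing builds would be unaffected under this design.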

Is it acceptable if I add rvv-off and rvv-on features, or something like that?

(Auto-detection done here could then be by an rvv-auto feature, if later added. This would remain distinct from when no rvv-* feature is enabled, which would use the upstream detection.)

Byron commented 1 month ago

Thanks for the update.

Is it acceptable if I add rvv-off and rvv-on features, or something like that?

Yes, I'd also do it with a cargo feature, which would also have to be additive. Of course it's possible to add logic to allow --all-features builds that don't clash, but ideally this could be so minimal that it fixes the most anticipated use case.

EliahKagan commented 1 month ago

I've noticed that this project introduces the nonstandard configuration name zng, for which warnings are sometimes issued. (To be clear, that doesn't show an effective test run, it's just an example of the warnings. The warnings can also be seen on CI.) That is not specific to RISC-V.

Why was this done for that, rather than using a feature? Does it mean I should consider introducing other such nonstandard configuration names for overriding RVV detection, rather than using features? That seems like a wrong thing to do, but since I am not clear on why it was done for zng, I am not certain. Or should it not have been done for zng either, and should that be changed?

Of course it's possible to add logic to allow --all-features builds that don't clash, but ideally this could be so minimal that it fixes the most anticipated use case.

Do you mean that the added features should be minimal and thus not try to support --all-features on RISC-V, or that they should try to support --all-features even on RISC-V where it would feel contradictory if the features are rvv-off and rvv-on but that this should be done in a minimal way?

I think that, outside of architectures where it makes a difference, the new features could be no-ops and supplying them together could be permitted, while prohibiting it on RISC-V. Whether that's the best way to do it, I am not sure.
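
The rule described in the previous paragraph can be sketched as a small decision function. The function name is hypothetical, and `rvv-on` and `rvv-off` are only the feature names floated in this thread, not existing features.

```rust
/// Decide whether to force RVV on or off from a pair of opposing features.
/// Returns `Ok(Some(true))` to force RVV on, `Ok(Some(false))` to force it
/// off, `Ok(None)` to leave the upstream detection alone, and `Err` when
/// both features are requested on a RISC-V target.
fn resolve_rvv_features(
    target_arch: &str,
    rvv_on: bool,
    rvv_off: bool,
) -> Result<Option<bool>, String> {
    // Outside RISC-V the features are no-ops, so any combination is allowed.
    if !target_arch.starts_with("riscv") {
        return Ok(None);
    }
    match (rvv_on, rvv_off) {
        (true, true) => Err("rvv-on and rvv-off cannot both be enabled on RISC-V".to_string()),
        (true, false) => Ok(Some(true)),
        (false, true) => Ok(Some(false)),
        (false, false) => Ok(None),
    }
}
```

With this rule, an `--all-features` build would succeed on x86_64 but fail on riscv64, which is the trade-off in question.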

Byron commented 1 month ago

Why was this done for that, rather than using a feature? Does it mean I should consider introducing other such nonstandard configuration names for overriding RVV detection, rather than using features? That seems like a wrong thing to do, but since I am not clear on why it was done for zng, I am not certain. Or should it not have been done for zng either, and should that be changed?

I'd hope Git has information on this, as I myself joined late enough to not know anything on how things came to be, unfortunately.

Do you mean that the added features should be minimal and thus not try to support --all-features on RISC-V, or that they should try to support --all-features even on RISC-V where it would feel contradictory if the features are rvv-off and rvv-on but that this should be done in a minimal way?

All cargo features should be additive so that --all-features will work and have predictable results. In practice, that's not always possible, but by using just a single RVV-related flag it should naturally be additive. I also think it would do nothing outside of its applicable platform. Also, I'd approach this as a band-aid to fix one specific problem, and not try to exhaustively solve every conceivable use case, to keep it simple.
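
A single additive feature that does nothing outside its platform could be checked in a build script roughly as follows. This is a sketch under assumptions: the function name is hypothetical, and `rvv-off` is the feature name proposed in this thread. It relies on Cargo's documented behavior of setting `CARGO_FEATURE_<NAME>` for each enabled feature and `CARGO_CFG_TARGET_ARCH` for the target when running build scripts.

```rust
/// Decide whether the build should force RVV off.
/// `env` looks up an environment variable; a build script would pass
/// `|key| std::env::var(key).ok()`.
fn should_force_rvv_off(env: impl Fn(&str) -> Option<String>) -> bool {
    // Cargo sets CARGO_FEATURE_RVV_OFF when a feature named `rvv-off` is enabled.
    let feature_enabled = env("CARGO_FEATURE_RVV_OFF").is_some();
    // The target architecture is "riscv32" or "riscv64" on RISC-V targets.
    let is_riscv = env("CARGO_CFG_TARGET_ARCH")
        .map_or(false, |arch| arch.starts_with("riscv"));
    feature_enabled && is_riscv
}
```

Because the feature only ever turns something off, and only on RISC-V, enabling it alongside every other feature cannot produce a contradiction.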

EliahKagan commented 1 month ago

I'd hope Git has information on this, as I myself joined late enough to not know anything on how things came to be, unfortunately.

I'll look into it and let you know what I find.

[...] Also, I'd approach this as a band-aid to fix one specific problem, and not try to exhaustively solve every conceivable use-case, to keep it simple.

I'll make just a feature to force RVV to be turned off, since that's the specific problem right now.

EliahKagan commented 4 days ago

I've opened #218 to add the rvv-off feature discussed above, though further changes may be needed, for the reasons presented there.