rustwasm / wasm-pack

πŸ“¦βœ¨ your favorite rust -> wasm workflow tool!
https://rustwasm.github.io/wasm-pack/
Apache License 2.0
6.32k stars 409 forks source link

Segmentation fault after small patch #650

Closed richard-uk1 closed 1 year ago

richard-uk1 commented 5 years ago

πŸ› Bug description

I'm getting a segfault when I run wasm-pack with no arguments. I've made some small non-unsafe changes to the codebase.

Sometimes, instead of a segfault, I get a failed assertion in curl-sys:

thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `2`,
 right: `0`', **/home/.cargo/registry/src/github.com-1ecc6299db9ec823/curl-0.4.20/src/lib.rs:92:13**

which is here: curl assert fail.

Here is the stack trace from the segfault core dump

May 19 15:44:30 systemd-coredump[11016]: Process 11013 (wasm-pack) of user 1000 dumped core.

Stack trace of thread 11014:
#0  0x00007fb5df46836d __pthread_rwlock_unlock (libpthread.so.0)
#1  0x00007fb5df17e3ba CRYPTO_THREAD_unlock (libcrypto.so.1.1)
#2  0x00007fb5df0f1e07 n/a (libcrypto.so.1.1)
#3  0x00007fb5df0f21c8 ERR_load_strings_const (libcrypto.so.1.1)
#4  0x00007fb5df679aea ERR_load_SSL_strings (libssl.so.1.1)
#5  0x00007fb5df679b1a n/a (libssl.so.1.1)
#6  0x00007fb5df46a5cf __pthread_once_slow (libpthread.so.0)
#7  0x00007fb5df17e40a CRYPTO_THREAD_run_once (libcrypto.so.1.1)
#8  0x00007fb5df679f24 OPENSSL_init_ssl (libssl.so.1.1)
#9  0x0000560063ae2fff n/a (/home/devel/non-work/wasm-pack/target/debug/wasm-pack

πŸ‘Ÿ Steps to reproduce

  1. Clone the repo above
  2. run wasm-pack (e.g. cargo run)

🌍 Your environment

wasm-pack version: latest git rustc version: 1.34.2

richard-uk1 commented 5 years ago

Update: I've updated my system, rebooted my computer and done a full rebuild. The segfault has changed slightly: now I occasionally get a message like:

double free or corruption (out)

and the stacktrace has changed

systemd-coredump[7725]: Process 7721 (wasm-pack) of user 1000 dumped core.

Stack trace of thread 7723:
#0  0x00007fbfae74e36d __pthread_rwlock_unlock (libpthread.so.0)
#1  0x00007fbfae3d23ba CRYPTO_THREAD_unlock (libcrypto.so.1.1)
#2  0x00007fbfae345e07 n/a (libcrypto.so.1.1)
#3  0x00007fbfae3461c8 ERR_load_strings_const (libcrypto.so.1.1)
#4  0x00007fbfae2a3a8d ERR_load_BN_strings (libcrypto.so.1.1)
#5  0x00007fbfae3472de n/a (libcrypto.so.1.1)
#6  0x00007fbfae3679ca n/a (libcrypto.so.1.1)
#7  0x00007fbfae7505cf __pthread_once_slow (libpthread.so.0)
#8  0x00007fbfae3d240a CRYPTO_THREAD_run_once (libcrypto.so.1.1)
#9  0x00007fbfae367ec6 OPENSSL_init_crypto (libcrypto.so.1.1)
#10 0x00007fbfae346792 ERR_get_state (libcrypto.so.1.1)
#11 0x00007fbfae3468da ERR_clear_error (libcrypto.so.1.1)
#12 0x00007fbfae3678da n/a (libcrypto.so.1.1)
#13 0x00007fbfae7505cf __pthread_once_slow (libpthread.so.0)
#14 0x00007fbfae3d240a CRYPTO_THREAD_run_once (libcrypto.so.1.1)
#15 0x00007fbfae3680a3 OPENSSL_init_crypto (libcrypto.so.1.1)
#16 0x00007fbfae993d72 n/a (libcurl.so.4)
#17 0x00007fbfae958ffe n/a (libcurl.so.4)
#18 0x000055ee7dda8f8a n/a (/home/devel/non-work/wasm-pack/target/release/wasm-pack
drager commented 5 years ago

Hi @derekdreery! Thanks for the bug report! What OS do you have? I have no problem running wasm-pack without any arguments on a Linux x64 system. I tried the latest master and built it with cargo build. Then running it like this: ./target/debug/wasm-pack.

You might have some problem with curl on your OS.

Maybe @alexcrichton has some ideas?

alexcrichton commented 5 years ago

Yeah this unfortunately looks like an issue with the build process or the built binary, probably unrelated to wasm-pack itself. The prebuilt binaries should likely work for you though.

richard-uk1 commented 5 years ago

The interesting thing is that the program works fine without the changes I made. So it seems that the changes introduce the error. I've just done a sys upgrade so I will re-test to see if the problem has gone.

ashleygwilliams commented 5 years ago

@derekdreery are you still running into this? let us know!

richard-uk1 commented 5 years ago

Hey, I'll have another go! :)

richard-uk1 commented 5 years ago

I've still got the issue, but I'm also seeing it with the master branch, so it's unrelated to my patch. It's probably a version mismatch somewhere between a library and its bindings.

richard-uk1 commented 5 years ago

Just to follow up, it's fine to close this issue. I'd still personally like to get to the bottom of what's happening, but I lack the skills

marienz commented 5 years ago

I'm seeing what's probably the same segfault. It looks like a race condition within openssl triggered by exiting (running atexit handlers) while the thread spawned by background_check_for_updates is initializing openssl.

Even if openssl handled shutdown mid-initialization, I suspect you'd still be in trouble if the version check thread gets as far as actually using openssl while the main thread exits the process. It'd be nice if wasm-pack could wait for that other thread before exiting, but unless it can interrupt the check in flight I'm guessing that'd be a bit slow... Haven't investigated what it would take to interrupt the check and exit cleanly.

I'm also seeing this with a locally built wasm-pack (from revision 30a95a42f14abb5b7d6e50f98256da7400046199, built with Rust 1.35.0, with a local change that shouldn't matter if we exit this early). I haven't investigated if official builds of wasm-pack are less crashy (or why that might be).

653 (and any other changes that rate-limit the version check) might mostly hide this issue.

The exact stack of the crashing thread varies, but usually looks like:

Thread 1 (Thread 0x7f9ca167f700 (LWP 17184)):
#0  __pthread_rwlock_wrlock_full (abstime=0x0, rwlock=0x0) at pthread_rwlock_common.c:581
#1  __GI___pthread_rwlock_wrlock (rwlock=0x0) at pthread_rwlock_wrlock.c:27
#2  0x00007f6461bb6109 in CRYPTO_THREAD_write_lock (lock=<optimized out>) at crypto/threads_pthread.c:66
#3  0x00007f9ca196c245 in OBJ_NAME_add (name=0x7f9ca19e0500 "md5", type=type@entry=1, data=data@entry=0x7f9ca1a5ef20 <md5_md> "\004") at crypto/objects/o_names.c:230
...
#5  0x00007f9ca207aa4f in ossl_init_ssl_base () at ssl/ssl_init.c:77
...
#10 0x00007f646227abdb in OPENSSL_init_ssl (opts=2097152, settings=<optimized out>) at ssl/ssl_init.c:201
...
#12 0x000055af94bf51e0 in openssl_sys::init ()
#13 0x000055af94bf1eb5 in std::sync::once::Once::call_once::{{closure}} ()
#14 0x000055af94d4cf16 in call_inner () at src/libstd/sync/once.rs:387
...
#18 0x000055af94b5d2c3 in wasm_pack::manifest::Crate::return_wasm_pack_latest_version ()
...

There's another thread that looks like:

Thread 2 (Thread 0x7f9ca1680980 (LWP 17183)):
#0  0x00007f9ca1c690f8 in _fini () from /usr/lib64/libsmime3.so
#1  0x00007f9ca216a605 in _dl_fini () at dl-fini.c:143
#2  0x00007f9ca1ebd8fc in __run_exit_handlers (status=1, listp=0x7f9ca2049718 <__exit_funcs>, run_list_atexit=run_list_a
texit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
#3  0x00007f9ca1ebda4a in __GI_exit (status=<optimized out>) at exit.c:139
#4  0x00005565384311b7 in std::sys::unix::os::exit () at src/libstd/sys/unix/os.rs:541
#5  0x000055653842fc3f in std::process::exit () at src/libstd/process.rs:1485
#6  0x000055653830c1cd in clap::app::App::get_matches ()
#7  0x00005565381e84a1 in wasm_pack::main ()
...

or:

Thread 2 (Thread 0x7f6461880980 (LWP 17749)):
#0  0x00007f6462369995 in _dl_fixup (l=0x7f64620408b0, reloc_arg=<optimized out>) at ../elf/dl-runtime.c:93
#1  0x00007f646237035a in _dl_runtime_resolve_xsave () at ../sysdeps/x86_64/dl-trampoline.h:126
#2  0x00007f6461d358a3 in __do_global_dtors_aux () from /usr/lib64/libnss3.so
#3  0x00007ffccfedee00 in ?? ()
#4  0x00007f646236a5e3 in _dl_fini () at dl-fini.c:138
Backtrace stopped: frame did not save the PC

Looking at openssl (I'm using 1.1.0k), it uses pthread_once (or equivalent) to run its initialization function once, and registers an atexit handler that tears everything down quite early in that initialization. We're crashing trying to use a static lock in o_names.c that got nulled out by OBJ_NAME_cleanup, called from evp_cleanup_int, called by OPENSSL_cleanup, which is registered as an atexit handler by ossl_init_base, which runs early in OPENSSL_init_crypto. which gets called by OPENSSL_init_ssl before the call to ossl_init_ssl_base the crashing thread is in.

The backtraces from @derekdreery also contain openssl initialization functions and also occur when exiting immediately, so I think it's the same crash.

The curl-sys assert was curl failing to initialize. I haven't confirmed it has a similar race (and handles it better by failing initialization instead of crashing) but it seems plausible.

richard-uk1 commented 5 years ago

@marienz great work, I didn't know enough about ssl/curl to work out what was going on.

richard-uk1 commented 5 years ago

I've now recreated this with the pre-built binary.

richard-uk1 commented 5 years ago

So basically, if you initialize openssl in a thread that's not the main thread, you can get a race condition between an atexit handler on the process, and the initialization function. So you must run OPENSSL_init_ssl on the main thread, or synchronize it finishing with any exit call.

@marienz does that sound right?

marienz commented 5 years ago

I don't think this is limited to initialization. I think if we run atexit handlers on the main thread while another thread is still using openssl (either initializing it or actually using it to talk to something over the network) we'll potentially crash.

Moving openssl initialization to the main thread might end up making this harder to hit, because anything involving the network is most likely slower than running atexit handlers and exiting. If the version-checking thread is blocked in a network-related syscall while we exit, we shouldn't crash.

But I'd argue the "proper" fix here is to just join the version-checking thread from the main thread, after making sure wasm-pack does not do its version check more often than necessary (#653 and related issues). This will make wasm-pack a little slower to exit when it decides it has to do the version check and it doesn't have to do any work at the same time, but it should be pretty rare to hit both of those conditions simultaneously (for example: you'd never see this twice in a row). And if wasm-pack doing actual work is faster than the version check, waiting for the version check seems like a good idea (we do want that check to go through every now and then... if wasm-pack is consistently "too fast" the version check will never happen).

dhl commented 4 years ago

Hi everyone!

I ran into a similar issue with 0.9.1. The race condition surfaces when wasm-pack is compiled in release mode, but does not appear if it's compiled in debug mode.

My configuration:

OS:              NixOS 19.09
rustc version:   1.41.0
GCC version:     9.2.0
OpenSSL version: 1.1.1d
dhl commented 4 years ago

I'm happy to report that with LibreSSL 3.0.2, the race conditions I experienced are gone in both release and debug modes.

ashleygwilliams commented 4 years ago

@dhl thanks for letting us know!! do you think we should document this somewhere?

dhl commented 4 years ago

I would love to see this documented somewhere, but maybe it's a bit too early to say LibreSSL fixes the segfaults before others could confirm the same?

Thanks for the great work on wasm-pack, by the way! :heart:

marienz commented 4 years ago

I had a quick look to see if there's an obvious difference between LibreSSL and OpenSSL, or if it's just more timing changes hiding the problem. There is a real difference, but I don't know if it's intentional or not.

Comparing OpenSSL 1.1.1d and LibreSSL 3.0.2, the important difference is that I see no atexit() handlers in LibreSSL.

However, that might be because LibreSSL currently does not need them, rather than by design. The lock whose cleanup triggered the segfault I saw before is also not present in libressl. It looks like this lock was introduced in openssl relatively recently (https://github.com/openssl/openssl/pull/3525), and (assuming https://github.com/libressl-portable/openbsd/commits/master/src/lib/libcrypto/objects/o_names.c is current) libressl has not yet picked up those changes.

So it's possible libressl still has that race condition, and that if they choose to fix it the same way openssl did it'll start crashing wasm-pack the same way. (Or if they start using atexit() for cleanup for other reasons.) Unless someone confirms libressl intentionally does not use atexit handlers I wouldn't recommend LibreSSL as the fix for this one.

elidupree commented 3 years ago

I don't know if this is the same issue, but I also just got a segfault when running wasm-pack with no arguments:

$ wasm-pack 
wasm-pack 0.9.1
Ashley Williams <ashley666ashley@gmail.com>
πŸ“¦

USAGE:
    wasm-pack [FLAGS] [OPTIONS] <SUBCOMMAND>

FLAGS:
    -h, --help       Prints help information
    -q, --quiet      No output printed to stdout
    -V, --version    Prints version information
    -v, --verbose    Log verbosity is based off the number of v used

OPTIONS:
        --log-level <log_level>    The maximum level of messages that should be logged by wasm-pack. [possible values:
                                   info, warn, error] [default: info]

SUBCOMMANDS:
    build      πŸ—οΈ
    help       Prints this message or the help of the given subcommand(s)
    login      πŸ‘€
    new        πŸ‘
    pack       🍱
    publish    πŸŽ†
    test       πŸ‘©πŸ”¬
Segmentation fault

This was on a fresh install; I've never used wasm-pack before, and I literally just did curl https://rustwasm.github.io/wasm-pack/installer/init.sh -sSf | sh and then wasm-pack and this happened. It's not repeatable (running wasm-pack again did not segfault).

cryptoquick commented 1 year ago

Hi, I tried cloning this repo and compiling from master, then running cargo install --path ., and it resulted in a 17MB binary that exited with terminated by signal SIGSEGV (Address boundary error) upon running build --release --target bundler. I read this thread, then tried with cargo install --debug --path ., which resulted in a 140MB binary (!!) and tried again, but no luck. It will run if no arguments will given, printing the CLI help, but it'll segfault when running it with the build command. I also tried using absolute path to the target binaries, but that also didn't work.

I'm running Garuda, Arch-based distro: Linux ... 6.1.12-zen1-1-zen #1 ZEN SMP PREEMPT_DYNAMIC Tue, 14 Feb 2023 22:08:11 +0000 x86_64 GNU/Linux rustc: 1.67.1 wasm-pack: b4e619c8a13a8441b804895348afbfd4fb1a68a3

This was working before when I was using the version that comes with Arch: https://archlinux.org/packages/community/x86_64/wasm-pack/

It goes back to working when I do a cargo uninstall wasm-pack.

marienz commented 1 year ago

If you're segfaulting only on specific wasm-pack invocations, you're probably hitting a different bug. This bug covers a race condition that can be hit by running wasm-pack with no arguments (or other wasm-pack invocations that are fast). If you're only segfaulting when building something, it's probably a different bug.

Please share a backtrace for the segfault (and if it doesn't look like this bug, probably best to file a new one and CC me if you want me to look at the backtraces).

Liamolucko commented 1 year ago

I was able to boil the segfault down to this minimal repro:

# Cargo.toml
[package]
name = "wasm-pack-segfault-repro"
version = "0.1.0"
edition = "2021"

[dependencies]
curl = "0.4.44"
openssl-sys = { version = "0.9.80", features = ["vendored"] }
reqwest = { version = "0.11.14", features = ["blocking"] }
// main.rs
use curl::easy;

fn main() {
    // This code comes from `wasm_pack::manifest::Crate::check_wasm_pack_latest_version`.
    struct NoopHandler;

    impl easy::Handler for NoopHandler {
        fn write(&mut self, data: &[u8]) -> Result<usize, easy::WriteError> {
            Ok(data.len())
        }
    }

    let mut easy = easy::Easy2::new(NoopHandler);
    easy.url("https://example.com").unwrap();
    easy.get(true).unwrap();
    // The segfault happens here.
    easy.perform().unwrap();

    // This code comes from `wasm_pack::install::krate::Krate::new`.
    // It's never reached, but the segfault doesn't happen if it's not here.
    let _ = reqwest::Client::builder();
}

This is the backtrace:

#0  0x00007ffff7c8bd66 in pthread_rwlock_wrlock () from /usr/lib/libc.so.6
#1  0x00007ffff77ca0ee in CRYPTO_THREAD_write_lock () from /usr/lib/libcrypto.so.3
#2  0x00007ffff7837ee5 in ?? () from /usr/lib/libcrypto.so.3
#3  0x00007ffff7837f81 in X509_STORE_add_cert () from /usr/lib/libcrypto.so.3
#4  0x00007ffff781c540 in X509_load_cert_crl_file_ex () from /usr/lib/libcrypto.so.3
#5  0x00007ffff781c696 in ?? () from /usr/lib/libcrypto.so.3
#6  0x00007ffff7837a4a in X509_STORE_load_file_ex () from /usr/lib/libcrypto.so.3
#7  0x00007ffff7f62012 in ?? () from /usr/lib/libcurl.so.4
#8  0x00007ffff7f63357 in ?? () from /usr/lib/libcurl.so.4
#9  0x00007ffff7f5ac0c in ?? () from /usr/lib/libcurl.so.4
#10 0x00007ffff7f3641b in ?? () from /usr/lib/libcurl.so.4
#11 0x00007ffff7f3944e in curl_multi_perform () from /usr/lib/libcurl.so.4
#12 0x00007ffff7f10044 in curl_easy_perform () from /usr/lib/libcurl.so.4
#13 0x0000555555654038 in curl::easy::handler::Easy2<wasm_pack_segfault_repro::main::NoopHandler>::perform<wasm_pack_segfault_repro::main::NoopHandler> (self=0x7fffffffdd20)
    at /home/liam/.cargo/registry/src/github.com-1ecc6299db9ec823/curl-0.4.44/src/easy/handler.rs:3163
#14 0x0000555555652a65 in wasm_pack_segfault_repro::main () at src/main.rs:16

I'm still not really sure why it's happening, though, and particularly why including the bit of code referencing reqwest that's never reached is necessary. The segfault happens to also come from libcrypto, but it doesn't seem particularly related to the original issue.

Liamolucko commented 1 year ago

The segfault only happens when vendored OpenSSL is enabled, though, so running cargo install --path . --no-default-features will get you a working version of wasm-pack.

Liamolucko commented 1 year ago

Apparently this is only a problem on the x86_64-unknown-linux-gnu target, not x86_64-unknown-linux-musl, which is why the released builds work fine. So, there's another workaround.

marienz commented 1 year ago

Your example does not crash for me on Fedora 37.

You're using vendored OpenSSL, but it looks like you're using your system's libcurl, which seems to be linked against your system's OpenSSL's libcrypto. Mixing vendored and system OpenSSL might be a contributing factor to your crash. (Looks like the curl crate uses system libcurl if available and silently falls back to a vendored copy of curl if not... I installed libcurl-devel and confirmed I then pull in system libcurl and libcrypto, but still didn't crash).

cryptoquick commented 1 year ago

I was able to boil the segfault down to this minimal repro: ...

I can confirm using @Liamolucko's MRE this also segfaults on my system, with both cargo run and cargo run --release. I'm not sure how to get a backtrace.