rust-pcap / pcap

Rust language pcap library
Apache License 2.0
620 stars 144 forks source link

test_raw_fd_api fails on ubuntu sometimes #119

Closed Trolldemorted closed 4 years ago

Trolldemorted commented 4 years ago
test test_raw_fd_api ... FAILED

failures:

---- test_raw_fd_api stdout ----
thread 'test_raw_fd_api' panicked at 'called `Result::unwrap()` on an `Err` value: PcapError("truncated dump file; tried to read 4 file header bytes, only got 0")', tests/lib.rs:238:34
stack backtrace:
   0: rust_begin_unwind
             at /rustc/397b390cc76ba1d98f80b2a24a371f708dcc9169/library/std/src/panicking.rs:475
   1: core::panicking::panic_fmt
             at /rustc/397b390cc76ba1d98f80b2a24a371f708dcc9169/library/core/src/panicking.rs:85
   2: core::option::expect_none_failed
             at /rustc/397b390cc76ba1d98f80b2a24a371f708dcc9169/library/core/src/option.rs:1221
   3: core::ops::function::FnOnce::call_once
   4: core::ops::function::FnOnce::call_once
             at /rustc/397b390cc76ba1d98f80b2a24a371f708dcc9169/library/core/src/ops/function.rs:227
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

failures:
    test_raw_fd_api

test result: FAILED. 6 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out

error: test failed, to rerun pass '--test lib'
##[error]Process completed with exit code 101.

https://github.com/Trolldemorted/pcap/runs/1039588778?check_suite_focus=true

Wojtek242 commented 4 years ago

Are nightly builds something we should/want to support? Stable+beta yes. My impression was that the beta branch was meant for catching bugs before they hit stable. This crate was left on its own for 3 years and somehow it didn't break due to a compiler update. So is this nightly breakage really just the first one that happened since then?

sekineh commented 4 years ago

If we move to edition 2018, specifying minimum supported rustc version (MSRV) is rather useful for users.

In that case, msrv (1.31 or something) + stable looks essential.

Trolldemorted commented 4 years ago

Are nightly builds something we should/want to support? Stable+beta yes. My impression was that the beta branch was meant for catching bugs before they hit stable.

Rust does not allow you to build individual dependencies with a different toolchain, so as soon as one dependency is nightly-only you must build your entire project with nightly. Unfortunately, nightly-only crates are still popular and around, e.g. rocket has been around for years and it still does not build on stable.

This crate was left on its own for 3 years and somehow it didn't break due to a compiler update. So is this nightly breakage really just the first one that happened since then?

We don't really know when it stopped working since we didn't have daily nightly CI that would track that :( Did you ever run the tests with nightly?

Trolldemorted commented 4 years ago

I am sorry to bring more bad news, but now it failed on stable: https://github.com/Trolldemorted/pcap/runs/1041363263

I suppose we have a race condition in either the test or the lib 🤔Unfortunately I have trouble to trigger that bug locally. Any ideas?

sekineh commented 4 years ago

Just my theory. ~libpcap is not designed to be called from multiple threads. (Some feature might utilize shared state)~ Edit: no source ~But~ cargo test run parallel tests using threads.

We could try this?: $ cargo test -- --test-threads=1 (Edit: It might just mask bug, though) Ref: https://doc.rust-lang.org/book/ch11-02-running-tests.html

Trolldemorted commented 4 years ago

If this crate exposes the functionality of libpcap in an incorrect/unsafe way, we should address that immediately. Is there any documentation about what you must not call from different threads?

I just had a look at tests/lib.rs - is my assumption that virtually all tests write to /tmp/pcap/test.pcap and delete it afterwards correct? I guess the tests are deleting the file while another test is running. Is there a specific reason that all tests use the same temporary file?

sekineh commented 4 years ago

If this crate exposes the functionality of libpcap in an incorrect/unsafe way, we should address that immediately. Is there any documentation about what you must not call from different threads?

No evidense.

I just had a look at tests/lib.rs - is my assumption that virtually all tests write to /tmp/pcap/test.pcap and delete it afterwards correct? I guess the tests are deleting the file while another test is running. Is there a specific reason that all tests use the same temporary file?

Oh it looks likely. "test.pcap" is shared between each tests. shared resource on our side.

Trolldemorted commented 4 years ago

The documentation of temdir says:

The directory and everything inside it will be automatically deleted once the returned TempDir is destroyed.

so as soon as dir goes out of scope the directory is deleted, which can happen while another test is still using it. So we have the problem that the file is concurrently modified by tests that read/write to it and the problem that the entire folder is deleted everytime a test finishes.

Trolldemorted commented 4 years ago

Damn, I was wrong: TempDir::new("pcap") takes "pcap" as the prefix, not the entire filename. The resulting folders are unique and have names like tmp/pcap.Ks0qC6Q8M6wl.

sekineh commented 4 years ago

TempDir::new() uses thread_rng() to produce random values. https://docs.rs/tempdir/0.3.5/src/tempdir/lib.rs.html#183

I'm not sure if it produces different numbers on each thread....

sekineh commented 4 years ago

But they say:

ThreadRng is a cryptographically secure PRNG

Trolldemorted commented 4 years ago
        let mut rng = thread_rng();
        for _ in 0..NUM_RETRIES {
            let suffix: String = rng.gen_ascii_chars().take(NUM_RAND_CHARS).collect();
            let leaf = if prefix.len() > 0 {
                format!("{}.{}", prefix, suffix)
            } else {
                // If we're given an empty string for a prefix, then creating a
                // directory starting with "." would lead to it being
                // semi-invisible on some systems.
                suffix
            };
            let path = tmpdir.join(&leaf);
            match fs::create_dir(&path) {
                Ok(_) => return Ok(TempDir { path: Some(path) }),
                Err(ref e) if e.kind() == ErrorKind::AlreadyExists => {}
                Err(e) => return Err(e),
            }
        }

        Err(Error::new(ErrorKind::AlreadyExists,
                       "too many temporary directories already exist"))

If the dir exists they retry, and if every retry fails you get an AlreadyExists

Trolldemorted commented 4 years ago

I can now reliably reproduce this issue with this test:


#[test]
#[cfg(not(windows))]
fn stress() {
    let mut tests = vec!();
    for _ in 0..1000 {
        tests.push(std::thread::spawn(move || {
            test_raw_fd_api()
        }));
    }
    for test in tests {
        test.join().unwrap()
    }
}

I think there are multiple issue here. E.g. does this io::copy really copy everything to the pipe, or just what is currently available? How does a pipe on linux behave if no data is in the buffer but it wasn't closed yet?

    thread::spawn(move || {
        // Write all packets to the pipe
        let cap = Capture::dead(Linktype(1)).unwrap();
        let mut save = cap.savefile_raw_fd(fd_out).unwrap();
        packets_c.foreach(|p| save.write(p));
        // fd_out will be closed by savefile destructor
    });

    // Save the pcap from pipe in a separate thread.
    // Hypothetically, we could do any sort of processing here,
    // like encoding to a gzip stream.
    let mut file_in = unsafe { File::from_raw_fd(fd_in) };
    let mut file_out = File::create(&filename).unwrap();
    io::copy(&mut file_in, &mut file_out).unwrap();
sekineh commented 4 years ago

Potential problem.

In this for loop, fd_in is not libc::close()ed. https://github.com/ebfull/pcap/blob/f9408c59587bafa5b52a2f1785f5f994f453336c/tests/lib.rs#L241

sekineh commented 4 years ago

Will Capture::from_raw_fd close fd_in? Edit: I feel like impl Drop for Capture should do it. Edit2: Drop impl calls pcap_close()which should close files associated to pcap_t object. (man pcap_close)