shepmaster / twox-hash

A Rust implementation of the XXHash algorithm.
MIT License

Nondeterministic results when building as cdylib #79

Open rmcgibbo opened 2 years ago

rmcgibbo commented 2 years ago

I'm seeing xxh3 hashes that seem to depend on some uninitialized variable (?), but only when compiling into a cdylib. I first noticed this in a Python extension built with pyo3, but I've managed to reduce it to the following simple reproduction:

# Cargo.toml
[package]
name = "foo"
version = "0.1.0"
authors = ["Robert T. McGibbon <rmcgibbo@gmail.com>"]
edition = "2018"

[lib]
name = "foo"
crate-type = ["cdylib"]

[dependencies]
twox-hash = { version = "1.6.1", default-features = false }

# src/lib.rs
use std::hash::Hasher;
use twox_hash::xxh3::HasherExt;

#[no_mangle]
pub extern "C" fn hash_a_constant() {
    let seed = 0;
    let ss = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
    let mut hasher = twox_hash::Xxh3Hash128::with_seed(seed);
    hasher.write(ss.as_bytes());
    println!("hash a constant of size {} {}", ss.len(), hasher.finish_ext());
}

Running it with:

$ cargo build
$ python -c 'import ctypes; ctypes.CDLL("target/debug/libfoo.so").hash_a_constant()'
hash a constant of size 257 233319518405983521442151793707413278448
$ python -c 'import ctypes; ctypes.CDLL("target/debug/libfoo.so").hash_a_constant()'
hash a constant of size 257 263704682723094846392981853651016498070
$ python -c 'import ctypes; ctypes.CDLL("target/debug/libfoo.so").hash_a_constant()'
hash a constant of size 257 278186113363433872561296712692319909969

I seem to get different results every time.

If I instead move the body of hash_a_constant into fn main and run it as a Rust binary, rather than calling the exported symbol in the .so, I always get the same value (295345945357457693424139354068657467622).

For what it's worth, this is with rustc 1.52.1 on x86_64-linux.

rmcgibbo commented 2 years ago

For what it's worth, the results are stable from run to run when:

  1. the input string is 256 bytes or shorter; they become unstable once it is 257 bytes or longer.
  2. I use twox_hash::xxh3::hash128 directly; the problem only shows up when going through twox_hash::Xxh3Hash128.

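
One possible reading of the 256/257-byte boundary, offered as an assumption rather than a diagnosis: the reference C implementation of XXH3 keeps streamed input in a 256-byte internal buffer (XXH3_INTERNAL_BUFFER_SIZE) and only processes stripes inside its update function once the input no longer fits there. Note that 256 is not XXH3's short/long-input cutoff, which is 240 bytes (XXH3_MIDSIZE_MAX), so the observed threshold would line up with buffering rather than with algorithm selection. The sketch below is a hypothetical model of that buffering decision, not twox-hash code; INTERNAL_BUFFER_SIZE, write_processes_stripes, and any_write_processes are illustrative names, and it assumes twox-hash's streaming hasher uses the same 256-byte buffer.

```rust
/// ASSUMPTION: capacity of the streaming hasher's internal buffer, taken from
/// XXH3_INTERNAL_BUFFER_SIZE (256 bytes) in the reference C implementation.
const INTERNAL_BUFFER_SIZE: usize = 256;

/// Hypothetical model of the choice a buffered streaming hasher makes on each
/// write: if the new bytes still fit next to the bytes already buffered, they
/// are only copied; otherwise stripes have to be processed during the write.
fn write_processes_stripes(already_buffered: usize, len: usize) -> bool {
    already_buffered + len > INTERNAL_BUFFER_SIZE
}

/// Feeds a sequence of write lengths into a fresh (empty) model and reports
/// whether any of them would process stripes before finish is called.
fn any_write_processes(write_lens: &[usize]) -> bool {
    let mut buffered = 0;
    for &len in write_lens {
        if write_processes_stripes(buffered, len) {
            return true;
        }
        // Nothing was processed yet, so everything written so far is still buffered.
        buffered += len;
    }
    false
}
```

Under this model, any_write_processes(&[256]) is false and any_write_processes(&[257]) is true, and splitting the input into chunks does not change the outcome: &[128, 128] stays on the buffer-only path while &[128, 129] reaches the processing path, because only the running total matters. If the model holds, that would point at the code Xxh3Hash128 runs on this path rather than at the one-shot xxh3::hash128.
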
rmcgibbo commented 2 years ago

For what it's worth, I think I can also see this in Valgrind, which reports reads from uninitialized memory when running just the following (no Python, cdylib, or anything like that required):

use std::hash::Hasher;
use twox_hash::xxh3::HasherExt;

fn main() {
    let seed = 0;
    let ss = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
    let mut hasher = twox_hash::Xxh3Hash128::with_seed(seed);
    hasher.write(ss.as_bytes());
    println!("hash a constant of size {} {}", ss.len(), hasher.finish_ext());
}

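Both observations above suggest a differential check: hash the same bytes through the streaming hasher and through the one-shot function for every length across the suspicious boundary, and list the lengths where they disagree. Comparing against the one-shot result, and not only against a second streaming run, matters because uninitialized memory can easily hold the same bytes twice within one process, which would hide the bug from a same-process comparison. The helper below is a hypothetical, std-only sketch; mismatched_lengths and its closure-based interface are illustrative names, not part of twox-hash.

```rust
use std::ops::RangeInclusive;

/// Hypothetical differential check. For each length in `lengths`, build an
/// input of that many b'x' bytes (the filler used in this issue), hash it twice
/// with `streaming` and once with `one_shot`, and record the length if the
/// three results do not all agree.
fn mismatched_lengths<S, O>(streaming: S, one_shot: O, lengths: RangeInclusive<usize>) -> Vec<usize>
where
    S: Fn(&[u8]) -> u128,
    O: Fn(&[u8]) -> u128,
{
    let mut mismatches = Vec::new();
    for len in lengths {
        let buf = vec![b'x'; len];
        let input: &[u8] = &buf;
        // Two streaming runs catch nondeterminism within a single process.
        let first = streaming(input);
        let second = streaming(input);
        // The one-shot result catches output that is wrong but happens to repeat.
        let expected = one_shot(input);
        if first != second || first != expected {
            mismatches.push(len);
        }
    }
    mismatches
}
```

To point it at this crate, the streaming closure would create twox_hash::Xxh3Hash128::with_seed(0), write the bytes, and return finish_ext() (with std::hash::Hasher and twox_hash::xxh3::HasherExt in scope, as in the reproductions), while the one-shot closure would call twox_hash::xxh3::hash128 on the same bytes. That pairing assumes seed 0 gives the same result as the unseeded one-shot function, as it does in the reference XXH3. Scanning lengths 1..=512 would cover both sides of the 256-byte boundary.
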
shepmaster commented 2 years ago

That... doesn't sound good.

@flier, it sounds like some unsafety has crept into the xxh3 implementation; any thoughts?