oconnor663 / blake3-py

Python bindings for the BLAKE3 cryptographic hash function
Other
139 stars 12 forks source link

[feature request] copy() method and `name` attribute #16

Closed xkortex closed 3 years ago

xkortex commented 3 years ago

I'd like to implement a copy() method and a name attribute (which just returns 'blake3'). This would bring its interface in line with the other hashlib functions. I think this might do it, but I am insufficiently rusty 🦀 , so I was hoping you could take a look before I PR it.

This failed and I can't figure out why.

        #[getter]
        fn name(
            &mut self,
            py: Python,
        ) -> PyResult<PyString> {
            Ok(PyString::from_str("blake3"))
        }
     Compiling blake3 v0.1.7 (/private/var/folders/wp/kcxqg3tx3sg5cyl2sq9hw6qh0000gn/T/pip-req-build-t1pjouit)
  error: could not compile `blake3`.

  To learn more, run the command again with --verbose.
  💥 maturin failed
    Caused by: Failed to build a native library through cargo
    Caused by: Cargo build finished with "exit code: 101": `cargo rustc --message-format json --manifest-path Cargo.toml --lib --release -- -C link-arg=-undefined -C link-arg=dynamic_lookup`
  Running `maturin pep517 build-wheel -i python`
  Error: Command '['maturin', 'pep517', 'build-wheel', '-i', 'python']' returned non-zero exit status 1.
  ----------------------------------------
  ERROR: Failed building wheel for blake3
Failed to build blake3
ERROR: Could not build wheels for blake3 which use PEP 517 and cannot be installed directly

This seems to work:

        /// Return a copy of a Blake3Hasher object.
        /// I have no idea if this is threadsafe
        fn copy(
            &mut self,
            py: Python,
        ) -> PyResult<Blake3Hasher> {
            Ok( Blake3Hasher { rust_hasher: self.rust_hasher.clone() })
        }
In [1]: import blake3

In [2]: b1 = blake3.blake3(b'foo')

In [3]: b1.hexdigest()
Out[3]: '04e0bb39f30b1a3feb89f536c93be15055482df748674b00d26e5a75777702e9'

In [4]: b2 = b1.copy()

In [5]: b1.update(b'bar')

In [6]: b1.hexdigest()
Out[6]: 'aa51dcd43d5c6c5203ee16906fd6b35db298b9b2e1de3fce81811d4806b76b7d'

In [7]: b2.hexdigest()
Out[7]: '04e0bb39f30b1a3feb89f536c93be15055482df748674b00d26e5a75777702e9'

In [8]: blake3.blake3(b'foobar').hexdigest()
Out[8]: 'aa51dcd43d5c6c5203ee16906fd6b35db298b9b2e1de3fce81811d4806b76b7d'

Thanks,

oconnor663 commented 3 years ago

Good idea!

I'm not intimately familiar with PyO3 myself. Basically I learned it as I went with along with these bindings, and only the parts I needed, plus it was a major version ago. So I can't say exactly why you're getting the errors you're getting. But playing around with it a bit, this simplified version seems to work:

#[getter]
fn name(&self) -> &str {
    "blake3"
}

fn copy(&self) -> Blake3Hasher {
    Blake3Hasher {
        rust_hasher: self.rust_hasher.clone(),
    }
}

On the Rust side of things, the &self as opposed to &mut self indicates that only read access to the hasher is required. In this particular case, since Python doesn't really have a concept of shared vs mutable/unique borrows, I don't think it makes a difference to how the generated binding code actually behaves. But we'd definitely want to avoid unnecessary &mut borrows in a regular Rust codebase, so it's probably a good idea here too.

To your question about thread safety, Rust's story here is pretty clean: If you have a &T or a &mut T, you are guaranteed that no other thread could possibly mutate that T while the reference you're holding is alive. Or at least that it would require unsafe code and trigger undefined behavior to do so, and that the results would definitely not be your fault. (This papers over the possibility that T might contain something like a std::sync::Mutex, which could allow for safe "interior mutability", but for plain-old-data types like ours this is the whole story.) My understanding is that PyO3's job is kind of difficult in this respect, since it's hard to guarantee that no other Python threads are holding a reference to something if you release the GIL. But for our purposes that's PyO3's problem, as we're allowed to write code assuming Rust's usual guarantees are respected.

xkortex commented 3 years ago

17