mrhooray / crc-rs

Rust implementation of CRC(16, 32, 64) with support of various standards
Apache License 2.0
187 stars 49 forks source link

Feature: Implement `core::hash::Hasher` on `Digest` #106

Closed rjzak closed 11 months ago

rjzak commented 12 months ago

I'm trying to update the crc dependency for my crate, and I need the core::hash::Hasher trait. Related to https://github.com/mrhooray/crc-rs/issues/62, I'm having some issues I can't figure out.

Code: https://github.com/malwaredb/lzjd-rs/blob/ee6183a692b8691d60550ef51f22132243f0b263/src/crc32.rs#L21

The closest I've come is this weird code which doesn't work:

pub struct CRC32Hasher<'a> {
    crc: crc::Crc<u32>,
    digest: Option<&'a crc::Digest<'a, u32>>,
}

impl<'a> CRC32Hasher<'a> {
    fn new() -> Self {
        let crc = crc::Crc::<u32>::new(&CRC_32_BZIP2);
        Self { crc, digest: None }
    }
}

impl<'a> Hasher for CRC32Hasher<'a> {
    fn finish(&self) -> u64 {
        self.digest.clone().unwrap().finalize() as u64
    }

    fn write(&mut self, bytes: &[u8]) {
        if self.digest.is_none() {
            self.digest = Some(self.crc.digest()); // doesn't live long enough
        }

        //self.digest.as_ref().unwrap().update(bytes);
    }
}

/// std::hash::BuildHasher that builds CRC32Hashers
#[derive(Clone, Default)]
pub struct CRC32BuildHasher<'a> {
    phantom: Option<&'a [u8]>, // need lifetimes specified, and need a member to connect a lifetime, but this data isn't used
}

impl<'a> BuildHasher for CRC32BuildHasher<'a> {
    type Hasher = CRC32Hasher<'a>;

    fn build_hasher(&self) -> Self::Hasher {
        CRC32Hasher::new()
    }
}
  1. The finalize in the crc crate takes ownership, which causes a problem with the Hasher train.
  2. I can't have crc::Crc::<u32>::new(&CRC_32_BZIP2)::digest(); as the sole member of the struct of a complaint that the intermediate value doesn't last long enough. Nor does have the variable in the function scope work, since the borrow checker also complains about the reference to the local variable.

But having the Digest struct in this crate already implement the Hasher trait would be helpful, and probably useful to others as well. But maybe I'm missing something?

KillingSpark commented 11 months ago

That is quite the interesting problem. The core of your immediate issues with the lifetimes is that you are basically trying to construct a self-refential struct with your CRC32Hasher. The digest field holds a reference to the crc field.

I'd propose to create a static variable that stores the Crc. This would allow to discard all lifetimes

pub struct CRC32Hasher {
    digest: crc::Digest<'static, u32>,
}
static CRC: crc::Crc<u32> = crc::Crc::<u32>::new(&CRC_32_BZIP2);

impl CRC32Hasher {
    fn new() -> Self {

        Self { digest: CRC.digest() }
    }
}

impl Hasher for CRC32Hasher {
    fn finish(&self) -> u64 {
        self.digest.clone().finalize() as u64
    }

    fn write(&mut self, bytes: &[u8]) {
        self.digest.update(bytes);
    }
}

/// std::hash::BuildHasher that builds CRC32Hashers
#[derive(Clone, Default)]
pub struct CRC32BuildHasher {}

impl BuildHasher for CRC32BuildHasher {
    type Hasher = CRC32Hasher;

    fn build_hasher(&self) -> Self::Hasher {
        CRC32Hasher::new()
    }
}
KillingSpark commented 11 months ago

Edit Note: This originally said I couldn't figure out how to do this, but then I saw my mistake...

You can also just lift the Crc struct into the CRC32BuildHasher so you can avoid the static/const variable:

pub struct CRC32Hasher<'a> {
    digest: crc::Digest<'a, u32>,
}

impl<'a> CRC32Hasher<'a> {
    fn new(crc: &'a Crc<u32>) -> Self {
        Self { digest: crc.digest() }
    }
}

impl<'a> Hasher for CRC32Hasher<'a> {
    fn finish(&self) -> u64 {
        self.digest.clone().finalize() as u64
    }

    fn write(&mut self, bytes: &[u8]) {
        self.digest.update(bytes);
    }
}

pub struct CRC32BuildHasher {
    crc: crc::Crc<u32>,
}

impl CRC32BuildHasher {
    pub fn new() -> Self {
        Self { crc: crc::Crc::<u32>::new(&CRC_32_BZIP2) }
    }
}

impl<'a> BuildHasher for &'a CRC32BuildHasher {
    type Hasher = CRC32Hasher<'a>;

    fn build_hasher(&self) -> Self::Hasher { 
        CRC32Hasher::new(&self.crc)
    }
}
rjzak commented 11 months ago

Thank you for that. Due to these difficulties, and having an issue trying to figure out which flavour of CRC32 was the correct one I needed for my algorithm (I unsuccessfully tried options which seemed usable), I switched to the crc32fast crate.