isayev / ANI1_dataset

A data set of 20 million calculated off-equilibrium conformations for organic molecules
MIT License

Publish sha256 hashes for improved user safety #9

Open rsokl opened 2 years ago

rsokl commented 2 years ago

Hello! Could you compute and publish the sha256 hash of your ani-1_dataset.tar.gz file and include it in your README? This would let users verify that the data they download has not been tampered with by a third party.

You can easily compute a hash using:

from hashlib import sha256

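def hash_check(fname, hash_fn=sha256):
    """Read a file from disk in chunks and return its hex digest.

    Parameters
    ----------
    fname : str | Path
        Path to the file to hash.

    hash_fn : Callable[[], Hash], optional (default=hashlib.sha256)
        Constructor for the hash object, e.g. ``hashlib.sha256``.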

    Examples
    --------
    Checking a sha256 hash:

    >>> from hashlib import sha256
    >>> digest = hash_check('./text.txt', sha256)
    >>> len(digest)  # a sha256 hex digest is always 64 characters
    64
    """
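    # Instantiate the hash object; a separate name avoids shadowing hash_fn.
    hasher = hash_fn()
    with open(fname, "rb") as f:
        # Read in 4096-byte chunks so large files never have to fit in memory.
        for chunk in iter(lambda: f.read(4096), b""):
            hasher.update(chunk)
    return hasher.hexdigest()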
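
Once a hash is published, a user-side check could look roughly like the sketch below. The names `file_sha256` and `verify_download` are hypothetical helpers, not part of this repository, and the expected digest would have to be copied from the README (none is published yet, so none is shown here).

```python
import hashlib


def file_sha256(fname, chunk_size=1 << 20):
    """Return the sha256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with open(fname, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_download(fname, expected_hex):
    """Return True if the file's sha256 digest matches the published one.

    `expected_hex` is assumed to be the hex digest copied from the README.
    """
    # Normalize case and surrounding whitespace so a copy-pasted digest
    # still compares equal to hexdigest()'s lowercase output.
    return file_sha256(fname) == expected_hex.strip().lower()
```

A user would call `verify_download("ani-1_dataset.tar.gz", expected)` with `expected` set to the published digest, and discard the download if it returns `False`.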

Thanks!