usnistgov / REFPROP-issues

A repository solely used for reporting issues with NIST REFPROP

REFPROP hash generation #258

Closed ianhbell closed 4 years ago

ianhbell commented 4 years ago

REFPROP uses a unique identifier for pure fluids that is based on the InChI key, which is itself a hash of the InChI string. Much more information on these identifiers is available on the Wikipedia page: https://en.wikipedia.org/wiki/International_Chemical_Identifier . These strings (even the InChI key) were deemed too long, so a shorter unique identifier was used. Its first seven characters are hexadecimal digits taken from the SHA-256 hash of the InChI key; note that the slice used in the snippet skips the first two characters of the hex digest. A brief code snippet in Python explains how it works:

```python
# For propane
import hashlib
InChIkey = 'ATUOYWHBWRKTHZ-UHFFFAOYSA-N'
# Hash the key, then keep seven hex characters (indices 2 through 8 of the digest)
uid = hashlib.sha256(InChIkey.encode('UTF-8')).hexdigest()[2:9]
print(uid)
```

will print

70c6aac

The eighth and final character is normally 0, so the full code for propane becomes 70c6aac0. In some cases (e.g. for ortho-, para-, and normal-hydrogen), the InChI key does not capture the requisite information, and a disambiguation character is used to ensure the hash is unique among fluids.
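As a rough sketch of how the full eight-character code could be assembled, the snippet above can be wrapped in a small helper. The function name `refprop_uid` and its `disambiguation` parameter are hypothetical, and the sketch assumes the disambiguation character simply takes the place of the trailing 0; which characters REFPROP actually assigns to which fluids is not stated here.

```python
import hashlib


def refprop_uid(inchi_key: str, disambiguation: str = "0") -> str:
    """Build an 8-character identifier from an InChI key.

    Hypothetical helper: seven hex characters from the SHA-256 digest
    (same slice as the snippet above) plus one trailing character.
    """
    if len(disambiguation) != 1:
        raise ValueError("disambiguation must be a single character")
    digest = hashlib.sha256(inchi_key.encode("UTF-8")).hexdigest()
    # Assumption: the disambiguation character replaces the usual trailing 0.
    return digest[2:9] + disambiguation


propane_key = "ATUOYWHBWRKTHZ-UHFFFAOYSA-N"
print(refprop_uid(propane_key))  # default trailing character "0"
# "1" is only an illustrative value, not a character REFPROP is known to use.
print(refprop_uid(propane_key, disambiguation="1"))
```

Keeping the trailing character as a separate argument means fluids that share an InChI key can still be given distinct identifiers without changing the hashing step.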

ianhbell commented 4 years ago

This was wrapped up into a small Jupyter notebook that can run natively in the browser: https://github.com/ianhbell/REFPROP-hash