openforcefield / cmiles

Generate canonical molecule identifiers for quantum chemistry database
https://cmiles.readthedocs.io
MIT License

IUPAC name and chemical formula #6

Closed ChayaSt closed 5 years ago

ChayaSt commented 5 years ago

Add IUPAC names and chemical formula. The chemical formula should follow the same standardization as the QCArchive project.

@dgasmith, can you let me know how you generate the chemical formula?

ChayaSt commented 5 years ago

@dgasmith, the first section (or layer) of the InChI is the empirical chemical formula, written in this order: carbon first, then hydrogen, then all other elements in alphabetical order. Will this work with the representation in QCFractal?

In InChI, this is always given for the neutral species so that the core parent structure of different protonation states is represented.
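To make the formula layer concrete, here is a minimal sketch that pulls it out of an InChI string. The helper name inchi_formula_layer is hypothetical and not part of cmiles or QCFractal, and the sketch assumes the string has a formula layer at all (a bare proton, for example, does not).

```python
def inchi_formula_layer(inchi):
    """Return the formula layer of an InChI string (hypothetical helper)."""
    parts = inchi.split("/")
    # parts[0] is the prefix and version, e.g. "InChI=1S";
    # parts[1] is assumed to be the formula layer.
    if len(parts) < 2 or not parts[0].startswith("InChI="):
        raise ValueError("not an InChI string: %r" % inchi)
    return parts[1]


# Standard InChI for ethanol
print(inchi_formula_layer("InChI=1S/C2H6O/c1-2-3/h3H,1-2H3"))  # C2H6O
```

The layer is plain text, so it can be compared directly with whatever formula string QCFractal stores, provided both use the same element ordering.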

dgasmith commented 5 years ago

The exact molecular formula code is:

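import collections

def molecular_formula(symbols):
    # Normalize case ("he", "HE" -> "He") and count each element.
    count = collections.Counter(x.title() for x in symbols)

    ret = []
    # Elements are emitted in plain alphabetical order of their symbols.
    for k in sorted(count.keys()):
        c = count[k]
        ret.append(k)
        # A count of 1 is left implicit.
        if c > 1:
            ret.append(str(c))

    return "".join(ret)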

>>> molecular_formula(["he", "HE"])
'He2'

ChayaSt commented 5 years ago

Addressed with #9.

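Note that the molecular formula is in Hill notation: carbon and hydrogen are listed first, then all other elements in alphabetical order of their symbols. When a molecule contains no carbon, Hill notation lists every element, hydrogen included, alphabetically.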
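For illustration, Hill ordering can be sketched as below. This is not necessarily how #9 implements it; the function name hill_formula is an assumption made for this example. It differs from a purely alphabetical sort whenever an element sorts ahead of carbon: bromoform comes out as CHBr3 rather than Br3CH.

```python
import collections


def hill_formula(symbols):
    """Return a Hill-notation formula for an iterable of element symbols."""
    count = collections.Counter(s.title() for s in symbols)
    if "C" in count:
        # Carbon first, then hydrogen (if present), then the rest alphabetically.
        order = ["C"] + (["H"] if "H" in count else [])
        order += sorted(k for k in count if k not in ("C", "H"))
    else:
        # No carbon: every element, hydrogen included, is alphabetical.
        order = sorted(count)
    # A count of 1 is left implicit.
    return "".join(k + (str(count[k]) if count[k] > 1 else "") for k in order)


print(hill_formula(["C", "H", "Br", "Br", "Br"]))  # CHBr3
print(hill_formula(["H", "Cl"]))                   # ClH
```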