Missing molecules when submitted.

Sometimes when we submit new datasets, fewer tasks are created in QCArchive than we expect. In this example, QCSubmit expects 1043 unique optimizations, but running the submission locally creates only 1041 tasks. The cause is the fast deduplication check that QCFractal performs when a molecule is added to a dataset: it checks whether index.lower() is already in the dataset. Because we use SMILES as the molecule index, our index labels are case sensitive, so different molecules whose labels differ only in case are treated as the same entry. For example:

c1ccc(cc1)Oc2ccccc2 and c1ccc(cc1)OC2CCCCC2

Cc1ccccc1Oc2ccccc2 and Cc1ccccc1OC2CCCCC2

In each pair the second molecule has a saturated ring (uppercase C) where the first has an aromatic ring (lowercase c), so the two labels become identical once lowercased.
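The collision can be reproduced without QCFractal. The snippet below is only a sketch: `add_entries` is a hypothetical stand-in that mimics a case-insensitive duplicate check on index labels, not the actual QCFractal code.

```python
# Index labels taken from the example above (SMILES used as dataset indices).
labels = [
    "c1ccc(cc1)Oc2ccccc2",
    "c1ccc(cc1)OC2CCCCC2",
    "Cc1ccccc1Oc2ccccc2",
    "Cc1ccccc1OC2CCCCC2",
]


def add_entries(labels):
    """Hypothetical stand-in for a duplicate check keyed on index.lower()."""
    seen = set()
    kept = []
    for label in labels:
        key = label.lower()
        if key in seen:
            # The lowercased label is already present, so the entry is skipped.
            continue
        seen.add(key)
        kept.append(label)
    return kept


kept = add_entries(labels)
print(len(labels), "labels submitted,", len(kept), "kept")
```

Each pair loses one molecule, which matches the two tasks missing from the 1043 expected.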

To solve this, the molecule index would need to be either case insensitive (like an InChIKey) or still unique after lowercasing, for example by adding explicit hydrogens to the SMILES.
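Whichever indexing scheme is chosen, it may help to check the labels for case-insensitive collisions before submitting. The helper below is a minimal sketch; `find_case_collisions` is a hypothetical name and is not part of QCSubmit or QCFractal.

```python
from collections import defaultdict


def find_case_collisions(labels):
    """Group distinct labels that become identical once lowercased.

    Returns a dict mapping each lowercased label to the sorted list of
    distinct original labels that share it. Only groups with more than
    one distinct label are returned.
    """
    groups = defaultdict(set)
    for label in labels:
        groups[label.lower()].add(label)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}


collisions = find_case_collisions(
    [
        "c1ccc(cc1)Oc2ccccc2",
        "c1ccc(cc1)OC2CCCCC2",
        "Cc1ccccc1Oc2ccccc2",
        "Cc1ccccc1OC2CCCCC2",
        "CCO",  # no case-insensitive twin, so it is not reported
    ]
)
for key, group in collisions.items():
    print(key, "<-", group)
```

An empty result would mean the chosen index labels are safe under a lowercased comparison.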

openforcefield / openff-qcsubmit

Missing molecules when submitted. #61