Open jthorton opened 3 years ago
QCArchive plans to stop lowercasing the index under which molecules are stored, which will allow us to use case-sensitive indexing. I will leave this open until the problem is fixed in QCArchive.
Until then, users can work around this issue by changing the index in the dataset object, adding any tag they wish to make the index unique.
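A minimal sketch of that workaround in plain Python, assuming the indices are available as a simple list of strings. The helper `make_case_unique` and the `-1`, `-2` tag format are hypothetical and are not part of QCSubmit or QCFractal; the resulting mapping would still need to be applied to the dataset object by hand.

```python
def make_case_unique(indices):
    """Map each index to a name that is unique even after lowercasing.

    Hypothetical helper: a numeric tag is appended to any index whose
    lowercase form has already been used by an earlier index.
    """
    used = set()      # lowercase forms already taken
    renamed = {}      # original index -> unique index
    for index in indices:
        candidate = index
        suffix = 0
        # Keep bumping the tag until the lowercase form is unused.
        while candidate.lower() in used:
            suffix += 1
            candidate = f"{index}-{suffix}"
        used.add(candidate.lower())
        renamed[index] = candidate
    return renamed


renamed = make_case_unique([
    "c1ccc(cc1)Oc2ccccc2",
    "c1ccc(cc1)OC2CCCCC2",
])
for original, unique in renamed.items():
    print(original, "->", unique)
```

The first index keeps its name and the second receives a tag, so the two no longer collapse to the same lowercase string.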
Sometimes when we submit new datasets we see fewer tasks created in QCArchive than expected. In this case, for example, QCSubmit expects 1043 unique optimizations, but when running this locally only 1041 tasks are actually made. The cause is the fast deduplication check that QCFractal performs when we add a molecule to a dataset. It checks if the
index.lower()
is already in the dataset. However, as we use the SMILES to index the molecules, our index labels are case sensitive, which causes different molecules to be considered the same, for example:
c1ccc(cc1)Oc2ccccc2
c1ccc(cc1)OC2CCCCC2
Cc1ccccc1Oc2ccccc2
The first two of these differ only in letter case, so both lowercase to the same index.
So to solve this we would have to make sure that the index for the molecule is either not case sensitive (like an InChIKey) or actually unique, for example by adding explicit hydrogens to the SMILES.
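The collision can be reproduced without QCFractal. The sketch below only mimics the lowercase comparison described above; `find_case_collisions` is a hypothetical helper, not part of QCSubmit or QCFractal, and could be run over a dataset's indices before submission to predict which entries would be dropped.

```python
from collections import defaultdict


def find_case_collisions(indices):
    """Group indices that become identical after lowercasing.

    Returns a dict mapping each shared lowercase form to the list of
    original indices that collapse onto it.
    """
    groups = defaultdict(list)
    for index in indices:
        groups[index.lower()].append(index)
    # Keep only the lowercase forms shared by more than one index.
    return {key: members for key, members in groups.items() if len(members) > 1}


smiles_indices = [
    "c1ccc(cc1)Oc2ccccc2",
    "c1ccc(cc1)OC2CCCCC2",
    "Cc1ccccc1Oc2ccccc2",
]
collisions = find_case_collisions(smiles_indices)
for lowered, members in collisions.items():
    print(lowered, "<-", members)
```

An empty result means no two indices in the list would be merged by a lowercase comparison.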