openforcefield / cmiles

Generate canonical molecule identifiers for quantum chemistry database
https://cmiles.readthedocs.io
MIT License

Discrepancies in re-generated reference data #43

Closed: mattwthompson closed this issue 3 years ago

mattwthompson commented 3 years ago

I'm trying to hunt down how differences between recent RDKit versions affect this data, and it seems that I cannot even re-generate the reference data to match the existing files:

$ python generate_reference.py 2>/dev/null
2019.03.2
$ git diff --name-only | cat
cmiles/tests/reference/drug_bank_inchi_rd_2019.03.2.txt
cmiles/tests/reference/drug_bank_inchikey_rd_2019.03.2.txt
cmiles/tests/reference/drug_bank_mapped_smi_rd_2019.03.2.smi

The differences are fairly small, but it's concerning that there are any at all.
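One way to put a number on "small" is to count the lines that differ between the committed and re-generated versions of each reference file. The following is only a minimal sketch: `count_changed_lines` is a hypothetical helper, not part of cmiles, and it assumes each reference file holds one entry per line.

```python
import difflib


def count_changed_lines(old_lines, new_lines):
    """Return (removed, added) line counts between two versions of a file.

    Hypothetical helper for illustration; assumes one entry per line.
    """
    removed = 0
    added = 0
    for line in difflib.ndiff(old_lines, new_lines):
        if line.startswith("- "):
            # Line present only in the old (committed) version
            removed += 1
        elif line.startswith("+ "):
            # Line present only in the new (re-generated) version
            added += 1
    return removed, added


# Toy data standing in for the contents of a reference file
old = ["CCO", "c1ccccc1", "CC(=O)O"]
new = ["CCO", "C1=CC=CC=C1", "CC(=O)O"]
print(count_changed_lines(old, new))
```

In practice the old lines could come from `git show HEAD:<path>` and the new lines from the working copy of the same file.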

mattwthompson commented 3 years ago

Cleaned up in f5a2985, although it wasn't a major issue to begin with. It was my own confusion resulting from a bunch of moving parts changing at once.