j-wags opened 1 year ago
Is there something special about C7H12N2O4 or could a different formula work? If records are goofy they're probably not uniformly goofy across all empirical formulae. Unless they are, which would be a problem.
C7H12N2O4 is a capped alanine 1-mer, which has all "good" entries as far as I can tell (CMILES present, with fully defined stereo). The old molecule, C16H20N3O5, has some fancy ring stuff that could plausibly confuse a cheminformatics toolkit. The weird thing with the capped alanine 1-mer is that, when I load it on my computer, thousands of records come down, ALL of which are "good". But when CI does it, the first record it gets in the iterator is frequently missing CMILES.
For what it's worth, if I drop the `limit` argument, I get less than 100% of records having CMILES using your code.
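"Fraction of records with CMILES" can be made explicit with a small check. This is a minimal sketch that assumes the records have already been fetched and converted to plain dicts, and that the mapped SMILES lives under the key `canonical_isomeric_explicit_hydrogen_mapped_smiles` in either `"attributes"` or `"extras"`. The fetching step is left out, and `has_cmiles` and `cmiles_fraction` are hypothetical helpers, not toolkit or QCPortal functions.

```python
from typing import Any, Iterable, Mapping

# Assumption: the hydrogen-mapped CMILES is stored under this key.
CMILES_KEY = "canonical_isomeric_explicit_hydrogen_mapped_smiles"


def has_cmiles(record: Mapping[str, Any]) -> bool:
    """True if a non-empty mapped SMILES is present under "attributes" or "extras"."""
    for section in ("attributes", "extras"):
        # A missing or None section is treated as empty.
        if (record.get(section) or {}).get(CMILES_KEY):
            return True
    return False


def cmiles_fraction(records: Iterable[Mapping[str, Any]]) -> float:
    """Fraction of records carrying a mapped SMILES (0.0 for no records)."""
    records = list(records)
    if not records:
        return 0.0
    return sum(has_cmiles(r) for r in records) / len(records)
```

Running this on the same query with and without `limit` would turn "less than 100%" into a number that can be compared between a local machine and CI.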
Not having investigated either, I noticed a couple of things that could be worth exploring more:

- The doctests are run through `pytest`, but I'm not sure whether passing the `--doctest-modules` flag causes it to defer completely to the standard-library `doctest` module or whether it has its own magic way of running them. I'd guess the former.
- `from_qcschema` raises a `KeyError` when there is no `"attributes"` or `"extras"` in the record, but doesn't provide any more information about the record when raising the exception. It might be useful to include some information about what's actually in the object, to catch whether something more seriously borked is going on, like empty records, e.g.:
```diff
diff --git a/openff/toolkit/topology/molecule.py b/openff/toolkit/topology/molecule.py
index c5438644..906408b0 100644
--- a/openff/toolkit/topology/molecule.py
+++ b/openff/toolkit/topology/molecule.py
@@ -4713,7 +4713,8 @@ class FrozenMolecule(Serializable):
         else:
             raise KeyError(
                 "The record must contain the hydrogen mapped smiles to be safely made from the archive. "
-                "It is not present in either 'attributes' or 'extras' on the provided `qca_record`"
+                "It is not present in either 'attributes' or 'extras' on the provided `qca_record`. "
+                f"Record includes keys {[*qca_record.keys()]}"
             )
 
         # make a new molecule that has been reordered to match the cmiles mapping
```
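The same idea can be taken one step further so the error also covers the case where a section exists but the CMILES inside it is missing. This is a minimal sketch, not toolkit code: `get_mapped_smiles` is a hypothetical standalone helper, it assumes the record is already a plain dict, and it assumes the same precedence as the toolkit branch above (`"attributes"` checked before `"extras"`).

```python
from typing import Any, Mapping

# Assumption: the hydrogen-mapped CMILES is stored under this key.
CMILES_KEY = "canonical_isomeric_explicit_hydrogen_mapped_smiles"


def get_mapped_smiles(qca_record: Mapping[str, Any]) -> str:
    """Return the mapped SMILES or raise a KeyError describing what the record holds."""
    # Assumed precedence: "attributes" first, then "extras".
    for section in ("attributes", "extras"):
        if section in qca_record:
            contents = qca_record[section] or {}
            smiles = contents.get(CMILES_KEY)
            if smiles:
                return smiles
            # The section exists but has no usable CMILES: report what it does contain.
            raise KeyError(
                f"Record has an {section!r} section but no {CMILES_KEY!r}. "
                f"{section!r} includes keys {sorted(contents)}"
            )
    # Neither section exists at all (e.g. an empty or malformed record).
    raise KeyError(
        "The record must contain the hydrogen mapped smiles to be safely made from the archive. "
        "It is not present in either 'attributes' or 'extras' on the provided `qca_record`. "
        f"Record includes keys {sorted(qca_record)}"
    )
```

Splitting the two failure modes makes a CI log distinguish "record came back empty" from "record came back without CMILES".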
We should contact MolSSI if we're confident in the hypothesis that records are not returned in a reliable order, though I'm skeptical of that theory.
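One way to gather evidence for or against the ordering hypothesis is to run the same query twice (say, once locally and once on CI), collect the returned record IDs, and compare them. This is a minimal sketch that assumes the IDs have already been collected; `compare_orderings` is a hypothetical helper, not a QCPortal function.

```python
from collections import Counter
from typing import Hashable, Sequence


def compare_orderings(first: Sequence[Hashable], second: Sequence[Hashable]) -> str:
    """Classify two ID sequences returned by the same query.

    "identical": same IDs in the same order.
    "reordered": same IDs (with multiplicity) in a different order.
    "different-contents": some IDs are missing or extra.
    """
    if list(first) == list(second):
        return "identical"
    if Counter(first) == Counter(second):
        return "reordered"
    return "different-contents"
```

A "reordered" result would support contacting MolSSI about ordering; "different-contents" points instead at records going missing.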
Working on something else, but I just stumbled on a possible explanation and fix: https://molssi.github.io/QCFractal/user_guide/molecule.html
I am running into this as well. On the legacy server, I have code that queries in batches, and now some results are missing from the batch responses, which is new behavior. If I immediately query a missing ID by itself, I get a result.
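The workaround implied here, re-querying anything missing from a batch one ID at a time, can be sketched as follows. `fetch_batch` and `fetch_one` are hypothetical callables standing in for whatever client calls are actually used, and each record is assumed to be a dict with an `"id"` key.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

Record = Dict[str, Any]


def fetch_with_backfill(
    ids: List[int],
    fetch_batch: Callable[[List[int]], Iterable[Record]],
    fetch_one: Callable[[int], Optional[Record]],
) -> Tuple[Dict[int, Record], List[int]]:
    """Fetch records in one batch, then re-query IDs the batch silently dropped.

    Returns (records keyed by ID, IDs that were missing from the batch response).
    """
    results: Dict[int, Record] = {rec["id"]: rec for rec in fetch_batch(list(ids))}
    missing = [record_id for record_id in ids if record_id not in results]
    for record_id in missing:
        record = fetch_one(record_id)  # hypothetical single-ID query
        if record is not None:
            results[record_id] = record
    return results, missing


# Usage with fake fetchers whose batch response drops ID 2:
store = {i: {"id": i} for i in (1, 2, 3)}
records, missing = fetch_with_backfill(
    [1, 2, 3],
    fetch_batch=lambda ids: [store[i] for i in ids if i != 2],
    fetch_one=store.get,
)
```

Returning the `missing` list alongside the results keeps the flaky behavior visible in logs instead of hiding it behind the backfill.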
I've tried, again, to figure out what's going on with the flaky `from_qcschema` examples, and have, again, failed. The proximal error is:
I'm unable to reproduce this locally. To try to reproduce the issue, I played around with the `limit` keyword (which seems to default to its maximum value of 2000) and the `skip` keyword (which continues returning 2000 molecules even when raised to values >10,000, but doesn't return anything at values >1,000,000, so it's clearly doing something). For future work, my code is:
My current hypotheses are either:
Originally posted by @j-wags in https://github.com/openforcefield/openff-toolkit/issues/1646#issuecomment-1597540167
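The `limit`/`skip` experiments described above amount to offset pagination. As a minimal sketch, assuming a hypothetical `query(limit, skip)` callable that wraps whatever client call is in use and returns a list, walking every page looks like this. The default of 2000 is taken from the maximum observed in this thread, not from any documented API.

```python
from typing import Any, Callable, Iterator, List


def paginate(query: Callable[[int, int], List[Any]], limit: int = 2000) -> Iterator[Any]:
    """Yield every result from an offset-paginated query.

    Stops on an empty page or a short page (fewer than `limit` results).
    """
    skip = 0
    while True:
        page = query(limit, skip)
        if not page:
            return
        yield from page
        if len(page) < limit:
            return
        skip += len(page)
```

If the server silently clamped `skip` (as the ">10,000 still returns 2000" observation might suggest), this loop would keep yielding repeated pages until an empty one arrives, so de-duplicating by ID afterwards is a cheap sanity check.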