Open mrshirts opened 2 years ago
This list seems to be almost complete to me. Comments on above points:
Build supercells with specified sizes.
It would also be useful to go back from a supercell (in whatever data structure it will be stored) to a CIF file. This way, one could use the supercells together with other software. This suggestion comes from Caitlin (see Slack), and I agree it would be useful. Does openff-toolkit have the capability to write CIF files? If not, we could just use the gemmi package for now.
Perform lattice minimization with OpenMM (Does it have to be supercells, or can we do a single unit cell -- is that ruled out by OpenMM's minimum image requirements?)
The way I see it, the answer is yes, we must use supercells. OpenMM's periodic boundary conditions require that the nonbonded cutoff be no greater than half the periodic box size. With a single unit cell we would therefore have to use rather short nonbonded cutoffs, which might not be ideal.
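To make the half-box constraint concrete, here is a minimal sketch that estimates how many unit cell repetitions are needed along each axis for a given cutoff. The function name `min_supercell_repeats` is hypothetical, and the criterion used (every perpendicular width of the supercell must be at least twice the cutoff) is a conservative geometric assumption; the exact check OpenMM performs on its box vectors may differ.

```python
import math


def min_supercell_repeats(a, b, c, alpha, beta, gamma, cutoff):
    """Return (na, nb, nc) such that every perpendicular width of the
    supercell is at least 2 * cutoff.

    Lengths and cutoff must share one unit (e.g. angstrom); angles are in
    degrees. This is an illustrative helper, not part of OpenMM or gemmi.
    """
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    sa, sb, sg = (math.sin(math.radians(x)) for x in (alpha, beta, gamma))

    # Volume of a general (triclinic) unit cell.
    volume = a * b * c * math.sqrt(
        1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg
    )

    # Perpendicular width along each axis = volume / area of the opposite face.
    widths = (
        volume / (b * c * sa),
        volume / (a * c * sb),
        volume / (a * b * sg),
    )

    # Smallest integer number of repeats with n * width >= 2 * cutoff.
    return tuple(math.ceil(2.0 * cutoff / w) for w in widths)


# Example: an orthorhombic cell of 5 x 10 x 20 angstrom with a 9 angstrom cutoff.
repeats = min_supercell_repeats(5.0, 10.0, 20.0, 90.0, 90.0, 90.0, 9.0)
```

For small unit cells this quickly leads to several repetitions along the short axes, which is why a single unit cell is usually not enough with typical cutoffs.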
Tools to take an MD simulation and estimate structure factor to compare to experiment
Only newer structures will have structure factors deposited. Deposition does not seem to be required, though. See https://www.ccdc.cam.ac.uk/support-and-resources/support/case/?caseid=bb7c17c1-eef9-4287-bd0b-bc49c55ff06a
Some further minor things:
What to do with missing atoms (e.g. hydrogens)? It might not be obvious how to build them in some cases. We must include sanity checks that make sure the contents of the unit cell are the same as in the experiment.
What to do with partial occupancy and "alternative conformations"?
For post-processing it might be useful to store info on the supercell itself. For instance, number of unit cell repetitions along each axis, fractional coordinates of each unit cell origin in the basis of the supercell, info on which molecules are chemically identical (having a canonical SMILES for each molecule would be good).
Parse the relevant information from CIF files (temperature of the experiment, cell parameters, density) and check for validity. Gemmi can do this.
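One possible validity check on the parsed values is to recompute the crystal density from the cell volume and compare it with the reported one. The sketch below assumes the values have already been read from the CIF file (for example with gemmi); the function names, the inputs `z` (formula units per cell) and `molar_mass`, and the 2% tolerance are illustrative assumptions, not something fixed in this discussion.

```python
# Avogadro's number (6.02214076e23 1/mol) times 1e-24 cm^3 per cubic angstrom.
AVOGADRO_PER_A3_TO_CM3 = 0.602214076


def calc_density(z, molar_mass, volume_a3):
    """Density in g/cm^3 from Z, molar mass (g/mol) and cell volume (A^3)."""
    return z * molar_mass / (AVOGADRO_PER_A3_TO_CM3 * volume_a3)


def density_is_consistent(reported, z, molar_mass, volume_a3, rel_tol=0.02):
    """True if the reported density agrees with the recomputed one.

    rel_tol is an assumed relative tolerance, chosen only for illustration.
    """
    calculated = calc_density(z, molar_mass, volume_a3)
    return abs(calculated - reported) <= rel_tol * reported


# Example with made-up numbers: Z = 4, M = 100 g/mol, V = 1000 A^3.
example_density = calc_density(4, 100.0, 1000.0)
```

A mismatch here would hint at missing atoms or partial occupancy in the deposited structure, which ties in with the sanity checks mentioned in the list.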
I'll think about this further and might extend the list above.
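As a rough illustration of the supercell bookkeeping suggested in the list, the following sketch stores the number of repetitions per axis, derives the fractional origin of each unit cell in the supercell basis, and groups molecules by SMILES. The class `SupercellInfo` and its fields are hypothetical, and the SMILES strings are assumed to be canonical already (e.g. produced by a cheminformatics toolkit upstream).

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class SupercellInfo:
    # Number of unit cell repetitions along a, b and c.
    repeats: tuple
    # Maps molecule index -> canonical SMILES (assumed canonical by the caller).
    smiles_by_molecule: dict = field(default_factory=dict)

    def cell_origins(self):
        """Fractional coordinates of each unit cell origin in the supercell basis."""
        na, nb, nc = self.repeats
        return [
            (i / na, j / nb, k / nc)
            for i, j, k in product(range(na), range(nb), range(nc))
        ]

    def identical_molecules(self):
        """Group molecule indices that share the same SMILES string."""
        groups = {}
        for index, smiles in self.smiles_by_molecule.items():
            groups.setdefault(smiles, []).append(index)
        return groups


# Example: a 2 x 1 x 1 supercell containing two waters and one ethanol.
info = SupercellInfo(repeats=(2, 1, 1), smiles_by_molecule={0: "O", 1: "O", 2: "CCO"})
origins = info.cell_origins()
groups = info.identical_molecules()
```

Keeping this information next to the coordinates would make it straightforward to map each molecule back to its parent unit cell during post-processing.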
What to do with missing atoms (e.g. hydrogens)? It might not be obvious how to build them in some cases. We must include sanity checks that make sure the contents of the unit cell are the same as in the experiment.
What to do with partial occupancy and "alternative conformations"?
I think in both these cases, we want to get a CCDC subset that includes all atoms, and does not include alternative conformations/partial occupancy.
Parse the relevant information from CIF files (temperature of the experiment, cell parameters, density) and check for validity. Gemmi can do this.
This suggestion comes from Caitlin (see Slack), and I agree it would be useful. Does openff-toolkit have the capability to write CIF files? If not, we could just use the gemmi package for now.
I think on these two, we would just rely on gemmi for now. Are you also using gemmi for some of the functions you have?
I think in both these cases, we want to get a CCDC subset that includes all atoms, and does not include alternative conformations/partial occupancy.
Agreed.
I think on these two, we would just rely on gemmi for now. Are you also using gemmi for some of the functions you have?
Yes, gemmi is a heavy dependency right now. Its main use is to build the initial P1 unit cell.
Since it will be an "additional" package to openff, I think this is fine. Looks like conda packages are available, which is good: https://anaconda.org/conda-forge/gemmi/files
The concept is that anything that works better as OpenFF crystal tools would be moved to an OpenFF repository that can be automatically tested and conda released, with explicit calls to OpenFF toolkit.
Tools/functions that would work better in OpenFF ecosystem
Tools/functions to remain outside of OpenFF ecosystem.