openmm / pdbfixer

PDBFixer fixes problems in PDB files

mmCIF output - `_atom_site.label_entity_id` not written #285

Closed Augustin-Zidek closed 5 months ago

Augustin-Zidek commented 5 months ago

Currently, OpenMM writes `?` (unset) in the `_atom_site.label_entity_id` mmCIF field.

This happens in pdbxfile.py#L453:

line = "%s  %5d %-3s %-4s . %-4s %s ? %5s %s %10.4f %10.4f %10.4f  0.0  0.0  ?  ?  ?  ?  ?  .  %5s %4s %s %4s %5d"
print(line % (recordName, atomIndex, symbol, atom.name, res.name, chainName, resId, resIC, coords[0], coords[1], coords[2],
              resId, res.name, chainName, atom.name, modelIndex), file=file)

Would it be possible to write `chainIndex + 1` instead? I.e.:

line = "%s  %5d %-3s %-4s . %-4s %s %d %5s %s %10.4f %10.4f %10.4f  0.0  0.0  ?  ?  ?  ?  ?  .  %5s %4s %s %4s %5d"
print(line % (recordName, atomIndex, symbol, atom.name, res.name, chainName, chainIndex + 1, resId, resIC, coords[0], coords[1], coords[2],
              resId, res.name, chainName, atom.name, modelIndex), file=file)
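
To see what the proposed row looks like in isolation, here is a minimal standalone sketch. The format string is the one proposed above; the helper `format_atom_site_row`, its parameter names, and the sample values are hypothetical and are not part of OpenMM.

```python
# Format string from the proposal above; the %d after the chain name is the
# new label_entity_id slot that previously held a literal "?".
ATOM_SITE_FORMAT = (
    "%s  %5d %-3s %-4s . %-4s %s %d %5s %s %10.4f %10.4f %10.4f"
    "  0.0  0.0  ?  ?  ?  ?  ?  .  %5s %4s %s %4s %5d"
)


def format_atom_site_row(record_name, atom_index, symbol, atom_name, res_name,
                         chain_name, chain_index, res_id, res_ic, coords,
                         model_index):
    """Hypothetical helper: format one _atom_site row, using chain_index + 1
    as the entity ID (an approximation, not a true deduplicated entity)."""
    return ATOM_SITE_FORMAT % (
        record_name, atom_index, symbol, atom_name, res_name, chain_name,
        chain_index + 1, res_id, res_ic, coords[0], coords[1], coords[2],
        res_id, res_name, chain_name, atom_name, model_index,
    )


# Made-up sample atom: backbone nitrogen of MET 1 in the first chain (index 0).
row = format_atom_site_row("ATOM", 1, "N", "N", "MET", "A", 0, "1", ".",
                           (1.0, 2.0, 3.0), 1)
fields = row.split()
print(row)
print("label_entity_id:", fields[7])
```

With whitespace-separated fields, the entity ID is the eighth column (index 7), directly after the chain name.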

Note that this is, strictly speaking, not entirely correct, because entity IDs should be deduplicated across identical chains (e.g. an A3 homomer has three chains A, B and C, but only a single entity 1). Pragmatically, however, this is better than leaving the field unset (`?`), and it is a simple fix.
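
For reference, the deduplication mentioned above could look roughly like the sketch below. It is not OpenMM code: the function `assign_entity_ids` and its input shape (a list of `(chain_id, residue_names)` pairs) are assumptions for illustration, and treating "same residue-name sequence" as "same entity" is a simplification.

```python
def assign_entity_ids(chains):
    """Map each chain ID to a 1-based entity ID.

    `chains` is a list of (chain_id, residue_names) pairs (hypothetical input
    shape). Chains with identical residue-name sequences share one entity ID;
    IDs are handed out in order of first appearance.
    """
    entity_by_sequence = {}
    entity_ids = {}
    for chain_id, residue_names in chains:
        key = tuple(residue_names)
        if key not in entity_by_sequence:
            # First time this sequence is seen: allocate the next entity ID.
            entity_by_sequence[key] = len(entity_by_sequence) + 1
        entity_ids[chain_id] = entity_by_sequence[key]
    return entity_ids


# An A3 homomer: three chains with the same (made-up) sequence.
homotrimer = [
    ("A", ["MET", "ALA", "GLY"]),
    ("B", ["MET", "ALA", "GLY"]),
    ("C", ["MET", "ALA", "GLY"]),
]
print(assign_entity_ids(homotrimer))  # {'A': 1, 'B': 1, 'C': 1}

# A heterodimer gets two distinct entities.
heterodimer = [("A", ["MET", "ALA"]), ("B", ["GLY", "SER"])]
print(assign_entity_ids(heterodimer))  # {'A': 1, 'B': 2}
```

The simpler `chainIndex + 1` approach would give the homotrimer entity IDs 1, 2 and 3 instead of a single shared 1.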

Augustin-Zidek commented 5 months ago

Sorry, I submitted this issue in the wrong repository. I have resubmitted it in the OpenMM one: https://github.com/openmm/openmm/issues/4416