ihmwg / python-ihm

Python package for handling IHM mmCIF and BinaryCIF files
MIT License

make-mmcif.py overwrites assembly name and description #92

Closed benmwebb closed 1 year ago

benmwebb commented 1 year ago

@brindakv reports that util/make-mmcif.py overwrites the user-provided name and description of an assembly.

benmwebb commented 1 year ago

Can confirm with a slightly modified copy of PDB-Dev 25:

% grep -A 4 ihm_struct_assembly.description pol_ii_g.cif    
_ihm_struct_assembly.description
1 'Input complete assembly'
;Integrative structure of the Pol II(G), modeled by IMP, using spatial restraints 
 derived from chemical crosslinking and mass spectrometry data.
;
% python3 make-mmcif.py pol_ii_g.cif
% grep -A 6 ihm_struct_assembly.id output.cif           
_ihm_struct_assembly.id
_ihm_struct_assembly.name
_ihm_struct_assembly.description
1 'Complete assembly'
;All known components & Integrative structure of the Pol II(G), modeled by IMP, using spatial restraints 
 derived from chemical crosslinking and mass spectrometry data.
;

This happens because python-ihm creates a default "complete assembly", and on output identical assemblies are merged. Since the user-provided assembly is also a complete assembly, it is merged with the default one. The descriptions are combined with &, but the merged assembly takes the name of the first assembly, which is the default one rather than the user-provided one. (@brindakv, if you have an example where the user-provided description is discarded rather than merged, let me know.)
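To illustrate the behavior described above, here is a minimal standalone sketch. It is not python-ihm's actual code: the `Assembly` dataclass and `merge_naive` function are hypothetical stand-ins, and the component labels are made up. It only mimics the observed outcome, where the first assembly's name wins and descriptions are joined with " & ".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Assembly:
    """Hypothetical stand-in for an assembly (not python-ihm's class)."""
    components: frozenset
    name: Optional[str] = None
    description: Optional[str] = None


def merge_naive(assemblies):
    """Merge assemblies with identical components.

    The first assembly seen keeps its name; descriptions of later
    duplicates are appended with ' & '.
    """
    merged = {}
    for a in assemblies:
        seen = merged.get(a.components)
        if seen is None:
            merged[a.components] = Assembly(a.components, a.name,
                                            a.description)
        elif a.description:
            if seen.description:
                seen.description = f"{seen.description} & {a.description}"
            else:
                seen.description = a.description
    return list(merged.values())


# The default assembly is created first, so it comes first in the list.
default = Assembly(frozenset({"A", "B"}), "Complete assembly",
                   "All known components")
user = Assembly(frozenset({"A", "B"}), "Input complete assembly",
                "Integrative structure of the Pol II(G)")
result = merge_naive([default, user])
```

With this ordering, `result` holds a single assembly that carries the default name and a combined description, which matches the output.cif excerpt above.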

Fix this by modifying the merge-assemblies logic to only use the default-complete-assembly name and description if there is no equivalent information in a user-provided assembly.
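A rough sketch of that rule, under stated assumptions: the `Assembly` dataclass, its `is_default` flag, and the `merge_prefer_user` function are all hypothetical and are not python-ihm's API. The actual fix may differ, for example in how it combines several user-provided descriptions; here the first non-empty user value simply wins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Assembly:
    """Hypothetical stand-in for an assembly (not python-ihm's class)."""
    components: frozenset
    name: Optional[str] = None
    description: Optional[str] = None
    is_default: bool = False  # assumed marker for the auto-created assembly


def first_value(candidates, field):
    """Return the first non-empty value of `field` among candidates."""
    for a in candidates:
        value = getattr(a, field)
        if value:
            return value
    return None


def merge_prefer_user(assemblies):
    """Merge assemblies with identical components.

    User-provided name/description take priority; the default assembly's
    values are used only when no user-provided assembly supplies them.
    """
    groups = {}
    for a in assemblies:
        groups.setdefault(a.components, []).append(a)
    merged = []
    for components, group in groups.items():
        # User-provided assemblies first, default ones last as a fallback.
        ordered = ([a for a in group if not a.is_default]
                   + [a for a in group if a.is_default])
        merged.append(Assembly(components,
                               first_value(ordered, "name"),
                               first_value(ordered, "description")))
    return merged


parts = frozenset({"A", "B"})
default = Assembly(parts, "Complete assembly", "All known components",
                   is_default=True)
user = Assembly(parts, "Input complete assembly",
                "Integrative structure of the Pol II(G)")
fixed = merge_prefer_user([default, user])
only_default = merge_prefer_user([default])
```

In this sketch the merged assembly keeps the user's name and description when both are present, and the default text only shows up when nothing else is available.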

brindakv commented 1 year ago

@benmwebb thanks for fixing this. In all the examples I have, the description is merged.