ihmwg / python-ihm

Python package for handling IHM mmCIF and BinaryCIF files
MIT License

make-mmcif.py sets linker type to "none" #91

Closed: benmwebb closed this issue 1 year ago

benmwebb commented 1 year ago

@brindakv reports that util/make-mmcif.py does not correctly read _ihm_cross_link_list.linker_type, such that it is set to "none" on output.

benmwebb commented 1 year ago

Looks OK with my testing using PDB-Dev 25 as input:

% grep -A 14 _ihm_cross_link_list.id pol_ii_g.cif           
_ihm_cross_link_list.id
_ihm_cross_link_list.group_id
_ihm_cross_link_list.entity_description_1
_ihm_cross_link_list.entity_id_1
_ihm_cross_link_list.seq_id_1
_ihm_cross_link_list.comp_id_1
_ihm_cross_link_list.entity_description_2
_ihm_cross_link_list.entity_id_2
_ihm_cross_link_list.seq_id_2
_ihm_cross_link_list.comp_id_2
_ihm_cross_link_list.linker_type
_ihm_cross_link_list.dataset_list_id
1 1 GDOWN1 13 314 LYS GDOWN1 13 228 LYS DSS 3
2 2 GDOWN1 13 216 LYS GDOWN1 13 240 LYS DSS 3
3 3 GDOWN1 13 227 LYS RPB10 10 67 LYS DSS 3
% python3 make-mmcif.py pol_ii_g.cif
% grep -A 16 _ihm_cross_link_list.id output.cif             
_ihm_cross_link_list.id
_ihm_cross_link_list.group_id
_ihm_cross_link_list.entity_description_1
_ihm_cross_link_list.entity_id_1
_ihm_cross_link_list.seq_id_1
_ihm_cross_link_list.comp_id_1
_ihm_cross_link_list.entity_description_2
_ihm_cross_link_list.entity_id_2
_ihm_cross_link_list.seq_id_2
_ihm_cross_link_list.comp_id_2
_ihm_cross_link_list.linker_chem_comp_descriptor_id
_ihm_cross_link_list.linker_type
_ihm_cross_link_list.dataset_list_id
_ihm_cross_link_list.details
1 1 GDOWN1 13 314 LYS GDOWN1 13 228 LYS 1 DSS 3 .
2 2 GDOWN1 13 216 LYS GDOWN1 13 240 LYS 1 DSS 3 .
3 3 GDOWN1 13 227 LYS RPB10 10 67 LYS 1 DSS 3 .

python-ihm uses ChemDescriptor objects internally for crosslinkers. If linker_chem_comp_descriptor_id isn't provided in the input, the reader maps the name in linker_type to an existing descriptor it knows about (e.g. DSS is mapped to ihm.cross_linkers.dss) or, if the name is unknown, creates a new descriptor with blank fields.
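The fallback described above can be sketched roughly as follows. This is a standalone illustration, not python-ihm's actual reader code: the `Descriptor` class, the `KNOWN_LINKERS` table, and `resolve_linker` are hypothetical names, and both the case-insensitive lookup and the choice to carry the unknown name over as `auth_name` are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Descriptor:
    """Hypothetical stand-in for a chemical descriptor (not ihm.ChemDescriptor)."""
    auth_name: Optional[str]
    chemical_name: Optional[str] = None
    smiles: Optional[str] = None


# Hypothetical table of linkers the reader "knows about".
KNOWN_LINKERS = {
    "DSS": Descriptor("DSS", chemical_name="disuccinimidyl suberate"),
}


def resolve_linker(descriptor: Optional[Descriptor],
                   linker_type: str) -> Descriptor:
    """Pick the descriptor to use for one crosslink row."""
    # An explicit descriptor always wins, even if all of its fields are blank.
    if descriptor is not None:
        return descriptor
    # Otherwise try to map the linker_type name to a known linker
    # (case-insensitive matching is an assumption of this sketch).
    known = KNOWN_LINKERS.get(linker_type.upper())
    if known is not None:
        return known
    # Unknown name: make a new descriptor with everything else left blank.
    return Descriptor(auth_name=linker_type)


blank = Descriptor(auth_name=None)
from_name = resolve_linker(None, "DSS")
unknown = resolve_linker(None, "XYZ")
explicit = resolve_linker(blank, "EDC")
```

Under these assumptions, a row that supplies a blank descriptor alongside linker_type=EDC resolves to the blank descriptor, so the name EDC never reaches the output.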

@brindakv, if you can provide me with an mmCIF input where this doesn't work, I'll make sure it gets fixed. Perhaps a user populated both linker_chem_comp_descriptor_id and linker_type, but the descriptor ID is invalid or there is no information in the _ihm_chemical_component_descriptor table? If both fields are provided, python-ihm uses the descriptor rather than the linker_type.

brindakv commented 1 year ago

@benmwebb I sent you the file through Slack.

benmwebb commented 1 year ago

The linker type comes out as none because that's what the user requested:

loop_
_ihm_chemical_component_descriptor.id
_ihm_chemical_component_descriptor.auth_name
_ihm_chemical_component_descriptor.chemical_name
_ihm_chemical_component_descriptor.common_name
_ihm_chemical_component_descriptor.smiles
_ihm_chemical_component_descriptor.smiles_canonical
_ihm_chemical_component_descriptor.inchi
_ihm_chemical_component_descriptor.inchi_key
1 . . . . . . .
2 . . . . . . .

It's not clear to me what they're trying to describe here because some of their crosslinks use descriptor 1 and some descriptor 2, but all have linker_type=EDC.

We certainly could add a heuristic to the reader where linker_type, if specified, is used to fill in an empty _ihm_chemical_component_descriptor.auth_name. That will give slightly better results in this case, but you'll still end up with two linker chemistries both called EDC, which isn't ideal.
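A minimal sketch of what such a heuristic might look like, again using hypothetical stand-in types rather than python-ihm's real classes (`Descriptor` and `fill_auth_names` are illustrative names only, and crosslinks are reduced to `(descriptor_id, linker_type)` pairs):

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Descriptor:
    """Hypothetical stand-in for one _ihm_chemical_component_descriptor row."""
    id: str
    auth_name: Optional[str] = None


def fill_auth_names(descriptors: Dict[str, Descriptor],
                    crosslinks: List[Tuple[str, Optional[str]]]) -> None:
    """Copy linker_type into auth_name wherever auth_name is empty."""
    for descriptor_id, linker_type in crosslinks:
        desc = descriptors.get(descriptor_id)
        # Only fill in a blank name; never overwrite one the user supplied.
        if desc is not None and desc.auth_name is None and linker_type:
            desc.auth_name = linker_type


# Mirrors the reported file: two blank descriptors, all crosslinks typed EDC.
descriptors = {"1": Descriptor("1"), "2": Descriptor("2")}
crosslinks = [("1", "EDC"), ("2", "EDC"), ("1", "EDC")]
fill_auth_names(descriptors, crosslinks)
names = [d.auth_name for d in descriptors.values()]
print(names)  # ['EDC', 'EDC']
```

The result shows the limitation noted above: both descriptors end up named EDC, so the output still describes two distinct linker chemistries with the same name.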