Improved mechanism for handling nonpoly target-template correspondence

benmwebb commented 2 years ago

The dictionary requires that every template be mapped to a target instance (_ma_template_details.target_asym_id). For nonpolymers we currently satisfy this by allowing an alignment pair without an explicit sequence alignment. This is not ideal: the resulting alignment_id in ma_alignment_info is not linked to the target and template, so it is lost when the file is read back in, and the alignment identity and score are not written to the file. It also does not record whether the target nonpolymer was "explicitly modeled or implicitly derived from" the template (_pdbx_entity_nonpoly.ma_model_mode).

Instead of using Alignment for nonpolymers, add a NonPolymerFromTemplate class that is a subclass of AsymUnit and acts in the same fashion (can be added to an assembly, etc.). Rather than taking an entity argument, its constructor takes (template, explicit). (For nonpolymers that are modeled without a template, the regular AsymUnit class can be used.) Restore the original behavior of Alignment, i.e. require full sequence alignments.

bienchen commented 2 years ago

Not sure if I understand this correctly: does this mean I have an AsymUnit-like object that takes a Template object instead of an Entity? And then the Entity of the nonpoly compound is in the Template object? This is for nonpoly entities implicitly derived from a template? Explicitly modelled nonpoly entities would then use AsymUnit with a Template object instead of Entity?

benmwebb commented 2 years ago

Not sure if I understand this correctly: does this mean I have an AsymUnit-like object that takes a Template object instead of an Entity? And then the Entity of the nonpoly compound is in the Template object?

Exactly. NonPolymerFromTemplate(template=foo) would behave much like AsymUnit(entity=foo.entity) except that it also adds the target-template mapping to the output mmCIF (and keeps a Python reference to the template foo so that the dumper can find it).
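To make the relationship concrete, here is a minimal sketch of the proposed design. Every class below is a hypothetical stand-in written for illustration, not the python-modelcif implementation; the attribute names (entity, asym_id, id) and the helper target_template_pairs are assumptions.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for illustration only; these are NOT the
# python-modelcif classes and their signatures are assumptions.
@dataclass
class Entity:
    description: str


@dataclass
class Template:
    entity: Entity
    asym_id: str


class AsymUnit:
    def __init__(self, entity, id=None):
        self.entity = entity
        self.id = id


class NonPolymerFromTemplate(AsymUnit):
    """An AsymUnit whose entity is taken from a template."""

    def __init__(self, template, explicit, id=None):
        # Behaves like AsymUnit(entity=template.entity) ...
        super().__init__(template.entity, id=id)
        # ... but keeps a reference to the template so that a dumper
        # can write out the target-template mapping.
        self.template = template
        self.explicit = explicit


def target_template_pairs(asyms):
    """Return (target asym id, template asym id) for template-derived units."""
    return [(a.id, a.template.asym_id)
            for a in asyms if isinstance(a, NonPolymerFromTemplate)]


heme = Entity("heme")
tmpl = Template(entity=heme, asym_id="C")
from_template = NonPolymerFromTemplate(template=tmpl, explicit=False, id="B")
no_template = AsymUnit(heme, id="D")
```

In this sketch only the template-derived unit contributes a target-template pair; the plain AsymUnit has no template and is skipped.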

This is for nonpoly entities implicitly derived from a template? Explicitly modelled nonpoly entities would then use AsymUnit with a Template object instead of Entity?

I'm not sure what "explicitly modeled" means here or in the documentation for pdbx_entity_nonpoly.ma_model_mode. Maybe @brindakv can clarify? Does "explicitly modeled" mean modeled without a template? If so, then you would just make an AsymUnit and point it to the nonpoly Entity - no template, no alignment, no NonPolymerFromTemplate. And then python-modelcif can fill in ma_model_mode = explicit automatically. Or maybe "explicitly modeled" means something else?

At any rate, I will add an example of modeling with ligands once we've resolved this and ihmwg/python-ihm#76.

bienchen commented 2 years ago

Hm, to me, "explicitly modelled" would include the docking scenario, where I have a protein structure and introduce a new ligand to it via docking. There I have template coordinates of the ligand but no prior knowledge of where it will be placed in the protein structure or how much the ligand's template coordinates will be bent or wiggled around during docking.

benmwebb commented 2 years ago

If I understand you correctly, then "explicit" means you use a template but allow bonds to relax (flexible fitting) while "implicit" means the template is copied as a rigid body (maybe there is some rotation/translation of the entire ligand). Is that what you and @brindakv have in mind? In this case, the NonPolymerFromTemplate constructor would just take a Boolean explicit flag from the user (and when modeling a ligand without a template, pdbx_entity_nonpoly.ma_model_mode would make no sense and would be .).
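As a rough illustration of that mapping, the sketch below turns a user-supplied flag into the value written for ma_model_mode. The function name and the use of None to mean "no template" are assumptions made for this example, and the literal strings "explicit" and "implicit" are assumed to match the dictionary's allowed values.

```python
from typing import Optional


def ma_model_mode(explicit: Optional[bool]) -> str:
    """Map a Boolean 'explicit' flag to an ma_model_mode value.

    Hypothetical helper: None stands for a ligand modeled without a
    template, for which the item is written as '.' (inapplicable).
    """
    if explicit is None:
        return "."
    return "explicit" if explicit else "implicit"
```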

If I'm reading the dictionary correctly though, ma_model_mode is a per-entity flag. What if my template contains two hemes, and the model also contains two hemes, one modeled explicitly and the other implicitly? There's only one "heme" entity in the file, so what would ma_model_mode be set to?

benmwebb commented 2 years ago

@brindakv, any wisdom on ma_model_mode and implicit/explicit modeling here?

brindakv commented 2 years ago

If I understand you correctly, then "explicit" means you use a template but allow bonds to relax (flexible fitting) while "implicit" means the template is copied as a rigid body (maybe there is some rotation/translation of the entire ligand). Is that what you and @brindakv have in mind?

Yes. I think this is what the definitions imply.

If I'm reading the dictionary correctly though, ma_model_mode is a per-entity flag. What if my template contains two hemes, and the model also contains two hemes, one modeled explicitly and the other implicitly? There's only one "heme" entity in the file, so what would ma_model_mode be set to?

Is that really a possibility? If you have a template for a non-polymeric entity, would you use it for one instance and not for the other?

benmwebb commented 2 years ago

If I'm reading the dictionary correctly though, ma_model_mode is a per-entity flag. What if my template contains two hemes, and the model also contains two hemes, one modeled explicitly and the other implicitly? There's only one "heme" entity in the file, so what would ma_model_mode be set to?

Is that really a possibility? If you have a template for a non-polymeric entity, would you use it for one instance and not for the other?

This is hypothetical (we don't model ligands at all), but in this example I have two templates, and I am using them both. (Sorry, I spoke imprecisely here. I have one template structure PDB file, but two template_ids.) But because they're both the same sequence (heme) there is only one entity. ma_model_mode thus makes more sense to me as a per-nonpoly-template flag, not per-nonpoly-entity.
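The two-heme case can be sketched in a few lines. The tuples and function names here are made up for illustration and are not part of python-modelcif; the point is only that collapsing per-template modes onto a single entity leaves no unique value to write.

```python
from collections import defaultdict

# Hypothetical records: (template_id, entity name, model mode).
# Two templates from one PDB file, both hemes, modeled differently.
templates = [
    (1, "heme", "explicit"),
    (2, "heme", "implicit"),
]


def modes_per_entity(records):
    """Collect the set of model modes seen for each entity."""
    modes = defaultdict(set)
    for _template_id, entity, mode in records:
        modes[entity].add(mode)
    return dict(modes)


def ambiguous_entities(records):
    """Entities for which a single per-entity flag cannot be chosen."""
    return sorted(entity for entity, modes in modes_per_entity(records).items()
                  if len(modes) > 1)
```

Keyed by template_id instead, each record would carry exactly one mode and the ambiguity would not arise.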

benmwebb commented 2 years ago

See https://github.com/ihmwg/python-modelcif/blob/main/examples/ligands.py for a worked example.

ihmwg / python-modelcif

Improved mechanism for handling nonpoly target-template correspondence #7