Closed gtauriello closed 6 months ago
Funny, when I first saw the ma dictionary one of my questions was "why do we have ma_target_ref_db_details
when we already have struct_ref
"? (I forget @brindakv's answer at the time.)
I think your suggestion could work... but I don't like it very much! References in python-ihm and python-modelcif are not designed to be mixed and matched. You would have to have two sets of objects that contain very similar data, since there's a lot of overlap between the two tables (e.g. UniProt accession). Don't you think we should just have one such object, and use it to populate both tables? This would necessitate a bit of work in python-ihm too, since there are some fields in ma_target_ref_db_details
that aren't in struct_ref
, like seq_db_sequence_version_date
(but perhaps there's another PDBx table where that could go?) Or perhaps we could get rid of ma_target_ref_db_details
entirely?
Ah well. As Brinda had pointed out, this was historically due to wanting a symmetry between ma_template_ref_db_details
and ma_target_ref_db_details
(not super important by now given the popularity on de novo models). We did consider getting rid of ma_target_ref_db_details
entirely (or deprecating it) but decided against it for practical reasons: 1) it's a breaking change which would need to be applied to all ModelCIF files and 2) ma_target_ref_db_details
has 2 useful features missing in struct_ref
: seq. version & checksum and a controlled vocabulary for db_name
. We could add extra ma_...
items to struct_ref
in ModelCIF but we wouldn't be able to modify db_name
and would need to rely on "conventions" instead of a proper definition of the possible DBs.
So in the end we decided to keep ma_target_ref_db_details
and add struct_ref
in ModelCIF files to be compatible with PDB and PDB-Dev. I'll forward you the email thread we had with Brinda about it...
Btw: currently ma_target_ref_db_details
also doesn't allow to have multiple entries pointing to the same DB which is a bad restriction. That will be fixed with https://github.com/ihmwg/ModelCIF/issues/18.
In terms of implementation, I am also happy to have it all in one object instead of duplicating things.
Could modelcif.reference.TargetReference
just inherit from ihm.reference.Sequence
?
It could fill everything from ihm.reference.Sequence
with sequence
set to None (which seems acceptable if I read the _StructRefDumper
code correctly). Then if one wants to fill the rest of ihm.reference.Sequence
, one can do that after construction (which is anyway needed to fill the alignments
list).
In terms of implementation, I am also happy to have it all in one object instead of duplicating things. Could
modelcif.reference.TargetReference
just inherit fromihm.reference.Sequence
? It could fill everything fromihm.reference.Sequence
withsequence
set to None (which seems acceptable if I read the_StructRefDumper
code correctly). Then if one wants to fill the rest ofihm.reference.Sequence
, one can do that after construction (which is anyway needed to fill thealignments
list).
Yes, I think that would work. I'll take a look.
Following some discussions with Brinda, I wanted to generate ModelCIF files which additionally to ma_target_ref_db_details also fill the struct_ref category. I figured, I could just do that by adding
ihm.reference.Reference
objects to thereferences
list inmodelcif.Entity
(which is identical toihm.Entity
), but that didn't work.I think that for this to work, one would only need to include
_StructRefDumper
to themodelcif.dumper
and restrict_StructRefDumper
to only process subclasses ofihm.reference.Reference
and_TargetRefDBDumper
to only process subclasses ofmodelcif.reference.TargetReference
. So this should be a fairly minor change which would enable to easily add both types of references.Do you think this would work?