Closed benmwebb closed 3 weeks ago
Note though that `IDMapper(None` is a superset of all lost classes. There's a bunch of classes in there that aren't lost even without an orphan list. For example, `ihm.source.Synthetic` objects don't have an orphan list but won't be lost, because the `_pdbx_entity_src_syn` table has to contain an `entity_id`. Thus on read, an `Entity` object is created which keeps a reference to the `Synthetic` object. Since `Entity` objects are tracked, we don't need to keep an orphan list for `Synthetic`. So we should check all such potential-orphan tables to see if they also contain an ID of a tracked object.
I think with careful construction a user could create an input file that results in the following unused objects:
- `ihm.ChemComp` from the `_chem_comp` table
- `ihm.reference.Sequence` from `_struct_ref_seq`
- `ihm.reference.Alignment` from `_struct_ref_seq_dif`
- `ihm.geometry.Center` from `_ihm_geometric_object_center`
- `ihm.AsymUnitRange` from `_ihm_entity_poly_segment`
- `ihm.location.Repository` from `_ihm_external_reference_info`
- `ihm.multi_state_scheme.RelaxationTime` from `_ihm_relaxation_time`
We should add a test that reads an mmCIF file with each of these tables, writes a new file, and asserts that the new file contains all the same tables.
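The comparison step of such a test could look roughly like the sketch below. It only covers the "same tables" assertion on the input and output text; the read/write round trip itself is not shown. The helper names `table_names` and `missing_tables` are made up for illustration and are not part of python-ihm, and the parsing is deliberately naive (it would be confused by multi-line text fields containing lines that look like data names).

```python
import re

# Matches a data name such as "_chem_comp.id" at the start of a line and
# captures the category (table) part before the dot.
_DATA_NAME = re.compile(r"(_[A-Za-z0-9_\[\]]+)\.")


def table_names(mmcif_text):
    """Return the set of table (category) names that appear in mmCIF text.

    Hypothetical helper; a naive line-based scan, not a real mmCIF parser.
    """
    names = set()
    for line in mmcif_text.splitlines():
        m = _DATA_NAME.match(line.strip())
        if m:
            names.add(m.group(1).lower())
    return names


def missing_tables(input_text, output_text):
    """Return a sorted list of tables in the input that the output lacks."""
    return sorted(table_names(input_text) - table_names(output_text))
```

A test would then read the input file, write it back out, and assert that `missing_tables(input_text, output_text)` is an empty list.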
It would be very difficult to completely preserve the `_struct_ref_seq`, `_struct_ref_seq_dif` and `_ihm_entity_poly_segment` tables, since we rely on `Entity` objects being available to instantiate `ihm.reference.Sequence`, `ihm.reference.Alignment` and `ihm.AsymUnitRange` objects. But files lacking entity tables are probably unusual. Thus, closing this for now.
If the user provides an input file to `make_mmcif` containing one or more unused objects (i.e. a table row with an ID that is not used anywhere else, such as an `ihm_geometric_object_transformation` that is not used by any geometric object), we should preserve this on output, as the archive folks rely on this behavior in their pipeline. python-ihm requires that all Python objects are ultimately referenced by the top-level `System` object. Generally we deal with unused objects by keeping a reference to them in an "orphan" list in the `System` object. But not all objects have orphan lists, and those that don't will be lost if they are unused.

To see all potentially lost classes, see all instantiations of the `IDMapper` class (or subclasses) in `reader.py` that have `None` as the first argument. We should either add an orphan list for each such class, or have some sort of catch-all list (although that would complicate output of those objects by the dumpers). Either way, the list should probably not be part of the API for now, as there is little reason to create such objects outside of the "preserve an existing file" behavior.