ihmwg / python-ihm

Python package for handling IHM mmCIF and BinaryCIF files
MIT License
14 stars 7 forks source link

python-ihm fails on the ensemble without ihm_ensemble_info.model_group_id #105

Closed aozalevsky closed 1 year ago

aozalevsky commented 1 year ago

Here is an example of the PDBDEV_00000018:

with open('PDBDEV_00000018.cif') as fh:
    mmcif,= ihm.reader.read(fh, model_class=ihm.model.Model)
print('num_models:', mmcif.ensembles[0].num_models)
print('num_models_deposited', mmcif.ensembles[0].num_models_deposited)

num_models: 1
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [74], in <cell line: 4>()
      2     mmcif,= ihm.reader.read(fh, model_class=ihm.model.Model)
      3 print('num_models:', mmcif.ensembles[0].num_models)
----> 4 print('num_models_deposited', mmcif.ensembles[0].num_models_deposited)

File ~/miniconda3/envs/struct/lib/python3.10/site-packages/ihm/model.py:243, in Ensemble.<lambda>(self)
    239     #: All subsamples that make up this ensemble (if applicable),
    240     #: as :class:`Subsample` objects
    241     self.subsamples = []
--> 243 num_models_deposited = property(lambda self: len(self.model_group),
    244                                 doc="Number of models in this ensemble "
    245                                     "that are in the mmCIF file")
    247 clustering_method = _text_choice_property(
    248     "clustering_method",
    249     ["Hierarchical", "Other", "Partitioning (k-means)"],
    250     doc="The clustering method used to obtain the ensemble, if applicable")
    252 clustering_feature = _text_choice_property(
    253     "clustering_feature", ["RMSD", "dRMSD", "other"],
    254     doc="The feature used for clustering the models, if applicable")

TypeError: object of type 'NoneType' has no len()

which has a minimalistic definition:

#
loop_
_ihm_ensemble_info.ensemble_id
_ihm_ensemble_info.num_ensemble_models
_ihm_ensemble_info.num_ensemble_models_deposited
1 1 1

since _ihm_ensemble_info.model_group_id isn't required by the dictionary, python-ihm should be able to use _ihm_ensemble_info.num_ensemble_models_deposited and, ideally, raise exception if those numbers are inconsistent.

benmwebb commented 1 year ago

Personally, I think it's a little odd to create an mmCIF file this way - if you have no linked model group, there's no way programatically to figure out which are the deposited models in that ensemble. But of course we have no control over what depositors create. We can certainly read num_ensemble_models_deposited from the input file and use that if the model group is not provided.

raise exception if those numbers are inconsistent.

That is asking for trouble since the dictionary does not require any consistency here. The policy in python-ihm has always been to read a compliant file without raising exceptions, and to make a best effort to do the same for noncompliant files. If you want to check a file for issues, run it through a validator. If there are issues with the file but it is still compliant, this should be addressed in the dictionary or by the archive team.