salilab / IHMValidation

Validation software for integrative models deposited to PDB
MIT License

Parsing of PDBDEV_00000013 fails #56

aozalevsky closed 1 year ago

aozalevsky commented 2 years ago

Full trace:


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_2655681/2431891216.py in <module>
      1 fname = '/home/domain/data/silwer/pdb_dev/IHMValidation_aozalevsky/example/PDBDEV_00000013.cif'
      2 with open(fname, encoding='utf8') as f:
----> 3     m, = ihm.reader.read(f, model_class=ihm.model.Model)

/usr/local/lib/python3.8/dist-packages/ihm/reader.py in read(fh, model_class, format, handlers, warn_unknown_category, warn_unknown_keyword, read_starting_model_coord, starting_model_class, reject_old_file, variant)
   3296             ukhandler.add_category_handlers(hs)
   3297         r.category_handler = dict((h.category, h) for h in hs)
-> 3298         more_data = r.read_file()
   3299         for h in hs:
   3300             h.finalize()

/usr/local/lib/python3.8/dist-packages/ihm/format.py in read_file(self)
    587 
    588            :exc:`CifParserError` will be raised if the file cannot be parsed.
--> 589 
    590            :return: True iff more data blocks are available to be read.
    591         """

/usr/local/lib/python3.8/dist-packages/ihm/format.py in _read_file_c(self)
    638         if self.unknown_category_handler is not None:
    639             _format.add_unknown_category_handler(self._c_format,
--> 640                                                  self.unknown_category_handler)
    641         if self.unknown_keyword_handler is not None:
    642             _format.add_unknown_keyword_handler(self._c_format,

/usr/local/lib/python3.8/dist-packages/ihm/reader.py in __call__(self, starting_model_id, asym_id, entity_poly_segment_id, dataset_list_id, starting_model_auth_asym_id, starting_model_sequence_offset, description)
   1500                  starting_model_sequence_offset, description):
   1501         m = self.sysr.starting_models.get_by_id(starting_model_id)
-> 1502         asym = self.sysr.ranges.get(
   1503             self.sysr.asym_units.get_by_id(asym_id), entity_poly_segment_id)
   1504         m.asym_unit = asym

/usr/local/lib/python3.8/dist-packages/ihm/reader.py in get(self, asym_or_entity, range_id)
    190             return asym_or_entity
    191         else:
--> 192             return asym_or_entity(*self._id_map[range_id])
    193 
    194 

KeyError: '1'

I narrowed the issue down to the order of two sections. The code fails on:

 1409 loop_
 1410 _ihm_starting_model_details.starting_model_id
 1411 _ihm_starting_model_details.entity_id
 1412 _ihm_starting_model_details.entity_description
 1413 _ihm_starting_model_details.asym_id
 1414 _ihm_starting_model_details.entity_poly_segment_id
 1415 _ihm_starting_model_details.starting_model_source
 1416 _ihm_starting_model_details.starting_model_auth_asym_id
 1417 _ihm_starting_model_details.starting_model_sequence_offset
 1418 _ihm_starting_model_details.dataset_list_id
 1419     1  1  CYP199A2    A    1   'experimental model'  A  -13  1
 1420     2  2  HaPux       B    2   'experimental model'  A    0  2

because the actual _ihm_entity_poly_segment records are defined ~40 lines below:

 1455 loop_
 1456 _ihm_entity_poly_segment.id
 1457 _ihm_entity_poly_segment.entity_id
 1458 _ihm_entity_poly_segment.seq_id_begin
 1459 _ihm_entity_poly_segment.seq_id_end
 1460 _ihm_entity_poly_segment.comp_id_begin
 1461 _ihm_entity_poly_segment.comp_id_end
 1462 1 1 1 399 SER ALA
 1463 2 2 1 106 PRO THR

If I swap the two loops, parsing succeeds. Indeed, according to the schema, the _ihm_entity_poly_segment table should come first. @benmwebb can you check my analysis?
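
To illustrate the suspected failure mode, here is a toy sketch. It is not python-ihm's actual code, and every class and method name in it is hypothetical. A table that resolves a segment ID the moment it is referenced raises KeyError when the defining row has not been read yet, whereas a table that queues the reference and resolves it after all rows are read works in either order.

```python
class EagerRanges:
    """Toy range table that resolves IDs immediately (order-dependent)."""

    def __init__(self):
        self._id_map = {}

    def add(self, range_id, seq_begin, seq_end):
        # Called when a (hypothetical) segment row is read.
        self._id_map[range_id] = (seq_begin, seq_end)

    def get(self, range_id):
        # Raises KeyError if the segment row has not been read yet.
        return self._id_map[range_id]


class DeferredRanges(EagerRanges):
    """Toy variant that postpones lookups until finalize() is called."""

    def __init__(self):
        super().__init__()
        self._pending = []

    def get_later(self, range_id, callback):
        # Remember the reference instead of resolving it now.
        self._pending.append((range_id, callback))

    def finalize(self):
        # All rows have been read, so every ID should now be known.
        for range_id, callback in self._pending:
            callback(self._id_map[range_id])
        self._pending.clear()


# Reference arrives before the definition, as in PDBDEV_00000013.
eager = EagerRanges()
try:
    eager.get('1')
    eager_failed = False
except KeyError:
    eager_failed = True

deferred = DeferredRanges()
resolved = {}
deferred.get_later('1', lambda rng: resolved.update(model_1=rng))
deferred.add('1', 1, 399)   # the definition shows up later
deferred.finalize()
```

In this toy, the eager lookup fails with the same kind of KeyError as in the trace, while the deferred one resolves segment '1' once the definition has been added.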

benmwebb commented 2 years ago

I don't believe the order of categories is mandated anywhere but @brindakv would know for sure. At any rate, python-ihm's reader is supposed to work regardless of the order in which it encounters categories - we even have explicit tests for that in many cases such as https://github.com/ihmwg/python-ihm/blob/0.32/test/test_reader.py#L336-L381. So if it doesn't, that's a bug. I would suggest you reorder the file for now, and I can fix this when I'm back from vacation.
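
For the interim reordering, a minimal sketch of a text-level workaround follows. It is not part of python-ihm, and the helper names `find_loop` and `move_loop_before` are made up for illustration. It assumes each loop starts with a `loop_` line, ends at the next `loop_` or `#` line, and that no multi-line (semicolon-delimited) text field contains such lines.

```python
def find_loop(lines, category):
    """Return (start, end) indices of the loop_ block for `category`, or None.

    Assumption: the block runs from its 'loop_' line up to (not including)
    the next 'loop_' line or '#' comment line.
    """
    prefix = category + "."
    for i, line in enumerate(lines):
        if (line.strip() == "loop_" and i + 1 < len(lines)
                and lines[i + 1].strip().startswith(prefix)):
            end = i + 1
            while end < len(lines):
                stripped = lines[end].strip()
                if stripped == "loop_" or stripped.startswith("#"):
                    break
                end += 1
            return i, end
    return None


def move_loop_before(cif_text, moved, anchor):
    """Move the `moved` loop so that it precedes the `anchor` loop.

    Returns the text unchanged if either loop is missing or the order
    is already as requested.
    """
    lines = cif_text.splitlines()
    m = find_loop(lines, moved)
    a = find_loop(lines, anchor)
    if m is None or a is None or m[0] < a[0]:
        return cif_text
    block = lines[m[0]:m[1]]
    del lines[m[0]:m[1]]        # moved block is after the anchor,
    lines[a[0]:a[0]] = block    # so the anchor index is still valid
    return "\n".join(lines) + "\n"
```

Usage would be along the lines of `move_loop_before(text, "_ihm_entity_poly_segment", "_ihm_starting_model_details")`, writing the result to a new file rather than overwriting the deposited one.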

brindakv commented 2 years ago

The order of categories should not matter.