ihmwg / python-modelcif

Python package for handling ModelCIF mmCIF and BinaryCIF files
MIT License

Issues in validation and make-mmcif #30

Closed gtauriello closed 1 year ago

gtauriello commented 1 year ago

I did some more tests combining `make-mmcif.py` and `validate_mmcif.py` and observed the following issues:

benmwebb commented 1 year ago

> When testing on `1crn.cif`, I observe that the validation gives me errors such as: `The following keywords are not defined in the dictionary: _entity_src_gen.entity_id, ...`

python-ihm's dictionary parser expected data items to be defined after their parent category, but that's not the case for `entity_src_gen`. See ihmwg/python-ihm#107.
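The ordering assumption can be relaxed by buffering items that appear before their category and attaching them once the category definition turns up. A minimal sketch of that idea, not python-ihm's actual implementation; `link_items` and its tuple-based input format are invented here for illustration:

```python
from collections import defaultdict


def link_items(definitions):
    """Link dictionary item definitions to their parent categories.

    ``definitions`` is a list of ('category', name) or
    ('item', category_name, item_name) tuples, in file order.
    Items seen before their category are buffered and attached later,
    so definition order no longer matters.
    """
    categories = {}
    pending = defaultdict(list)
    for d in definitions:
        if d[0] == 'category':
            cat = {'name': d[1], 'items': []}
            categories[d[1]] = cat
            # Attach any items that were defined before this category
            cat['items'].extend(pending.pop(d[1], []))
        else:
            _, catname, item = d
            if catname in categories:
                categories[catname]['items'].append(item)
            else:
                pending[catname].append(item)  # category not seen yet
    return categories
```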

> An exception in make-mmcif for the version with ligands (`model_01_fix_maxit.cif`): `AttributeError: 'NoneType' object has no attribute 'lower'`

This is because that file has `pdbx_entity_nonpoly`, and python-modelcif expects it to contain a mandatory `ma_model_mode` data item. But we can handle that case. See #31.
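The failure mode is the usual one for optional mmCIF values: a data item that may legitimately be absent comes back as `None`, and calling `.lower()` on it raises `AttributeError`. A None-safe comparison avoids that; `is_yes` is a hypothetical helper for illustration, not python-modelcif's actual code:

```python
def is_yes(flag):
    """None-safe check for a case-insensitive 'yes' flag.

    Unguarded code would do ``flag.lower() == 'yes'`` and crash with
    ``AttributeError: 'NoneType' object has no attribute 'lower'``
    whenever the data item is missing from the input file.
    """
    return flag is not None and flag.lower() == 'yes'
```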

> An exception in validate_mmcif (after running make-mmcif) for the version without ligands (`model_01_fix_nolig_maxit.cif`): `Mandatory keyword ma_data.name cannot have value '?'`

I'd argue this is not a bug. `ma_data.name` is indeed mandatory in ModelCIF, and python-modelcif makes a best effort to fill it in; in this case it uses `entity.pdbx_description`. But that field is `?` in your input file, so it can't manufacture data. It could replace it with `.` or something generic like "target sequence" to make the validator happy, but really what the validator is saying here is "hey user, you need to put in a name here".
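The best-effort fill described above can be sketched as follows; `best_effort_name` is a hypothetical helper that mimics the fallback behavior, treating the mmCIF unknown (`?`) and inapplicable (`.`) markers as missing rather than propagating them:

```python
def best_effort_name(pdbx_description):
    """Derive a ma_data.name candidate from entity.pdbx_description.

    mmCIF uses '?' for unknown and '.' for inapplicable; neither is a
    usable name, so return None and let the validator flag the gap for
    the user instead of emitting a placeholder.
    """
    if pdbx_description in (None, '?', '.'):
        return None
    return pdbx_description
```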

gtauriello commented 1 year ago

Thanks very much for the fixes.

On the last one: I see your point, but then we need to communicate clearly that we cannot guarantee that an mmCIF-compliant file (like the MAXIT-generated one used in this example) can be converted to ModelCIF-compliant output with make-mmcif, e.g. via a remark in that file or a warning emitted with the output. I think the example used here (a plain PDB file with no extra information on the entities) is fairly representative of what we commonly see in the structure prediction field, so some guidelines on how to fix that would be helpful.
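The warning suggested above could be as simple as scanning the converted output for mandatory ModelCIF items still left at the mmCIF unknown marker `?` and telling the user what to fill in. A rough standalone sketch, with a hypothetical `MANDATORY` list and a naive line-based scan rather than a real CIF parser:

```python
# Hypothetical list of mandatory ModelCIF data items to check; a real
# implementation would derive this from the ModelCIF dictionary.
MANDATORY = ['_ma_data.name']


def warn_unknown(cif_text):
    """Return warnings for mandatory items whose value is the '?' marker.

    Only handles simple ``_item value`` lines (no loop_ constructs);
    this is an illustration of the suggested warning, not a CIF parser.
    """
    warnings = []
    for line in cif_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in MANDATORY and parts[1] == '?':
            warnings.append(f"{parts[0]} is mandatory but unset; "
                            "please provide a value in your input")
    return warnings
```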