project-gemmi / gemmi

macromolecular crystallography library and utilities
https://project-gemmi.github.io/
Mozilla Public License 2.0

False positive validation of a cif file with a missing category #285

Closed by PeyratG 9 months ago

PeyratG commented 9 months ago

Hello,

I tried gemmi's CIF validator on a file from which I had deliberately deleted the whole atom_site category, using an mmCIF DDL dictionary that I modified to contain the following:

   _category.id              atom_site
   _category.mandatory_code  yes

The command I ran is:

   gemmi validate -q --ddl=custom_mmcif_pdbx_v50.dic structure_file.cif

I expected an error because the mandatory category is missing, but I got none.

I wonder if this is a bug or if I'm missing something obvious.

Thanks!

wojdyr commented 9 months ago

_category.mandatory_code is not checked because it's not used (more precisely: it's always no) in official pdbx/mmcif dictionaries. Why would you like to use it?

PeyratG commented 9 months ago

I would like to build a precheck function to ensure that mmCIF files created by users contain the minimal information we have defined as mandatory for accepting a structure.

I wanted to check whether the absence of an entire category could raise an error.

From the official PDBx/mmCIF documentation on dictionary content, I thought that was the purpose of this field:

Category CATEGORY

The name and textual description of a category are stored in the category named CATEGORY. The item (_category.mandatory_code) indicates if the category must appear in any data block based on this dictionary.

wojdyr commented 9 months ago

Supposedly, this was the original idea. The PDB developers must have abandoned it, though, perhaps because, without conditionals, it wouldn't fully work anyway. What's currently mandatory in files submitted to the PDB depends on the experimental method and perhaps also on other factors. And the same spec is used to validate all kinds of mmCIF: coordinates, structure factors, CCD. So currently nothing is mandatory (according to the dictionary) and I don't know if it ever was.

If you're not working for the PDB, it might be easier to directly check in code if the required categories are present rather than maintaining a custom dictionary.
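A minimal sketch of such a direct check, in Python. The list of required categories is purely hypothetical (only atom_site comes from this thread), and the helper names are made up for illustration. It assumes gemmi's Python module provides gemmi.cif.read(), Document.sole_block() and Block.get_mmcif_category_names(), the last returning names in the form '_atom_site.'; gemmi is imported only inside check_file(), so the comparison logic can be used on its own.

```python
from typing import Iterable, List

# Hypothetical list: the categories this precheck treats as mandatory.
REQUIRED_CATEGORIES = ("atom_site", "cell", "symmetry")


def normalize(name: str) -> str:
    """Turn '_atom_site.' or 'Atom_Site' into 'atom_site'."""
    return name.strip().lstrip("_").rstrip(".").lower()


def missing_categories(present: Iterable[str],
                       required: Iterable[str] = REQUIRED_CATEGORIES) -> List[str]:
    """Return the required categories that are absent, in the order given."""
    present_set = {normalize(n) for n in present}
    return [r for r in required if normalize(r) not in present_set]


def check_file(path: str,
               required: Iterable[str] = REQUIRED_CATEGORIES) -> List[str]:
    """Read an mmCIF file with gemmi and list the missing required categories."""
    import gemmi  # imported here so the helpers above work without gemmi

    # sole_block() assumes the file holds exactly one data block.
    block = gemmi.cif.read(path).sole_block()
    return missing_categories(block.get_mmcif_category_names(), required)
```

Calling check_file("structure_file.cif") on the file from the original report should then return a list that includes "atom_site", and the precheck can reject the file whenever the list is non-empty. Keeping the rule in code also makes it easy to add conditions (for example, depending on the experimental method) that a dictionary flag alone cannot express.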

PeyratG commented 9 months ago

Thanks for your answers, I'll check directly in code then 👍