project-gemmi / gemmi

macromolecular crystallography library and utilities
https://project-gemmi.github.io/
Mozilla Public License 2.0

[FEATURE REQUEST] initial data_comp_list section in ChemComp output #269

Open rimmartin opened 1 year ago

rimmartin commented 1 year ago

Hi

We have made a ChemComp-style CIF using gemmi::make_chemcomp_with_restraints, gemmi::add_chemcomp_to_block and gemmi::cif::write_cif_block_to_stream, and it works as a restraint file for a phenix run.

Buster is asking for more to run with our cif

data_comp_list

Buster error message without the comp_list:

[E003] First block in file is comp_CLR not comp_list

CV-GPhL commented 1 year ago

Dear all,

On Wed, Jun 14, 2023 at 08:18:20AM -0700, rimmartin wrote:

Buster is asking for more to run with our cif

data_comp_list

Buster error message without the comp_list

[E003] First block in file is comp_CLR not comp_list

mmCIF restraint files are less well defined than PDBx/mmCIF files, so we work on the assumption that files from the current main restraint generators (Grade/Grade2, AceDrg, eLBOW) have a similar structure. So far, they have all had a data_comp_list data block at the top, and BUSTER checks for it to ensure the file given is not "something else".

Other refinement programs might have less stringent checks, but I'd highly recommend trying to stick with the structure/format of existing restraint dictionary files as much as possible: we are lucky to (currently) have very good interchangeability between very different refinement programs /even/ allowing for the fact that those files are poorly documented in mmCIF dictionaries. Let's try and avoid any divergence if at all possible - even if allowed by the mmCIF format.

For CLR we have

(1) $CCP4/lib/data/monomers/c/CLR.cif

data_comp_list
loop_
_chem_comp.id
_chem_comp.three_letter_code
_chem_comp.name
_chem_comp.group
_chem_comp.number_atoms_all
_chem_comp.number_atoms_nh
_chem_comp.desc_level
CLR CLR CHOLESTEROL NON-POLYMER 74 28 .
#
data_comp_CLR
...

(2) Grade2

data_comp_list
#
loop_
_chem_comp.id
_chem_comp.three_letter_code
_chem_comp.name
_chem_comp.group
_chem_comp.number_atoms_all
_chem_comp.number_atoms_nh
_chem_comp.desc_level
_chem_comp.type
CLR CLR CHOLESTEROL NON-POLYMER 74 28 . NON-POLYMER
#
data_comp_CLR
...

Cheers

Clemens

wojdyr commented 1 year ago

I don't know what minimal content of the comp_list block will satisfy all programs. Perhaps 2-3 lines would do, for example:

data_comp_list
_chem_comp.id CLR
_chem_comp.group NON-POLYMER

_chem_comp.group is used for finding links that can be automatically applied to the residue (for example, if it's L-peptide, CIS and TRANS links can be applied). For now, I'll leave it up to you to write the first block.

rimmartin commented 1 year ago

If I write minimally:

data_comp_list
_chem_comp.id CLR
_chem_comp.three_letter_code CLR
_chem_comp.name 'Unknown                  '

at the top of the CIF, Buster will run with it (Grade2 style).