project-gemmi / gemmi

macromolecular crystallography library and utilities
https://project-gemmi.github.io/
Mozilla Public License 2.0
221 stars 45 forks source link

Keep Refmac metadata in mmcif in Structure #60

Closed keitaroyam closed 3 years ago

keitaroyam commented 3 years ago

Opening refmac mmcif by gemmi.read_structure("refmac.mmcif") and saving it by make_mmcif_document().write_file("foo.mmcif") results in loss of some records. Can gemmi keep them? If it is too much trouble I can work on it. Currently, following records are lost.

_atom_site.auth_atom_id
_atom_site.auth_comp_id
_atom_site.calc_flag
_atom_site.group_pdb
_atom_site.pdbx_pdb_atom_name
_atom_site.pdbx_pdb_residue_name
_atom_site.pdbx_pdb_residue_no
_atom_type.n_electrons
_atom_type.scat_cromer_mann_a1
_atom_type.scat_cromer_mann_a2
_atom_type.scat_cromer_mann_a3
_atom_type.scat_cromer_mann_a4
_atom_type.scat_cromer_mann_b1
_atom_type.scat_cromer_mann_b2
_atom_type.scat_cromer_mann_b3
_atom_type.scat_cromer_mann_b4
_atom_type.scat_cromer_mann_c
_atom_type.scat_z
_ccp4_form_factor.scat_method
_ccp4_refine.details
_ccp4_refine_ls.jelly_body
_ccp4_refine_ls.jelly_body_dist
_ccp4_refine_ls.jelly_body_sigma
_entity_poly_seq.hetero
_refine.aniso_b[1][1]
_refine.aniso_b[1][2]
_refine.aniso_b[1][3]
_refine.aniso_b[2][2]
_refine.aniso_b[2][3]
_refine.aniso_b[3][3]
_refine.b_iso_mean
_refine.correlation_coeff_fo_to_fc
_refine.correlation_coeff_fo_to_fc_free
_refine.details
_refine.ls_number_reflns_r_free
_refine.ls_number_reflns_r_work
_refine.ls_percent_reflns_r_free
_refine.ls_r_factor_all
_refine.ls_r_factor_r_free
_refine.ls_r_factor_r_work
_refine.ls_wr_factor_r_free
_refine.ls_wr_factor_r_work
_refine.overall_su_b
_refine.overall_su_ml
_refine.pdbx_average_fsc_free
_refine.pdbx_average_fsc_overall
_refine.pdbx_average_fsc_work
_refine.pdbx_overall_esu_r
_refine.pdbx_overall_esu_r_free
_refine.pdbx_solvent_ion_probe_radii
_refine.pdbx_solvent_shrinkage_radii
_refine.pdbx_solvent_vdw_probe_radii
_refine.solvent_model_details
_refine_ls_restr.dev_ideal
_refine_ls_restr.dev_ideal_target
_refine_ls_restr.number
_refine_ls_restr.pdbx_refine_id
_refine_ls_restr.type
_refine_ls_shell.d_res_high
_refine_ls_shell.d_res_low
_refine_ls_shell.number_reflns_all
_refine_ls_shell.number_reflns_r_free
_refine_ls_shell.number_reflns_r_work
_refine_ls_shell.pdbx_fsc_free
_refine_ls_shell.pdbx_fsc_work
_refine_ls_shell.pdbx_refine_id
_refine_ls_shell.pdbx_total_number_of_bins_used
_refine_ls_shell.percent_reflns_obs
_refine_ls_shell.r_factor_all
_refine_ls_shell.r_factor_r_free
_refine_ls_shell.r_factor_r_work
_refine_ls_shell.wr_factor_r_work
_software.contact_author
_software.contact_author_email
_software.description
wojdyr commented 3 years ago

It's a known problem, but I don't know what would be a good solution. I see two options: 1) decide what is important to have and store it in Structure, 2) when writing cif file, start from the original cif Document and modify only selected categories.

What do you do with the structure between reading and writing?

keitaroyam commented 3 years ago

For the moment I apply translation for all coordinates and modify _struct_ncs_oper. I am happy with the option 2.

wojdyr commented 3 years ago

I added function update_mmcif_block() that should be able to do this, see https://gemmi.readthedocs.io/en/latest/mol.html#id3

In your case it would be:

groups = gemmi.MmcifOutputGroups(False)
groups.ncs = True
groups.atoms = True

I haven't tested it much and I suspect that it doesn't do a sensible thing for all combinations of parameters.

keitaroyam commented 3 years ago

Thanks! I did the following, and it looks almost perfect.

import gemmi
structure = gemmi.read_structure("refmac1.mmcif")
groups = gemmi.MmcifOutputGroups(False)
groups.ncs = True
groups.atoms = True
doc = gemmi.cif.read("refmac1.mmcif")
block = doc.find_block(structure.info["_entry.id"])
structure.update_mmcif_block(block, groups)
doc.write_file('tst.cif')

The only difference is that file following records were lost, I need to check if it is ok.

_atom_site.auth_atom_id
_atom_site.auth_comp_id
_atom_site.calc_flag
_atom_site.group_pdb
_atom_site.pdbx_pdb_atom_name
_atom_site.pdbx_pdb_residue_name
_atom_site.pdbx_pdb_residue_no
wojdyr commented 3 years ago

_atom_site.auth_atom_id _atom_site.auth_comp_id

I skipped these two intentionally. They are always identical with the corresponding _atom_site.label_*_id.

_atom_site.calc_flag

I should add this. It haven't been used by the wwPDB until recently. But I vaguely remember this flag being discussed during mmCIF WG meetings this year. And I see it's already used in 7ATL. So I guess I should adapt gemmi to read and write calc_flag.

_atom_site.group_pdb

This is not particularly useful, because Refmac (and many other programs) assign HETATM differently than the software used by wwPDB. The difference is that the latter uses HETATM also for not natural residues in polymer. But I could add it if needed.

_atom_site.pdbx_pdb_atom_name _atom_site.pdbx_pdb_residue_name _atom_site.pdbx_pdb_residue_no

I think Refmac should stop writing these.

keitaroyam commented 3 years ago

Thank you. I am happy if _atom_site.calc_flag is added (to indicate riding hydrogen or not). I noticed atom_site.pdbx_tls_group_id was not kept. Can you add this?

wojdyr commented 3 years ago

ok, I added both calc_flag and pdbx_tls_group_id

keitaroyam commented 3 years ago

Thanks!

xvlaurent commented 3 years ago

Hello!

I is it possible to also include _atom_site.group_pdb in the properties handled by update_mmcif_block() please? I have some analysis where that information is useful to me.

Thanks by advance !

wojdyr commented 3 years ago

@xvlaurent ok, I added an option for _atom_site.group_pdb

st = ...
groups = gemmi.MmcifOutputGroups(True)
groups.group_pdb = True
doc = gemmi.cif.Document()
block = doc.add_new_block('new')
st.update_mmcif_block(block, groups)
doc.write_file('out.cif')
xvlaurent commented 3 years ago

Thanks for your work !