pdbxmmcifwg / diffrn-data-set-extension

PDBx mmCIF dictionary extension for diffraction data sets

Allowed values for _pdbx_diffrn_data_section_contents.content_type #1

Open drlemmus opened 4 years ago

drlemmus commented 4 years ago

The allowed values for _pdbx_diffrn_data_section_contents.content_type include several that seem to be duplicates. 'X-ray 2FOFC map coefficients' and 'X-ray 2FO-FC map coeff' seem to be the same thing, and so do 'X-ray structure factor intensities, unmerged' and 'X-ray unmerged intensities'. If there are subtle differences, these should be made very clear; if not, only one value should remain. The value 'X-ray FOM, X-ray batch flag from mtz' is a bit weird, as FOM and batch flag are independent values.

The set also seems rather incomplete for neutron and electron scattering, and for X-ray data there are still quite a few values (sigmas, for instance) that can occur in a data section but are not listed.

Since the experiment type for a data_section_id is already described in _pdbx_diffrn_data_section.id as is the status of the data with respect to being merged, it seems superfluous to encode these in _pdbx_diffrn_data_section_contents.content_type values. Not doing so can solve some of the issues above and also keep the list of possible values reasonably small.

epeisach commented 4 years ago

When first written, this list was not intended to be comprehensive.

a) The 'X-ray 2FOFC map coefficients' are clearly redundant and will be removed.

b) FOM, X-ray batch: You are correct - they do not make sense. Should be removed.

c) With regard to sigmas - presumably you would not have a data block of only sigmas without F's or I's. You should think of this content_type as describing the type of data in such a block - but you would need to examine the columns present to know whether sigmas are included.

d) pdbx_diffrn_data_section_contents is designed to be a table of contents. Imagine you have a joint X-ray/neutron file and the contents said "map coefficients" - you would not know which was X-ray and which was neutron.

One possibility is to include in pdbx_diffrn_data_section a mandatory scattering type. Then your content type enumeration is reduced - and you still can find out what is in the file without having to parse each data block.

I would enjoy others' thoughts on this.
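
To make the scattering-type idea concrete, here is a minimal Python sketch of a table of contents in which the scattering type lives on the data section and the content type stays generic. The section ids, the scattering-type values, and the content-type strings are all hypothetical placeholders, not values from the dictionary.

```python
# Hypothetical model: none of these ids or values come from the dictionary.
# Scattering type is recorded once per data section ...
section_scattering_type = {
    "xray_1": "x-ray",
    "neut_1": "neutron",
}

# ... so the content types no longer need an "X-ray" or "Neutron" prefix.
section_contents = [
    ("xray_1", "map coefficients"),
    ("xray_1", "structure factor amplitudes"),
    ("neut_1", "map coefficients"),
]


def table_of_contents(scattering, contents):
    """Join each content row with the scattering type of its data section."""
    return [(sid, scattering[sid], ctype) for sid, ctype in contents]


toc = table_of_contents(section_scattering_type, section_contents)
for section_id, scattering_type, content_type in toc:
    print(section_id, scattering_type, content_type)
```

In this sketch the two "map coefficients" rows remain distinguishable through their section, while the set of distinct content types shrinks to two.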

drlemmus commented 4 years ago

I think adding the scattering type would be a good solution.

CV-GPhL commented 4 years ago

Some comments:

a) Maybe using the fully spelt-out "coefficients" instead of "coeff" would be nicer? We don't need to save a few characters when it means potential confusion, right?

b) This looks like a missing LF/CR, i.e. these are two different items ("X-ray FOM" and "X-ray batch flag from mtz"). "X-ray FOM" should be kept, I think. "X-ray batch flag from mtz" could be kept (since it has a mostly historic meaning in MTZ format content), but I'm not sure it helps anything with modern data collections (since we have a 1:1 correspondence between BATCH and image number).

CV-GPhL commented 4 years ago

Additional requests:

(1) Use the plural consistently:

(2) Add item for merged intensities:

(3) Use consistent conventions to mark data as "unmerged":

(4) Add items for anomalous differences:

(5) Clarify the distinction between "merged" and "unmerged" data in the definition. Something like:

The value of _pdbx_diffrn_data_section_contents.content_type describes the type of reflection data that a data section in a diffraction data file holds. Multiple types can be associated with a given data section. Data sections are supposed to contain merged reflection data (i.e. reduced to a reciprocal space asymmetric unit) by default, unless explicitly described as being "unmerged".

pkeller commented 4 years ago

What is more, the current definition is plain wrong:

The value of _pdbx_diffrn_data_section_contents.content_type uniquely identifies
  a data section in a diffraction data file.

This is the description of _pdbx_diffrn_data_section_contents.data_section_id with the name of the item changed.