observed criterion

CV-GPhL commented 4 years ago

I find the whole concept of "observed criterion" confusing and unnecessary. What are we trying to describe here? When was the last time anyone in MX applied individual, per-reflection criteria à la

_pdbx_diffrn_merge_stat.observed_criterion_I_max (The criterion used to classify a reflection as 'observed' expressed as an upper limit for the value of I.)
_pdbx_diffrn_merge_stat.observed_criterion_I_min (The criterion used to classify a reflection as 'observed' expressed as a lower limit for the value of I.)
_pdbx_diffrn_merge_stat.observed_criterion_sigma_I (The criterion used to classify a reflection as 'observed' expressed as a multiple of the value of sigma(I).)

The only place where some programs/users might apply such criteria is during refinement itself (or maybe when computing some R-values).

Unfortunately, the "observed_criterion" is defined in the _reflns category, suggesting that any selection process was done within the reflection data prior to refinement. Of course, multiple selection processes happened during data-collection, but none as simplistic as an I>sig(I) cut-off or an Imax/Imin limit. These selection processes are highly specific to the program and processing step (and also depend on program versions and run-time parameters) and can be very difficult to capture in anything machine-readable.

Of course, we could just define those criteria in a way that would catch all reflections in any case. But this would mean that

_pdbx_diffrn_merge_stat.number_obs
_pdbx_diffrn_merge_stat.number_all

would always have the same value ... not sure that is useful here? I would leave any distinction between the _all and _obs quantities out completely and just stick with reflections contributing to this data set.
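To make the point about per-reflection criteria concrete, here is a minimal sketch of what such a selection would amount to. Everything in it is an assumption for illustration: the function names `is_observed` and `count_reflections` are hypothetical, the intensities are made up, and reading the sigma criterion as I >= n * sigma(I) is only one possible interpretation of the dictionary text. It is not taken from any data-processing package.

```python
def is_observed(intensity, sigma, i_min=None, i_max=None, sigma_mult=None):
    """Hypothetical per-reflection test; a criterion left as None is not applied."""
    if i_min is not None and intensity < i_min:
        return False
    if i_max is not None and intensity > i_max:
        return False
    # Assumed reading of observed_criterion_sigma_I: I >= n * sigma(I)
    if sigma_mult is not None and intensity < sigma_mult * sigma:
        return False
    return True


def count_reflections(reflections, **criteria):
    """Return (number_obs, number_all) for a list of (I, sigma(I)) pairs."""
    number_all = len(reflections)
    number_obs = sum(is_observed(i, s, **criteria) for i, s in reflections)
    return number_obs, number_all


# Made-up merged intensities as (I, sigma(I)) pairs
reflections = [(120.0, 10.0), (3.0, 4.0), (-1.5, 2.0), (45.0, 5.0)]

strict = count_reflections(reflections, sigma_mult=2.0)  # an I >= 2 sigma(I) cut-off
catch_all = count_reflections(reflections)               # no criterion applied

print("I >= 2 sigma(I):", strict)
print("catch-all:      ", catch_all)
```

With criteria chosen to accept every reflection, the two counts coincide, which is the redundancy between number_obs and number_all described above.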

epeisach commented 4 years ago

Imax/Imin limits - perhaps the intent was to handle overloads. Optional criteria and not required.

We all have beam stop shadows, etc.

Do we need more general criteria for observed_criterion?

CV-GPhL commented 4 years ago

Overloads are a feature of detector pixels and not reflection intensities. My feeling is that this is a left-over from technical limitations a long time ago: in the early days we didn't have sigmas attached to a reflection (nor did we have MNF à la CCP4/MTZ), so one could define a reflection as being unobserved by setting I=0 (for example). Or, because weak reflections were very unreliable (with no sigma attached to them) when visually read off film with a greyscale table, an Imin cut-off was a way of using only reliable/strong reflections. Or to speed up computing given what was available then. Or ...

Anyway, it doesn't help to guess what these things might have meant at some point in the past - when we dealt with merged data only - if we are not really using such a selection process on merged data in current software (and haven't for 25+ years at least, I guess).

As to expanding the observation_criterion (with a fixed set of allowed values/items?) in the context of unmerged data: there are dozens of different ways of selecting/rejecting specific measurements throughout data-processing up to the merging step, and hardly any software package reports every single rejection/selection in such detail that one could reliably archive this information ... probably for good reasons.

wwpdb-dictionaries / mmcif_pdbx

observed criterion #21