Closed CV-GPhL closed 2 years ago
Yes, the negative sigmas are not a good way of marking misfits, and they are formally incorrect (according to the mmCIF spec), but also there is no good way to include such reflections. It's not even planned yet, as discussed: https://groups.google.com/a/wwpdb.org/g/mmcif-data-proc/c/QFXYyFXa1C8
I'll think about removing such reflections. But generally the conversion, with exception of the status column, is done in a generic way according to the spec. So for this I'd probably need to add special handling of MTZ column type Q.
Agree: that is all a bit of a messy situation. Maybe the spec file could contain some syntax to define rejection - something like
intensity_sigma SIGI Q 0 <=0
where the last item defines that a whole record is dropped if that value is <= 0. This way gemmi does an as-is conversion as before but allows user control (at the moment I do that kind of selection within SFTOOLS after gemmi cif2mtz
- which works fine, but is a separate step).
Having conditions in spec file would be an overkill, unless there are other uses for such filtering.
I added option --skip-negative-sigma
that filters out any reflections with a negative value in (any) Q type column.
Remember that we also have column types L (standard deviation of F+ or F-) and M (standard deviation of I+ or I-) that could/should be handled here. However: should we remove a full record if e.g. sigma of F+ is negative (while F- sigma is positive)?
Aren't misfits marked in unmerged data only, in type Q column?
Yes - but if you provide the flag --skip-negative-sigma
I would expect it to work on any sigmas independent of the type of input (merged/unmerged). It might be very unlikely to nave negative sigmas in L and M column types (which are usually associated with merged data), so maybe not a real issue.
However: what about sigma values that are zero? This also makes no sense and indeed in old reflection files it was used to mark a MNF. E.g. in a set of our weekly runs we have those PDB entries with some sigmas being <= 0:
WARNING: [3bxq] 527 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [3bxq] 206 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [3ckh.2] 358 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [3ckh.2] 111 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7ejb] 10 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [7ejb] 5 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7ejy] 3 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [7ejy] 5 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7eku.2] 1744 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [7eku.2] 4237 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7f86] 2 reflections in columns FP/SIGFP have been set to 'absent' because sigma value (SIGFP) was zero, below zero or a MNF
WARNING: [7f86] 323 reflections in columns F(+)/SIGF(+) have been set to 'absent' because sigma value (SIGF(+)) was zero, below zero or a MNF
WARNING: [7f86] 272 reflections in columns F(-)/SIGF(-) have been set to 'absent' because sigma value (SIGF(-)) was zero, below zero or a MNF
WARNING: [7f86] 321 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [7f86] 272 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7n3r] 134 reflections in columns FP/SIGFP have been set to 'absent' because sigma value (SIGFP) was zero, below zero or a MNF
WARNING: [7n3r] 80 reflections in columns F(+)/SIGF(+) have been set to 'absent' because sigma value (SIGF(+)) was zero, below zero or a MNF
WARNING: [7n3r] 85 reflections in columns F(-)/SIGF(-) have been set to 'absent' because sigma value (SIGF(-)) was zero, below zero or a MNF
WARNING: [7n3r] 57 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [7n3r] 283 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7n3s] 30 reflections in columns FP/SIGFP have been set to 'absent' because sigma value (SIGFP) was zero, below zero or a MNF
WARNING: [7n3s] 16 reflections in columns F(+)/SIGF(+) have been set to 'absent' because sigma value (SIGF(+)) was zero, below zero or a MNF
WARNING: [7n3s] 22 reflections in columns F(-)/SIGF(-) have been set to 'absent' because sigma value (SIGF(-)) was zero, below zero or a MNF
WARNING: [7n3s] 22 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [7n3s] 6 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7n3v] 674 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [7n3v] 717 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7o1e] 159 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [7o1e] 117 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7o1f] 194 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [7o1f] 219 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7o2o.4] 4 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [7o2o.4] 17 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7o35] 37 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [7o35] 43 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7o36] 405 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [7o36] 272 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7o3k] 19 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [7o3k] 7 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7o45] 1393 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [7o45] 1307 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7p7f.2] 1331 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [7p7f.2] 1368 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7p7g.2] 3 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7p7h.2] 340 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [7p7h.2] 3024 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7ppz] 629 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [7ppz] 436 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7q4i] 898 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [7q4i] 806 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7qjg.5] 130 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [7qjg.5] 84 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7qju.5] 221 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [7qju.5] 132 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7qk4.5] 156 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [7qk4.5] 508 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7rzb] 238 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [7rzb] 566 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7shw] 824 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [7shw] 844 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7ski] 1520 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [7ski] 3829 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7skk] 699 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [7skk] 686 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7twa] 46 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [7twa] 172 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7vnh] 6 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [7vnh] 1 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7w5s] 396 reflections in columns FP/SIGFP have been set to 'absent' because sigma value (SIGFP) was zero, below zero or a MNF
WARNING: [7w5t] 60 reflections in columns FP/SIGFP have been set to 'absent' because sigma value (SIGFP) was zero, below zero or a MNF
WARNING: [7w5v] 77 reflections in columns FP/SIGFP have been set to 'absent' because sigma value (SIGFP) was zero, below zero or a MNF
WARNING: [7wo1] 1 reflections in columns I/SIGI have been set to 'absent' because sigma value (SIGI) was zero, below zero or a MNF
WARNING: [7yzu] 604 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [7yzu] 303 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
WARNING: [7z79] 56 reflections in columns I(+)/SIGI(+) have been set to 'absent' because sigma value (SIGI(+)) was zero, below zero or a MNF
WARNING: [7z79] 74 reflections in columns I(-)/SIGI(-) have been set to 'absent' because sigma value (SIGI(-)) was zero, below zero or a MNF
(this uses our own, pre-gemmi tool to handle mmCIF reflection data and convert into MTZ, so YMMV). It doesn't skip whole reflections, but sets the sigma and associated amplitude/intensity to MNF instead. Is that something gemmi convert already does anyway?
It's documented that only Q columns are taken into account:
--skip-negative-sigma Skip reflections with sigma<0 (in any Q column).
Special handling of sigma=0 would be more tricky, because 0.00 in cif file could mean 0.004.
Hi Marcin,
On Wed, Apr 20, 2022 at 06:33:24AM -0700, Marcin Wojdyr wrote:
It's documented that only Q columns are taken into account:
--skip-negative-sigma Skip reflections with sigma<0 (in any Q column).
Correct - maybe provide some additional info/link (since the concept of a column type doesn't really exist at the point of a mmCIF file). I guess "--print-sepc" provides the information about mmCIF item and MTZ column type.
Special handling of sigma=0 would be more tricky, because 0.00 in cif file could mean 0.004.
Not quite I think: 0.00 in CIF file means exactly that. Yes, it might have come from an initial value of 0.004 or -0.004 (which is why we write CIF files in autoPROC in scientific notation), but at the point of cif2mtz you don't have that information. You can of course stick with such a zero value and rely on all subsequent programs (reading the resulting MTZ) of handling the data accordingly: most "old" programs will have checks not just for MNF but also sigma==0. But that is not guaranteed: newer programs might assume that the MTZ file given to them has never gone through a conversion (into CIF format and back) and that items either have a value or a MNF.
I would be inclined to at least write a warning message about the number of sigmas (Q, M and L) that are 0 if you stick with the current behaviiour - since that could point to a poor formatting (precision) during the CIF file generation and could alert developers or databases to potentially check their procedure. It would be even better if there was an option that would tell cif2mtz to set all data items (i.e. pairs where one is a sigma) to MNF if the sigma item is 0.
Cheers
Clemens
OK, I added "MTZ" in the text to make clear that Q is MTZ column type:
--skip-negative-sigma Skip reflections with sigma<0 (in any MTZ Q column).
And I added option:
--zero-to-mnf If value and sigma are 0, set both to MNF.
It checks if both sigma and the corresponding value are zero, because replacing non-zero value with MNF feels wrong.
The following PDB structures (
<id>_<block>
) contain unmerged data with some sigmas given as negative value:This is most likely because e.g. XDS_ASCII.HKL data was converted to mmCIF as-is (without excluding those reflections where the negative sigma marked them as a misfit). Of course, the most likely correct fix is to exclude those reflections from getting into the mmCIF file in the first place - after all, for data coming e.g. out of AIMLESS we would never include those ROGUES reflections: they are not included into output file but in a separate file, while XDS using the same single file for both types of reflections (valid and to be kept versus misfit and to be rejected) is unfortunate in that respect.
Now, when it comes to going back from mmCIF to MTZ I would suggest rejecting those negative sigma reflections from conversion. That seems to me the most natural/sensible approach here. Alternatively (and maybe controlled via some flag) one could also flip their sign or keep the current behaviour.
?