xia2 / xia2

An expert system for automated reduction of X-ray diffraction data from macromolecular crystals
https://xia2.github.io/
BSD 3-Clause "New" or "Revised" License

Merging stats in xia2.txt do not agree with aimless log #58

Closed graeme-winter closed 7 years ago

graeme-winter commented 7 years ago

Report from Dave Lawson

High resolution limit                             1.60    4.34    1.60
Low resolution limit                             54.38   54.41    1.63
Completeness                                     98.0    98.9    96.4
Multiplicity                                      6.7     6.4     6.9
I/sigma                                           7.2    21.5     1.2
Rmerge(I)                                       0.178   0.073   2.090
Rmerge(I+/-)                                    0.157   0.065   1.874
Rmeas(I)                                        0.193   0.079   2.262
Rmeas(I+/-)                                     0.187   0.078   2.217
Rpim(I)                                         0.075   0.031   0.858
Rpim(I+/-)                                      0.100   0.042   1.174
CC half                                         0.994   0.993   0.463
Wilson B factor                                 0.000
Anomalous completeness                           96.5    97.4     4.8
Anomalous multiplicity                            3.4     3.3     3.5
Anomalous correlation                            0.174   0.029   0.185
Anomalous slope                                 1.014   0.000   0.000
dF/F                                            0.143
dI/s(dI)                                        1.014
Total observations                              622457  30788   31781
Total unique                                    92990   4783    4598
Assuming spacegroup: P 1 21 1

from xia2,


Low resolution limit                       54.38     54.38      1.64
High resolution limit                       1.60      7.16      1.60

Rmerge  (within I+/I-)                     0.158     0.058     1.579
Rmerge  (all I+ and I-)                    0.179     0.064     1.730
Rmeas (within I+/I-)                       0.187     0.070     1.865
Rmeas (all I+ & I-)                        0.194     0.070     1.871
Rpim (within I+/I-)                        0.101     0.037     0.985
Rpim (all I+ & I-)                         0.075     0.028     0.708
Rmerge in top intensity bin                0.067        -         -
Total number of observations              622457      6867     47046
Total number unique                        92990      1092      6767
Mean((I)/sd(I))                              7.2      22.4       1.2
Mn(I) half-set correlation CC(1/2)         0.994     0.991     0.507
Completeness                                98.1      98.4      96.8
Multiplicity                                 6.7       6.3       7.0

Anomalous completeness                      96.5      96.8      95.9
Anomalous multiplicity                       3.3       3.4       3.5
DelAnom correlation between half-sets      0.176    -0.019     0.104
Mid-Slope of Anom Normal Probability       1.014       -         -

from aimless - I have sympathy for the point of view which says these do not agree...

@rjgildea you have the data location in your inbox

graeme-winter commented 7 years ago

Second example

xia2

High resolution limit                             3.45    9.36    3.45
Low resolution limit                            138.59  138.70    3.51
Completeness                                    100.0   100.0   100.0
Multiplicity                                     24.7    21.9    20.7
I/sigma                                           7.1    19.9     2.3
Rmerge(I)                                       0.418   0.182   1.285
Rmerge(I+/-)                                    0.403   0.166   1.254
Rmeas(I)                                        0.427   0.187   1.317
Rmeas(I+/-)                                     0.419   0.173   1.316
Rpim(I)                                         0.086   0.040   0.287
Rpim(I+/-)                                      0.114   0.047   0.395
CC half                                         0.993   0.996   0.593
Wilson B factor                                 0.000
Anomalous completeness                          100.0   100.0     5.4
Anomalous multiplicity                           13.5    13.6    11.0
Anomalous correlation                            0.384   0.664   0.023
Anomalous slope                                 1.143   0.000   0.000
dF/F                                            0.169
dI/s(dI)                                        1.065
Total observations                              301157  15410   12735
Total unique                                    12169   704     616

aimless

Low resolution limit                      138.59    138.59      3.54
High resolution limit                       3.45     15.43      3.45

Rmerge  (within I+/I-)                     0.459     0.203     1.638
Rmerge  (all I+ and I-)                    0.483     0.220     1.699
Rmeas (within I+/I-)                       0.477     0.211     1.717
Rmeas (all I+ & I-)                        0.493     0.227     1.741
Rpim (within I+/I-)                        0.130     0.059     0.513
Rpim (all I+ & I-)                         0.099     0.052     0.378
Rmerge in top intensity bin                0.170        -         -
Total number of observations              301157      3385     18897
Total number unique                        12169       179       904
Mean((I)/sd(I))                              7.1      20.2       2.3
Mn(I) half-set correlation CC(1/2)         0.993     0.995     0.592
Completeness                               100.0      99.8     100.0
Multiplicity                                24.7      18.9      20.9

Anomalous completeness                     100.0     100.0     100.0
Anomalous multiplicity                      13.2      13.4      10.7
DelAnom correlation between half-sets      0.414     0.643    -0.037
Mid-Slope of Anom Normal Probability       1.143       -         -

rjgildea commented 7 years ago

Reply from PRE:

One possibility

I only realised last year, when it was pointed out by Clemens Vonrhein, that there are two definitions of Rmerge etc.:

  1. Sum( | Ihl - <Ih> | ) / Sum(Ihl)
  2. Sum( | Ihl - <Ih> | ) / Sum(<Ih>)

i.e. they differ in the denominator depending on whether the individual or average values are used

Many programs (I believe) used definition (1), but Aimless uses definition (2), which I think I must have inherited from earlier programs, and which actually agrees with the earliest reference that I could find (Arndt UW, Crowther RA, Mallett JFW. A computer-linked cathode-ray tube microdensitometer for X-ray crystallography. Journal of Scientific Instruments (Journal of Physics E). 1968;1:510–6).

I did once compare the two expressions but found they weren’t very different (probably just on one or two datasets, not an exhaustive comparison)

This seems the likely culprit for the discrepancy observed.
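A minimal sketch of the two denominators, to make the difference concrete. It is not the cctbx or Aimless implementation: the function name `merging_r`, the toy intensities, the use of an unweighted mean, and taking the absolute value of individual observations in definition (1) (as the `sum_den` line in the cctbx diff does) are all assumptions made for illustration. With an unweighted mean and no negative observations the two definitions are identical, because the sum of Ihl over a group equals n times `<Ih>`; they only separate once weak, negative measurements (or a weighted mean, not modelled here) come into play.

```python
def merging_r(groups, denominator="individual"):
    """Toy merging R-factor over groups of symmetry-equivalent intensities.

    denominator="individual": definition (1), sum of |Ihl|
    denominator="mean":       definition (2), sum of <Ih>
    Assumes an unweighted mean; hypothetical helper, not a library API.
    """
    numerator = 0.0
    denom = 0.0
    for observations in groups:
        n = len(observations)
        if n < 2:
            continue  # a single observation contributes nothing to Rmerge
        mean = sum(observations) / n
        for intensity in observations:
            numerator += abs(intensity - mean)
            if denominator == "individual":
                denom += abs(intensity)
            else:
                denom += mean
    return numerator / denom if denom else 0.0


# Made-up intensities: one strong reflection, one weak reflection
# that includes a negative measurement.
strong_only = [[10.0, 12.0, 8.0]]
with_weak = [[10.0, 12.0, 8.0], [1.0, -1.0, 3.0]]

r1_strong = merging_r(strong_only, "individual")
r2_strong = merging_r(strong_only, "mean")
r1_weak = merging_r(with_weak, "individual")
r2_weak = merging_r(with_weak, "mean")

print(f"strong only: (1) {r1_strong:.4f}  (2) {r2_strong:.4f}")
print(f"with weak:   (1) {r1_weak:.4f}  (2) {r2_weak:.4f}")
```

In this toy case the two definitions agree for the strong reflection alone and diverge once the weak group is added, which fits the largest discrepancies above being in the outer shell. The direction and size of the effect on real data will depend on the dataset and on how the mean is weighted.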

rjgildea commented 7 years ago

The following modification to the cctbx source code gives merging R-factors (significantly higher, i.e. "worse", for this particular dataset) that are much closer to those reported by Aimless:

$ git diff
diff --git a/cctbx/miller/merge_equivalents.h b/cctbx/miller/merge_equivalents.h
index 2472cd8..06567bc 100644
--- a/cctbx/miller/merge_equivalents.h
+++ b/cctbx/miller/merge_equivalents.h
@@ -74,7 +74,8 @@ namespace cctbx { namespace miller {
       for(std::size_t i=1;i<n;i++) {
         sum_num += scitbx::fn::absolute(data_group[i] - result);
         sum_den += scitbx::fn::absolute(data_group[i]);
-        sum_merge_den += data_group[i];
+        sum_merge_den += result;
       }
       if (sum_den == 0) self.r_linear.push_back(0);
       else self.r_linear.push_back(sum_num / sum_den);

graeme-winter commented 7 years ago

Testing this now... ISTR it was fixed

graeme-winter commented 7 years ago

xia2.txt

For AUTOMATIC/DEFAULT/NATIVE                 Overall    Low     High
High resolution limit                           1.26    3.42    1.26
Low resolution limit                           53.92   53.97    1.28
Completeness                                   95.3   100.0    59.7
Multiplicity                                    4.8     5.3     2.0
I/sigma                                        10.2    38.5     1.1
Rmerge(I)                                     0.069   0.030   0.489
Rmerge(I+/-)                                  0.062   0.027   0.441
Rmeas(I)                                      0.077   0.034   0.627
Rmeas(I+/-)                                   0.076   0.033   0.622
Rpim(I)                                       0.033   0.014   0.387
Rpim(I+/-)                                    0.043   0.019   0.439
CC half                                       0.999   0.999   0.666
Wilson B factor                               8.519
Anomalous completeness                         87.3    99.3    32.9
Anomalous multiplicity                          2.6     3.1     1.3
Anomalous correlation                         0.004   0.020  -0.030
Anomalous slope                               0.953
Total observations                           317790   20395    4050
Total unique                                  66277    3835    2035

AIMLESS

                                           Overall  InnerShell  OuterShell
Low resolution limit                       53.92     53.92      1.29
High resolution limit                       1.26      5.63      1.26

Rmerge  (within I+/I-)                     0.062     0.028     0.479
Rmerge  (all I+ and I-)                    0.070     0.031     0.539
Rmeas (within I+/I-)                       0.076     0.034     0.674
Rmeas (all I+ & I-)                        0.078     0.035     0.688
Rpim (within I+/I-)                        0.043     0.019     0.474
Rpim (all I+ & I-)                         0.033     0.015     0.421
Rmerge in top intensity bin                0.031        -         -
Total number of observations              317790      4725      6422
Total number unique                        66277       945      3164
Mean((I)/sd(I))                             10.2      31.3       1.1
Mn(I) half-set correlation CC(1/2)         0.999     0.999     0.670
Completeness                                95.3      99.9      63.3
Multiplicity                                 4.8       5.0       2.0

Anomalous completeness                      87.3     100.0      35.0
Anomalous multiplicity                       2.6       3.1       1.4
DelAnom correlation between half-sets      0.011     0.163    -0.120
Mid-Slope of Anom Normal Probability       0.954       -         -

graeme-winter commented 7 years ago

Broadly looks consistent modulo slightly different resolution shells; closing
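
The overall columns of the last pair of tables can be compared directly with a short sketch like the one below. The numbers are copied from the xia2.txt and AIMLESS output above. The pairing of xia2's "(I)" rows with Aimless's "all I+ & I-" rows, and "(I+/-)" with "within I+/I-", is an assumption inferred from the matching values, not something stated in either log. Only the overall column is compared, since the inner and outer shells use different resolution limits in the two programs.

```python
# Overall merging statistics from xia2.txt (third example above).
xia2_overall = {
    "Rmerge(I)": 0.069,
    "Rmerge(I+/-)": 0.062,
    "Rmeas(I)": 0.077,
    "Rmeas(I+/-)": 0.076,
    "Rpim(I)": 0.033,
    "Rpim(I+/-)": 0.043,
}

# Corresponding overall values from the AIMLESS log, keyed with the
# xia2 names under the assumed row pairing described above.
aimless_overall = {
    "Rmerge(I)": 0.070,      # Rmerge (all I+ and I-)
    "Rmerge(I+/-)": 0.062,   # Rmerge (within I+/I-)
    "Rmeas(I)": 0.078,       # Rmeas (all I+ & I-)
    "Rmeas(I+/-)": 0.076,    # Rmeas (within I+/I-)
    "Rpim(I)": 0.033,        # Rpim (all I+ & I-)
    "Rpim(I+/-)": 0.043,     # Rpim (within I+/I-)
}

# Absolute difference per statistic, and the largest one.
differences = {
    name: abs(xia2_overall[name] - aimless_overall[name])
    for name in xia2_overall
}
max_difference = max(differences.values())

for name, diff in differences.items():
    print(f"{name:14s} xia2 {xia2_overall[name]:.3f}  "
          f"aimless {aimless_overall[name]:.3f}  diff {diff:.3f}")
print(f"largest overall difference: {max_difference:.3f}")
```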