madscatt / zazzie

development branch
GNU General Public License v3.0
2 stars 3 forks source link

Perform an error analysis: stoichiometry module #133

Closed skrueger111 closed 1 year ago

skrueger111 commented 2 years ago

We have a function of this form:

y = 0 = x0*m1*2 + x1m1m2 + x2m2*2 - x3m1 - x3*m2, where x0, x1, x2 and x3 contain parameters such as delta_rho, partial_specific_volume, I(0) and concentration, all of which could potentially have uncertainties. We are solving for m1 and m2 by solving a set of simultaneous equations at each contrast.

scipy.opimize.curve_fit can take either x and y OR x, y and sigma (the error in y).

We can't get an initial error in y by propagating the errors in x0, x1, x2 and x3 since we need m1 and m2 in order to do that. So, the equation is currently optimized without errors in y to get values for m1 and m2. Thus, there is no way to know the "real" standard deviation on m1 and m2. (Note that if we were optimizing something simple like y=m*x+b, we could get the standard deviations on m and b, because we would presumably have measured errors on y that we could input into the optimizer.)

The optimized m1 and m2 values can be plugged back into the function so that we don't get exactly y=0 at each contrast anymore. Rather, we get some small, nonzero values. These are essentially the residuals in y, or one could even argue they are the error in y. But, we haven't taken the errors in x0, x1, x2 and x3 into account. AND, the residuals in y aren't that useful to the end user.

More useful for the end user is a calculate I(0) at each contrast using the optimized m1 and m2 values and compare them to the measured I(0) values. This is what is currently being done. Outputs are the experimental I(0) values, the calculated I(0) values and the difference (residuals). Again, this is more useful to the end user, since they presumably have a measured error on I(0) and they can then see if the calculated values fall within this error bar. But, we still haven't taken into account completely the errors in x0, x1, x2 and x3.

If we want to take the errors in x into account, we can pick new values for the parameters that we can estimate an error for, i.e., partial_specific_volume, I(0) and concentration. The new parameters would be chosen at random, but they would have to be within +/- their error. Then we would run the optimizer again to get another set of m1, m2 values. This can be hundreds or thousands of times, for that matter, until we have a nice distribution of m1, m2 values. This would give us an idea of the standard deviation on these values.

Once we have an idea of how much "typical" errors on delta_rho, partial_specific_volume, I(0) and concentration matter, we can decide if we want to offer this analysis as an advanced option for the end user.

skrueger111 commented 1 year ago

A Monte Carlo error analysis method was employed to test the effect of the errors on partial specific volume, I(0) and concentration on the resultant molecular weights. The errors on I(0) are from the experimental Guinier, P(r) or other analysis and those on concentration were assumed to be 5-10% of the measured value. The uncertainties in partial specific volume and delta_rho are not inputs in the other 3 methods, i.e., match point, sturhrmann/parallel axis or decomposition. The reason is that they are difficult to predict/calculate. When contrast calculator was written, a conscious decision was made to use values from the literature and our contrast calculator paper indicated which values we are using. To carry this philosophy through to the multi-component analysis methods, only the errors on I(0) and concentration were taken into account in the MC error analysis. A number of different MC trial values, i.e., 10, 100, 1000, 10000, were tested. For the number of trials > 100, the average molecular weight values (over all of the trials) were consistent, with the uncertainty (standard error of the mean) in the averaged values decreasing as the number of trials increased. In SASSIE3, the plan is to allow the user to manually enter their own delta_rho and/or partial specific volume parameters if they wish to explore what they believe is a "better" value than those we are using. Also, the number of MC trials can be a user input so the user can explore values other than what we decide to set as the default. A separate ticket will be opened to add the MC error analysis to the stoichiometry method.