madscatt / zazzie

development branch
GNU General Public License v3.0

Decide how to handle model data with no error or error = 0.0 #158

Closed skrueger111 closed 2 months ago

skrueger111 commented 11 months ago

The multi-component analysis method is envisioned to be used on both experimental AND model data. However, model data don't have errors on the relevant input parameters. For instance, a model data file consists of either two columns, q, I(q), or three columns, q, I(q), error on I(q), where the error on I(q) is 0.0 for all q values. (The latter is the format for files created by sascalc so that the model SANS curves can be plotted in IGOR with the NIST SANS macros without crashing.) Also, model I(0) values (from the contrast calculator) and model Rg values (from sascalc) do not have errors associated with them. The multi-component analysis methods (match point, Stuhrmann/parallel axis, decomposition, stoichiometry) perform weighted fits and expect errors on the input variables. What is the best way to handle this in the case of model data?
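To make the failure mode concrete, here is a minimal illustration (variable names are hypothetical, not from the analysis code) of why an all-zero error column breaks a weighted fit: the weights are 1/sigma^2, so sigma = 0.0 makes every weight infinite.

```python
import numpy as np

q = np.array([0.01, 0.02, 0.03])
iq = np.array([100.0, 90.0, 80.0])
err = np.zeros_like(iq)            # model data: error column is all 0.0

with np.errstate(divide="ignore"):
    weights = 1.0 / err**2         # -> inf for every point
print(np.isinf(weights).all())     # True: the weighted fit is undefined
```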

madscatt commented 11 months ago

Data Interpolation handles errors in this way:

```python
try:
    qval = locale.atof(this_line[0])
    ival = abs(locale.atof(this_line[1]))

    try:
        error_value = locale.atof(this_line[2])
    except:
        error_value = error_magnitude * ival
        fake_error = True
```

Thus, if there is an error value in the original file it is read in; otherwise it is arbitrarily set to 0.1 * the I(q) value (i.e., 10%). We have spoken about this in the past and decided that we really need an instrument-specific error profile. The problem with using the original errors is that they could be at q-values that don't match the spacing of the desired interpolated data, and merely averaging the errors from the closest points used to generate each interpolated q-value may not always be the right approach, for instance, when different detectors are used to splice together I(q) data and the end of one regime has significantly different signal-to-noise than the other regime.

I suppose one could give the user the option to use 10% of I(q) as the error value if none is present; and if I(q) error is present in the original data, we could offer to average the errors from surrounding data points (i.e., interpolate the error data).
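A minimal sketch of those two options (function and parameter names are hypothetical, not the actual interpolation code): fall back to a fixed fraction of I(q) when no error column exists, otherwise linearly interpolate the original errors onto the new q grid.

```python
import numpy as np

def estimate_errors(q_new, q_orig, i_new, err_orig=None, error_magnitude=0.1):
    """Hypothetical sketch of the two options discussed above."""
    if err_orig is None:
        # Option 1: no error column -> fixed fraction of I(q), e.g. 10%
        return error_magnitude * np.abs(i_new)
    # Option 2: interpolate the original errors onto the new q grid
    return np.interp(q_new, q_orig, err_orig)
```

For example, a point interpolated at q = 0.015 between original points at q = 0.01 and q = 0.02 with errors 1.0 and 2.0 would receive an error of 1.5, while the fallback path would assign 10% of its I(q) value.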

To get this error estimate into SasCalc, we would need an (optional) data input to read the interpolated data file and acquire the error values at the q-points where SasCalc will calculate I(q). Perhaps one could perturb the original error values from the interpolated data file so that the estimates are distributed about the original values, to avoid introducing systematic error into our error estimate.
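The perturbation idea might be sketched like this (purely illustrative; the spread parameter is an assumption): draw each error estimate from a Gaussian centered on its original value, so the estimates scatter about the original rather than repeat it exactly.

```python
import numpy as np

def perturb_errors(err, relative_spread=0.1, seed=0):
    """Illustrative sketch (names and spread are assumptions): draw each
    error estimate from a Gaussian centered on its original value."""
    rng = np.random.default_rng(seed)
    perturbed = rng.normal(loc=err, scale=relative_spread * np.abs(err))
    return np.abs(perturbed)  # keep the error estimates positive
```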

I think guidance on how error should be handled ought to come from a round-table survey of experimentalists, which we haven't yet figured out how to organize.

skrueger111 commented 2 months ago

decomposition.py now reads data files using read_sas_data_file_add_error.py. If there are no errors, as is the case with model data, random Gaussian noise is added at each data point using add_noise_to_sas_data.py. A description of these helper methods can be found here: https://github.com/madscatt/zazzie/wiki/Multi-component-Analysis-Helper-Methods.
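A rough sketch of the idea behind adding noise to model data (this is an illustrative guess, not the actual add_noise_to_sas_data.py helper; see the wiki page for the real description): assign each point an error that is a fraction of I(q) and perturb I(q) with zero-mean Gaussian noise of that magnitude.

```python
import numpy as np

def add_noise_to_sas_data(q, iq, noise_fraction=0.1, seed=0):
    """Sketch only, not the actual helper: set sigma as a fraction of
    I(q) and add zero-mean Gaussian noise with that sigma per point."""
    rng = np.random.default_rng(seed)
    sigma = noise_fraction * np.abs(iq)        # per-point error estimate
    iq_noisy = iq + rng.normal(0.0, sigma)     # perturbed intensities
    return q, iq_noisy, sigma
```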