Multinomial calculation for Vdata (`Mixer.getVcd()`) is not implemented

jrbourbeau / pyunfold

Iterative unfolding for Python

https://jrbourbeau.github.io/pyunfold/

MIT License


Multinomial calculation for Vdata (`Mixer.getVcd()`) is not implemented #112

Open jvavrek opened 3 years ago

jvavrek commented 3 years ago

There is no multinomial option for the data covariance matrix, Eq. 8 in the Iterative Unfolding reference doc v1.0. Given that only "in some cases it is safe to use the Poisson form" (Eq. 10), one might want the full multinomial option.
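For readers comparing the two options, here is a minimal sketch of the standard textbook covariance forms for a vector of bin counts. It is not taken from the PyUnfold source, and its notation may differ from Eq. 8 and Eq. 10 of the reference doc. The function names `poisson_cov` and `multinomial_cov` are hypothetical.

```python
import numpy as np

def poisson_cov(counts):
    """Independent Poisson bins: variance n_i on the diagonal, no correlations."""
    counts = np.asarray(counts, dtype=float)
    return np.diag(counts)

def multinomial_cov(counts):
    """Fixed total N shared among bins: Cov(n_i, n_j) = N * p_i * (delta_ij - p_j)."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    p = counts / total  # bin probabilities estimated from the counts themselves
    return total * (np.diag(p) - np.outer(p, p))

counts = [50, 30, 20]  # illustrative numbers only
v_pois = poisson_cov(counts)
v_mult = multinomial_cov(counts)
```

The multinomial diagonal is `n_i * (1 - p_i)` and the off-diagonal terms are `-n_i * n_j / N`, so the two forms approach each other when every bin holds only a small fraction of the total.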

zhampel commented 3 years ago

@jvavrek Yes, indeed, there is no multinomial option for the data covariance matrix. For counting experiments, the assumption is that each measured datum is independent, i.e. it follows Poisson statistics. The MC covariance has a multinomial option because of the combined effect of drawing from a known simulated distribution and normalizing the response matrix. Do you have a situation where the measured data spectrum/distribution requires a multinomial covariance matrix? BTW, I must admit, we may have paraphrased 'in some cases it is safe...' from an old document by Adye, which is in the references.

jvavrek commented 3 years ago

@zhampel Thanks for the clarification. In my case the measured data is Poisson, but I was not 100% certain how the multinomial/Poisson justifications differed between the data covariance matrix and the MC covariance matrix. It may be helpful for others to spell out in the documentation in which cases it is safe to use the Poisson option.

To double-check: if I use the multinomial covariance option for the MC but take the response matrix as given (i.e., zero error on the response), I get the same unfolded result as with the Poisson option, correct?

zhampel commented 3 years ago

@jvavrek Good idea, I should add some clarification to the docs. For now, there is only one covariance option for data: Poisson. For MC there are both Poisson and multinomial. Normally, one builds an unnormalized response matrix via simulation (e.g. MC) and then normalizes along the effects dimension for each cause bin. This leads to the use of an MC covariance matrix built from the multinomial distribution.
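The normalization step described above can be sketched as follows. This is an illustration under stated assumptions, not PyUnfold's internal code: the counts and `n_generated` are made-up numbers, rows are taken to be effect bins and columns cause bins, and the per-element binomial error is used as the simplest multinomial-style uncertainty.

```python
import numpy as np

# Hypothetical MC counts: rows are effect bins, columns are cause bins.
mc_counts = np.array([[80.0, 10.0],
                      [15.0, 60.0]])

# Hypothetical number of events generated in each cause bin. It can exceed the
# column sum because some generated events are never observed.
n_generated = np.array([100.0, 100.0])

# Normalize each column (cause bin) by the number of generated events,
# giving an estimate of P(effect i | cause j).
response = mc_counts / n_generated

# Column sums of the normalized matrix are the detection efficiencies.
efficiencies = response.sum(axis=0)

# Binomial standard error on each element (fixed number of generated events).
response_err_multinomial = np.sqrt(response * (1.0 - response) / n_generated)

# Poisson alternative: treat each raw count as an independent Poisson variable.
response_err_poisson = np.sqrt(mc_counts) / n_generated
```

Because the number of generated events per cause bin is fixed, the entries of one column are not independent, which is what motivates a multinomial rather than a Poisson treatment of the MC uncertainty.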

As for your question, can you please clarify? If you use the multinomial covariance option for MC, that serves as the error for the response matrix.

jvavrek commented 3 years ago

@zhampel If I zero out the response and efficiency errors with `cov_type='multinomial'`, I get the same result as with `cov_type='poisson'` (just much slower), e.g.:

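```python
import numpy as np
from pyunfold import iterative_unfold
from pyunfold.callbacks import Logger

# data, response, response_err, efficiencies, and efficiencies_err are
# assumed to be NumPy arrays defined earlier.
unfolded_results = iterative_unfold(
    data=data,
    data_err=np.sqrt(data),
    response=response,
    response_err=response_err*0,  # zero response error
    efficiencies=efficiencies,
    efficiencies_err=efficiencies_err*0,  # zero efficiency error
    cov_type='multinomial',
    callbacks=[Logger()],
)
```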

What I wanted to double check with this is that the 'multinomial' option is only useful if one wants to account for uncertainties in the response. If one takes the response as given (i.e., no uncertainty), then 'poisson' is fine.

zhampel commented 3 years ago

@jvavrek Sorry to ask, I just want to be complete here. Are they precisely the same results? That is, does the resulting figure just look the same, or are the values identical? What are the magnitudes of the uncertainties for the response & efficiency arrays? I could see a case where the response matrix is very close to diagonal, the off-diagonal error values are small, and thus the multinomial results could be very close to the Poisson results. Again, not questioning your work, just asking for more info.
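One way to answer the "identical or just similar" question is to compare the two outputs element by element. The sketch below uses a hypothetical helper, `compare_unfolding`, and assumes each result is a dictionary with array entries under the keys `unfolded`, `stat_err`, and `sys_err`. The stand-in dictionaries are made up for illustration and are not real unfolding output.

```python
import numpy as np

def compare_unfolding(result_a, result_b, keys=("unfolded", "stat_err", "sys_err")):
    """Return {key: (exactly_equal, max_abs_diff)} for the listed array entries."""
    report = {}
    for key in keys:
        a = np.asarray(result_a[key], dtype=float)
        b = np.asarray(result_b[key], dtype=float)
        exactly_equal = bool(np.array_equal(a, b))
        max_abs_diff = float(np.max(np.abs(a - b)))
        report[key] = (exactly_equal, max_abs_diff)
    return report

# Stand-in results; in practice these would come from two unfolding runs,
# one with cov_type='multinomial' and one with cov_type='poisson'.
res_multinomial = {
    "unfolded": np.array([10.0, 20.0, 30.0]),
    "stat_err": np.array([1.0, 2.0, 3.0]),
    "sys_err": np.array([0.5, 0.5 + 1e-12, 0.5]),
}
res_poisson = {
    "unfolded": np.array([10.0, 20.0, 30.0]),
    "stat_err": np.array([1.0, 2.0, 3.0]),
    "sys_err": np.array([0.5, 0.5, 0.5]),
}

report = compare_unfolding(res_multinomial, res_poisson)
for key, (same, diff) in report.items():
    print(f"{key}: exactly equal = {same}, max |difference| = {diff:.3e}")
```

Reporting both exact equality and the largest absolute difference separates "bitwise identical" from "indistinguishable on a plot".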