prmiles / pymcmcstat

Python implementation of MATLAB toolbox "mcmcstat"
https://github.com/prmiles/pymcmcstat/wiki
MIT License

Let user define prior and likelihood functions #40

Open prmiles opened 5 years ago

prmiles commented 5 years ago

Is your feature request related to a problem? Please describe. The user's ability to define the log-prior and log-likelihood functions is limited. This is restrictive for many applications, although it does keep the UI simple.

Describe the solution you'd like: generalize the behavior so that the user can define each one, while still keeping the default behavior simple.
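A minimal sketch of the kind of generalization described above, written in plain Python so as not to guess at pymcmcstat's internals. The function `make_log_posterior` and its parameters are hypothetical and not part of pymcmcstat's API. When the user supplies no callables, it falls back to the current behavior (a Gaussian likelihood with exponent $-\mathrm{SSQ}/(2\sigma^2)$ and a flat prior); otherwise the user's log-likelihood and log-prior are used as given.

```python
from typing import Callable, Optional, Sequence

# Hypothetical aliases for illustration; not taken from pymcmcstat.
Params = Sequence[float]
LogDensity = Callable[[Params], float]
Model = Callable[[Params], Sequence[float]]


def make_log_posterior(
    model: Model,
    ydata: Sequence[float],
    sigma2: float = 1.0,
    loglike: Optional[LogDensity] = None,
    logprior: Optional[LogDensity] = None,
) -> LogDensity:
    """Return theta -> log p(theta | data), up to an additive constant.

    With no user callables this reproduces the single-variance Gaussian
    default, -SSQ / (2 * sigma2), combined with a flat prior.
    """

    def default_loglike(theta: Params) -> float:
        residuals = [y - f for y, f in zip(ydata, model(theta))]
        ssq = sum(r * r for r in residuals)
        return -ssq / (2.0 * sigma2)

    def default_logprior(theta: Params) -> float:
        return 0.0  # flat (improper) prior: contributes nothing

    ll = loglike if loglike is not None else default_loglike
    lp = logprior if logprior is not None else default_logprior

    def log_posterior(theta: Params) -> float:
        return ll(theta) + lp(theta)

    return log_posterior
```

Passing only a model and data keeps the existing behavior, while passing `loglike=` (for example, a likelihood with a separate variance per observation) replaces the Gaussian SSQ term entirely. Keeping the defaults inside the factory is one way to address the backward-compatibility concern raised in the next comment.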

prmiles commented 5 years ago

This is currently being worked on and is expected to be released by the end of July 2019. We are working on a solution that maintains backward compatibility with the user interface, but depending on the extent of the changes required, this may mean releasing a 2.0 series of pymcmcstat.

jasonmhite commented 5 years ago

@prmiles This would actually be great!

I was looking to use your code in a project I'm working on, but I have measurements where the observations have different variances. I tried to hack it in, but I'm not terribly confident I did it correctly.

prmiles commented 5 years ago

@jasonmhite Thank you for the comment! I had hoped to complete this issue before leaving my old job; however, I'm currently in transition. I believe I will be able to address this issue as part of my new position, but it is difficult to put a timeline on it.

For your particular problem, are you dealing with two sets of observations that have different variances? Or is the variance proportional to the magnitude of the response (or something else)? The former issue can be accommodated if the error is independent and identically distributed for each set (see this example), but the latter issue will require a different statistical model.
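For the first case (several data sets, each with i.i.d. errors but its own variance), the Gaussian log-likelihood is simply a sum of one SSQ term per set, each scaled by that set's variance. A minimal sketch in plain Python, independent of pymcmcstat; the function name `gaussian_loglike_sets` is hypothetical:

```python
import math
from typing import Sequence


def gaussian_loglike_sets(
    residual_sets: Sequence[Sequence[float]],
    sigma2s: Sequence[float],
) -> float:
    """Log-likelihood of independent Gaussian errors, one variance per data set.

    Within data set k the errors are i.i.d. N(0, sigma2s[k]); different sets
    may have different variances.
    """
    if len(residual_sets) != len(sigma2s):
        raise ValueError("need exactly one variance per data set")
    total = 0.0
    for residuals, s2 in zip(residual_sets, sigma2s):
        n = len(residuals)
        ssq = sum(r * r for r in residuals)  # SSQ for this set only
        total += -ssq / (2.0 * s2) - 0.5 * n * math.log(2.0 * math.pi * s2)
    return total
```

Setting every variance equal recovers the single-variance default (plus the constant log term). The per-channel case described in the next comment is the limit where every "set" contains a single observation.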

jasonmhite commented 5 years ago

@prmiles Ah I didn’t know you had left! Hope you are doing well in the transition.

For the problem in question, a single observation is the number of counts in one channel of a gamma-radiation spectrum, and the available data are the counts in four different channels. The errors are all zero-mean, normally distributed, and independent, but the variance is proportional to the number of counts in that channel.

I know how to set this up with a general normal likelihood, but if I understand correctly, the likelihood used in pymcmcstat assumes that every data point has the same error variance. In other words, in pymcmcstat the argument of the exponential in the likelihood is $-\mathrm{SSQ}/(2\sigma^2)$, whereas what I want is to pull the variance inside the summation so that the argument becomes $-\sum_i |x^i_{\text{predicted}} - x^i_{\text{data}}|^2 / (2\sigma_i^2)$, where $\sigma_i^2$ is the error variance of channel $i$ (I think that is right; hopefully you get the idea: it's the variance-weighted SSQ).
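Written out as log-likelihoods over $n$ data points, with $\theta$ the model parameters, $\sigma^2$ the common variance (var) and $\sigma_i^2$ the per-channel variance (var_i), the two forms being contrasted are shown below. The last line assumes, for illustration only, that $\sigma_i^2 = c\,x^i_{\text{data}}$ for some constant $c$, i.e. that the variance is proportional to the observed counts in channel $i$.

```latex
\log L_{\text{default}}(\theta)
  = -\frac{\mathrm{SSQ}(\theta)}{2\sigma^2} - \frac{n}{2}\log(2\pi\sigma^2),
\qquad
\mathrm{SSQ}(\theta) = \sum_{i=1}^{n} \bigl| x^i_{\text{predicted}}(\theta) - x^i_{\text{data}} \bigr|^2

\log L_{\text{weighted}}(\theta)
  = -\sum_{i=1}^{n} \frac{\bigl| x^i_{\text{predicted}}(\theta) - x^i_{\text{data}} \bigr|^2}{2\sigma_i^2}
    - \frac{1}{2}\sum_{i=1}^{n} \log(2\pi\sigma_i^2)

\sigma_i^2 = c\, x^i_{\text{data}}
\;\Longrightarrow\;
\log L_{\text{weighted}}(\theta)
  = -\frac{1}{2c}\sum_{i=1}^{n} \frac{\bigl| x^i_{\text{predicted}}(\theta) - x^i_{\text{data}} \bigr|^2}{x^i_{\text{data}}}
    - \frac{n}{2}\log(2\pi c)
    - \frac{1}{2}\sum_{i=1}^{n} \log x^i_{\text{data}}
```

Under that assumption the last line is the default form with $\sigma^2$ replaced by $c$ and each squared residual divided by $x^i_{\text{data}}$ (the variance-weighted SSQ), plus a term that depends on neither $\theta$ nor $c$.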

I could just be confusing myself; this discrepancy has always been a source of confusion for me.

prmiles commented 5 years ago

@jasonmhite That makes complete sense, and there might be a way to do it. However, you are correct that the default behavior of pymcmcstat is to use $-\mathrm{SSQ}/(2\sigma^2)$. I'll try to look into this later this week.
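One common workaround, offered as a sketch rather than as what the maintainer had in mind, is weighted least squares: divide each squared residual by its relative variance before summing, then hand that weighted SSQ to the unchanged single-variance likelihood (exponent $-\mathrm{SSQ}/(2\sigma^2)$), so that $\sigma^2$ plays the role of the proportionality constant $c$. This assumes the variance is proportional to the observed counts (so the weights do not depend on the model parameters) and that all counts are positive. If the variance instead tracked the predicted counts, the $\log\sigma_i^2$ terms would depend on the parameters and this trick would no longer be exact. All helper names below are hypothetical and not part of pymcmcstat.

```python
import math
from typing import Sequence


def weighted_ssq(
    predicted: Sequence[float],
    observed: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Variance-weighted sum of squares: sum_i (pred_i - obs_i)^2 / w_i.

    With w_i proportional to each point's error variance (e.g. the observed
    counts), this is the quantity to feed into the default likelihood.
    """
    if not (len(predicted) == len(observed) == len(weights)):
        raise ValueError("predicted, observed and weights must have equal length")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be strictly positive")
    return sum((p - d) ** 2 / w for p, d, w in zip(predicted, observed, weights))


def default_gaussian_loglike(ssq: float, n: int, sigma2: float) -> float:
    """Single-variance Gaussian log-likelihood expressed through an SSQ value."""
    return -ssq / (2.0 * sigma2) - 0.5 * n * math.log(2.0 * math.pi * sigma2)


def heteroscedastic_loglike(
    predicted: Sequence[float],
    observed: Sequence[float],
    variances: Sequence[float],
) -> float:
    """Reference: independent Gaussian errors with a separate variance per point."""
    return sum(
        -(p - d) ** 2 / (2.0 * v) - 0.5 * math.log(2.0 * math.pi * v)
        for p, d, v in zip(predicted, observed, variances)
    )
```

With variances `c * counts`, `heteroscedastic_loglike(pred, counts, ...)` and `default_gaussian_loglike(weighted_ssq(pred, counts, counts), len(counts), c)` differ by exactly $-\tfrac{1}{2}\sum_i \log x^i_{\text{data}}$ for any prediction, a constant that depends on neither the model parameters nor $c$, so both define the same posterior shape.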