For transparency's sake, I am posting the following email correspondence with Edwin Tye, the original author of PyGOM:
Dear Edwin
Thanks for the response. I thought for a while that not providing the differentials with respect to sigma would be fine if the user accepts the assumption that variability does not change between predictions. For now, I can make the same assumption with the negative binomial and gamma loss functions I am working on as extensions to PyGOM's model fitting.
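For concreteness, here is the sort of shape I have in mind for the negative binomial extension, with the dispersion $k$ fixed at construction in the same way as $\sigma$ (a sketch written for this issue, not final code):

```python
# Sketch only (written for this issue, not PyGOM's implementation): a
# negative binomial negative log-likelihood with the dispersion k fixed
# at construction, analogous to the fixed sigma in the normal loss.
import numpy as np
from scipy.special import gammaln

def nb_nll(y, yhat, k=1.0):
    """-log L for a negative binomial with mean yhat and fixed dispersion k,
    parameterised so that Var(y) = yhat + yhat**2 / k."""
    y = np.asarray(y, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    return -np.sum(gammaln(y + k) - gammaln(k) - gammaln(y + 1.0)
                   + k * np.log(k / (k + yhat))
                   + y * np.log(yhat / (k + yhat)))

print(nb_nll([3, 5, 2], [3.2, 4.8, 2.5], k=2.0))  # scalar loss value
```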
If it is OK with you, I will raise this as an issue on the PyGOM GitHub page and paste in this email thread for transparency's sake.
Yours, Martin
Dr. Martin Grunnill
Senior Mathematical Modeller
Emergency Response Department (Science & Technology)
Public Health England
Manor Farm Rd
Porton Down
SP4 0JG
martin.grunnill@phe.gov.uk
Tel: 01980612618
www.gov.uk/phe
Follow us on Twitter @PHE_uk
From: Edwin Tye (#######@gmail.com)
Sent: 29 March 2020 15:35
To: Martin Grunnill (Martin.Grunnill@phe.gov.uk)
Subject: Re: OFFICIAL: Derivatives of Loss functions with Pygom

If you want to extend, then you should file an issue and raise a PR. That way, it will have a lot more visibility and transparency when you face an audit one day.
On the loss functions: although I can't open the Python file right now, the master branch on GitHub - which hopefully is up to date - does have a second derivative for the normal loss (line 216). As you said, this is only for the estimation, and the reason is that $\sigma$ is not an estimate. During object initialization you input $\sigma$, which is therefore a fixed value. The intended use is more along the lines of providing Bayesian-like prior information.
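For readers of this issue, a minimal sketch of the behaviour Edwin describes (the shape is assumed for illustration; it is not PyGOM's actual code): $\sigma$ is fixed at construction, so the first and second derivatives are taken with respect to the prediction $\hat{y}$ only.

```python
# Minimal sketch (assumed shape for illustration, not PyGOM's actual code)
# of a normal loss with sigma fixed at construction: derivatives are taken
# with respect to the prediction yhat only.
import numpy as np

class FixedSigmaNormalLoss:
    def __init__(self, y, sigma=1.0):
        self.y = np.asarray(y, dtype=float)
        self.sigma = float(sigma)  # supplied at initialization, never estimated

    def loss(self, yhat):
        # -log L for N(yhat, sigma^2), summed over observations
        r = self.y - yhat
        return np.sum(0.5 * np.log(2.0 * np.pi * self.sigma**2)
                      + r**2 / (2.0 * self.sigma**2))

    def diff_loss(self, yhat):
        # d(-log L)/d(yhat) = -(y - yhat) / sigma^2
        return -(self.y - yhat) / self.sigma**2

    def diff2_loss(self, yhat):
        # d^2(-log L)/d(yhat)^2 = 1 / sigma^2, constant in yhat
        return np.ones_like(self.y) / self.sigma**2
```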
You can argue that $\sigma$ should be estimated, and that is not a bad idea. I never had the time, nor saw the benefit, in estimating $\sigma$, simply because estimating a scalar $\sigma$ is slightly pointless, while a vector $\boldsymbol{\sigma}$ is not statistically sound because the differential equation is time dependent, so you would need some sort of constraint in the form of, say, an autoregressive model, i.e. $\sigma_{t+1} = \alpha \sigma_{t}$, $\alpha \in \left[-1, 1\right]$. If you want to use a matrix $\boldsymbol{\Sigma}$ then I have nothing to say, because I haven't thought about that scenario at all.

Edwin
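As an aside for this issue: the "slightly pointless" remark can be made precise. For fixed predictions $\hat{y}_i$, the scalar $\sigma$ that minimises the normal negative log-likelihood has a closed form, so it never needs to enter the Newton iteration:

$$
\frac{\partial}{\partial \sigma}\sum_{i=1}^{n}\left[\tfrac{1}{2}\log\left(2\pi\sigma^{2}\right)+\frac{(y_{i}-\hat{y}_{i})^{2}}{2\sigma^{2}}\right]
=\frac{n}{\sigma}-\frac{1}{\sigma^{3}}\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^{2}=0
\;\Longrightarrow\;
\hat{\sigma}^{2}=\frac{1}{n}\sum_{i=1}^{n}(y_{i}-\hat{y}_{i})^{2}.
$$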
On Sun, Mar 29, 2020 at 2:24 PM Martin Grunnill (Martin.Grunnill@phe.gov.uk) wrote:

Dear Edwin

I am trying to extend PyGOM to include fitting with negative binomial, gamma and binomial loss functions. In trying to understand the code, I have been looking at the normal loss class (see the attached .py file, line 139). I am fairly new to model fitting algorithms and, beyond justifying why certain data conform to a certain distribution, I have tended to see them as a bit of a black box. From doing some reading, I understand that a lot of methods are based upon Newton-type algorithms, using first and second derivatives to home in on the lowest negative log-likelihood. What I can't seem to understand with regard to PyGOM is why the normal loss class provides the first and second derivatives with respect to the prediction (the mean, $\mu$ or $\hat{y}$), but ignores the first and second derivatives with respect to the spread around the prediction ($\sigma$). With Poisson loss this wouldn't be an issue, as the mean equals the variance. However, with normal loss, couldn't PyGOM make the kind of mistake I have tried to illustrate in the attached PDF?
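To make the concern concrete for readers of this issue (toy code written for illustration, not PyGOM's): a Newton-type update that is only given the derivatives with respect to $\mu$ will fit the mean but leave $\sigma$ at whatever it was initialised to, even when the residual spread disagrees.

```python
# Toy illustration (hypothetical code written for this issue, not PyGOM's):
# a Newton update on the normal negative log-likelihood that is only given
# the derivatives with respect to mu. The mean is fitted, but sigma stays
# at its initial value regardless of the residual spread.
import numpy as np

y = np.array([1.0, 4.0, 2.0, 5.0, 3.0])

def newton_mu_only(y, sigma=1.0, mu=0.0, steps=5):
    for _ in range(steps):
        grad = -np.sum(y - mu) / sigma**2  # d(-log L)/d(mu)
        hess = len(y) / sigma**2           # d^2(-log L)/d(mu)^2
        mu -= grad / hess                  # Newton step in mu only
    return mu, sigma                       # sigma is never updated

mu, sigma = newton_mu_only(y)
print(mu, sigma)       # mu converges to mean(y) = 3.0; sigma stuck at 1.0
print(np.std(y - mu))  # residual spread is ~1.41, not the assumed 1.0
```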
Please understand that I am fairly new to these sorts of algorithms; I originally trained as a biologist and have simply picked up a lot of maths along the way.
Thanks very much for this. I hope I have been clear in explaining the issue. If not, my work contact details are below; I think you have my mobile, but if not it is 0#########.
Yours, Martin
Dr. Martin Grunnill
Senior Mathematical Modeller
Emergency Response Department (Science & Technology)
Public Health England
Manor Farm Rd
Porton Down
SP4 0JG
martin.grunnill@phe.gov.uk
Tel: 01980612618
www.gov.uk/phe
Follow us on Twitter @PHE_uk
Attachment: Why no derivatives of sigma.pdf