stanfordmlgroup / ngboost

Natural Gradient Boosting for Probabilistic Prediction
Apache License 2.0

About the fisher matrix of Normal Distribution #353

Open Yingrui-Z opened 2 months ago

Yingrui-Z commented 2 months ago

Question 1:

With l = -log N(x; μ, σ^2), the first partial derivatives of the negative log-likelihood with respect to the parameters μ and σ are:

∂l/∂μ = (μ - x) / σ^2
∂l/∂σ = (1/σ) * (1 - (x - μ)^2 / σ^2)

In ngboost, the corresponding implementation is:

def d_score(self, Y):
    D = np.zeros((len(Y), 2))
    D[:, 0] = (self.loc - Y) / self.var
    D[:, 1] = 1 - ((self.loc - Y) ** 2) / self.var
    return D

This raises a question: why is D[:, 1] set to 1 - ((self.loc - Y) ** 2) / self.var instead of 1/sqrt(self.var) * (1 - ((self.loc - Y) ** 2) / self.var), as the derivative with respect to σ would suggest?

Question 2:

Additionally, according to the Wikipedia article on the normal distribution, the Fisher information matrix with respect to (μ, σ) is:

I(μ, σ) = [[1/σ^2, 0], [0, 2/σ^2]]

However, in the ngboost implementation, specifically in the NormalLogScore class within normal.py, the code snippet is:

def metric(self):
    FI = np.zeros((self.var.shape[0], 2, 2))
    FI[:, 0, 0] = 1 / self.var
    FI[:, 1, 1] = 2
    return FI

This raises another question: why is FI[:, 1, 1] set to 2 instead of 2 / self.var, as the theoretical formula suggests?

Could you please clarify this discrepancy? Thank you for your assistance.

avati commented 2 months ago

Both your questions revolve around using the (μ, σ) parametrization of the distribution (for the gradients and for the Fisher information).

In NGBoost the internal parametrization is (μ, log σ): the scale parameter is stored on a log scale. If you work out the math in this parametrization, you should see the expressions match the implementation in the code.
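A quick way to see this is to write the second parameter as s = log σ and check the closed-form expressions against a finite-difference gradient and a Monte Carlo estimate of the Fisher matrix. The sketch below is illustrative, not ngboost code; parametrizing by log σ^2 instead would simply halve the second gradient component.

```python
import numpy as np

# Negative log-likelihood of N(mu, sigma^2), parametrized as (mu, s)
# with sigma = exp(s).  Storing the scale on a log scale is what removes
# the 1/sigma factor from the sigma-gradient.  (Illustrative sketch only.)
def nll(mu, s, x):
    var = np.exp(2.0 * s)
    return s + 0.5 * np.log(2.0 * np.pi) + (x - mu) ** 2 / (2.0 * var)

# Closed-form gradient in (mu, s), matching d_score in the question:
#   dl/dmu = (mu - x) / var
#   dl/ds  = 1 - (x - mu)^2 / var
def grad(mu, s, x):
    var = np.exp(2.0 * s)
    return np.array([(mu - x) / var, 1.0 - (x - mu) ** 2 / var])

mu, s, x, h = 0.7, -0.3, 1.9, 1e-6
numeric = np.array([
    (nll(mu + h, s, x) - nll(mu - h, s, x)) / (2.0 * h),
    (nll(mu, s + h, x) - nll(mu, s - h, x)) / (2.0 * h),
])
print(np.abs(numeric - grad(mu, s, x)).max())  # ~0, up to finite-difference error

# Fisher information E[grad grad^T] under x ~ N(mu, sigma^2): the Monte
# Carlo estimate converges to diag(1/var, 2) -- the constant 2 that
# metric() hard-codes, independent of sigma.
rng = np.random.default_rng(0)
xs = rng.normal(mu, np.exp(s), size=1_000_000)
G = grad(mu, s, xs)                 # shape (2, 1_000_000)
FI = (G @ G.T) / xs.size
```

The (1, 1) entry of the estimated FI sits near the constant 2 regardless of the value of s, which is exactly why metric() needs no 2 / self.var term.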

Yingrui-Z commented 2 months ago

Thank you for your kind explanation!

I am working with a probability density function (pdf) defined as follows:

p(x | a, b, c) = a * b * exp(-a * (x - c)) / (1 + exp(-a * (x - c))) ^ (b + 1)

Here, a, b, and c represent the parameters. Calculating the first-order derivatives and the Fisher Information Matrix for these parameters has proven to be exceptionally complex.

In the Normal case, a transformation such as s = log(σ^2) significantly simplifies the calculations. However, given the multiplicative interactions between the parameters in this pdf, finding a similar transformation is challenging.

Could you offer any insights or suggestions on how to transform this pdf to simplify the formulation of the Fisher Information Matrix?
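There may be no reparametrization that makes the Fisher matrix of this density constant, but the closed-form algebra can be sidestepped entirely: only numeric values of the score and the Fisher matrix at the current parameters are needed, and both can be estimated with finite-difference (or autodiff) gradients plus Monte Carlo over samples of the distribution. A sketch for the pdf above, assuming the inverse-transform sampler that follows from its CDF F(x) = (1 + exp(-a*(x - c)))^(-b); the function names are illustrative, not ngboost API:

```python
import numpy as np

def logpdf(x, a, b, c):
    # log p(x|a,b,c) for p = a*b*exp(-a*(x-c)) / (1 + exp(-a*(x-c)))**(b+1),
    # using logaddexp for numerical stability in the log(1 + e^z) term.
    z = -a * (x - c)
    return np.log(a) + np.log(b) + z - (b + 1) * np.logaddexp(0.0, z)

def sample(n, a, b, c, rng):
    # Inverse-transform sampling: the CDF is F(x) = (1 + exp(-a*(x-c)))**(-b),
    # so x = c - log(u**(-1/b) - 1) / a with u ~ Uniform(0, 1).
    u = rng.uniform(size=n)
    return c - np.log(u ** (-1.0 / b) - 1.0) / a

def score(x, theta, h=1e-5):
    # Central-difference gradient of -log p with respect to (a, b, c).
    g = np.empty((3,) + np.shape(x))
    for i in range(3):
        tp, tm = list(theta), list(theta)
        tp[i] += h
        tm[i] -= h
        g[i] = (logpdf(x, *tm) - logpdf(x, *tp)) / (2.0 * h)
    return g

theta = (1.5, 2.0, 0.3)
rng = np.random.default_rng(0)
xs = sample(500_000, *theta, rng)
G = score(xs, theta)                # shape (3, n): per-sample scores
FI = (G @ G.T) / xs.size            # Monte Carlo estimate of E[score score^T]
```

These two estimates are exactly the quantities a custom distribution has to return from d_score (per-row scores) and metric (per-row Fisher matrices), so no symbolic Fisher matrix is required; only the numerical-gradient step size and the Monte Carlo sample size trade accuracy for speed.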
