mathnet / mathnet-numerics

Math.NET Numerics
http://numerics.mathdotnet.com
MIT License
3.5k stars 896 forks source link

Kurtosis and Skewness returns NaN for all values equal #593

Open TobiasGlaubach opened 6 years ago

TobiasGlaubach commented 6 years ago

Hi, I found, that the Kurtosis method returns NaN, in a case when all samples are equal.

Testcode:

var a = Generate.Repeat<double>(560, 73.0d).ToArray();
var k = a.Kurtosis();
// k --> NaN
var s = a.Skewness();
// s --> NaN

Check in python:

import numpy as np
import scipy.stats as st
v = np.ones(560) * 73.0
k = st.kurtosis(v)
# k --> -3.0
k2 = st.kurtosis(v, fisher=False)
# k2 --> 0.0
s = st.skew(v)
# s --> 0.0

looking at numpy's reference it states, these definitions should be OK.

Looking at the MathNet.numerics implementation I found, that it needs to be more than 3 and 4 values, which can not be the problem:

        /// <summary>
        /// Estimates the unbiased population kurtosis from the provided samples.
        /// Uses a normalizer (Bessel's correction; type 2).
        /// Returns NaN if data has less than four entries or if any entry is NaN.

        /// Estimates the unbiased population skewness from the provided samples.
        /// Uses a normalizer (Bessel's correction; type 2).
        /// Returns NaN if data has less than three entries or if any entry is NaN.

Is the definition different from numpy, or is there an issue?

TobiasGlaubach commented 6 years ago

Ok, as expected the problem seems to be that the 2nd Moment (denominator) is zero for all equal values, which results in a division by zero.

scipy solves this problem by:


    m2 = moment(a, 2, axis)
    m4 = moment(a, 4, axis)
    vals = np.where(m2 == 0, 0, m4 / m2**2.0)

which basically just replaces the NaN values with zeros.

Should we implement something similar, or keep the definition like it is? Any opinions on that?