Open hatappi opened 6 years ago
Numo's variance is the "unbiased" sample variance, whose denominator is N-1, where N is the number of samples.
numpy's default denominator, however, is N. With numpy you can get the same result as Numo by passing ddof=1:
In [3]: np.var(np.array([[1, 2], [3, 4]], dtype="f"), ddof=1)
Out[3]: 1.6666666
However, Numo currently does not support a ddof argument, so we cannot get the same result as numpy's default (in my understanding).
Numo should support a ddof argument, although the default behavior should stay different from numpy's for backward compatibility.
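To make the ddof idea concrete, here is a minimal pure-Python sketch of a variance whose denominator is N - ddof. This is neither Numo's nor numpy's implementation; the helper name `variance_with_ddof` is hypothetical and exists only for illustration.

```python
def variance_with_ddof(values, ddof=0):
    """Variance with denominator N - ddof (hypothetical helper, illustration only)."""
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("need more than ddof values")
    mean = sum(values) / n
    # Sum of squared deviations from the mean.
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - ddof)


samples = [1.0, 2.0, 3.0, 4.0]
population_var = variance_with_ddof(samples, ddof=0)  # denominator N
sample_var = variance_with_ddof(samples, ddof=1)      # denominator N-1
print(population_var, sample_var)
```

With ddof=0 the four values above give 1.25, and with ddof=1 they give about 1.6667, which matches the numpy output quoted earlier.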
I prefer Numo's choice of default.
The standard use of Numo is for data samples, for which the correct choice of variance is the sample variance (denominator n-1). Population variance (denominator n, numpy's default) is only correct if you have the whole population. If you have an approximation of the population, i.e. n is extremely large, the difference between the two vanishes anyway, since the two estimates differ only by a factor of n/(n-1).
References: [wiki], [SO].
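As a rough check of the claim that the difference vanishes for large n, the sketch below uses Python's standard `statistics` module, where `variance` divides by n-1 and `pvariance` divides by n. The helper name `sample_to_population_ratio` and the test data are assumptions made for illustration.

```python
import statistics


def sample_to_population_ratio(n):
    """Ratio of sample variance (n-1) to population variance (n) for n points."""
    data = [float(i) for i in range(n)]
    return statistics.variance(data) / statistics.pvariance(data)


# The ratio equals n / (n - 1), so it approaches 1 as n grows.
for n in (4, 100, 10000):
    print(n, sample_to_population_ratio(n))
```

For small n the two variances differ noticeably, while for n in the thousands the ratio is already very close to 1.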
I would agree with adding an extra argument to fetch the population variance, but only for completeness, and with a more understandable argument name than ddof (example: ary.var(type: :population)).
I wish for Numo to become a better alternative to numpy, rather than to only emulate it.
[EDIT] Oh looks like Sonots-san already answered that, sorry for the double reply.
I found a difference in Numo::NArray#var when I was comparing it with numpy.
numpy
Numo::NArray