ruby-numo / numo-narray

Ruby/Numo::NArray - New NArray class library
http://ruby-numo.github.io/narray/
BSD 3-Clause "New" or "Revised" License

Numo::NArray#var #85

Open hatappi opened 6 years ago

hatappi commented 6 years ago

I found a difference in Numo::NArray#var when I was comparing with numpy.

numpy

>>> np.var(np.array([[1, 2], [3, 4]], dtype="f"))
1.25

Numo::NArray

> Numo::DFloat[[1, 2], [3, 4]].var()
=> 1.6666666666666667

sonots commented 6 years ago

Numo computes the "unbiased" sample variance, whose denominator is N-1, where N is the number of samples.

NumPy's default denominator, however, is N. You can reproduce Numo's result in NumPy by passing ddof=1:

In [3]: np.var(np.array([[1, 2], [3, 4]], dtype="f"), ddof=1)
Out[3]: 1.6666666

However, Numo does not currently support a ddof argument, so we cannot get the same result as NumPy's default (as far as I understand).
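Because the two definitions differ only in the denominator, a sample variance can be rescaled by (N-1)/N to recover the population value. The following is a minimal sketch in plain Ruby, not an existing Numo feature; `population_variance_from_sample` is a hypothetical helper name, and the input value is the one reported earlier in this thread.

```ruby
# Convert an unbiased sample variance (denominator n - 1) into the
# population variance (denominator n) by rescaling with (n - 1) / n.
# NOTE: hypothetical helper for illustration; it is not part of Numo.
def population_variance_from_sample(sample_variance, n)
  raise ArgumentError, "n must be at least 1" if n < 1

  sample_variance * (n - 1) / n.to_f
end

# Value reported above for Numo::DFloat[[1, 2], [3, 4]].var (4 elements).
sample_var = 1.6666666666666667
puts population_variance_from_sample(sample_var, 4) # approximately 1.25
```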

Numo should support a ddof argument, although the default behavior should remain different from NumPy's for backward compatibility.
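To make the role of ddof concrete, here is a rough sketch of a variance function whose denominator is N - ddof, written in plain Ruby over nested arrays. The name `variance_with_ddof` and its default of 1 (mirroring Numo's current behavior) are assumptions for illustration only; this is not how Numo implements `var`.

```ruby
# Variance with a NumPy-style "delta degrees of freedom" argument.
# The denominator is n - ddof:
#   ddof: 0 -> population variance (NumPy's default)
#   ddof: 1 -> unbiased sample variance (Numo's behavior in this thread)
# NOTE: hypothetical helper; Numo::NArray#var has no ddof argument here.
def variance_with_ddof(values, ddof: 1)
  flat = values.flatten.map(&:to_f)
  n = flat.size
  raise ArgumentError, "need more than #{ddof} values" if n <= ddof

  mean = flat.sum / n
  sum_of_squares = flat.sum { |x| (x - mean)**2 }
  sum_of_squares / (n - ddof)
end

data = [[1, 2], [3, 4]]
puts variance_with_ddof(data, ddof: 0) # 1.25, matching NumPy's default
puts variance_with_ddof(data, ddof: 1) # about 1.6667, matching Numo's result
```

Defaulting to ddof: 1 keeps the existing result unchanged for callers who pass no argument, which is the backward-compatibility concern raised above.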

giuse commented 6 years ago

I prefer Numo's choice of default.

Numo is typically used on data samples, for which the correct choice of variance is the sample variance (denominator n-1). Population variance (denominator n, numpy's default) is only correct if you have the whole population. If your sample approximates the population, i.e. n is extremely large, the difference between the two vanishes anyway, since the two values differ only by the factor n/(n-1). References: [wiki], [SO].

I would support adding an extra argument to obtain the population variance, but only for completeness, and with a more understandable argument name than ddof (for example: ary.var(type: :population)).
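As a sketch of what such a named argument might look like, the plain-Ruby function below maps a `type:` symbol to the corresponding denominator and then prints how the gap between the two estimates shrinks as n grows. The names `variance_by_type` and `DDOF_BY_TYPE`, and the accepted symbols `:sample` and `:population`, are assumptions made for this example; none of them is an existing Numo API.

```ruby
# Hypothetical interface: choose the variance by name instead of by ddof.
# NOTE: illustrative only; this is not an existing Numo::NArray method.
DDOF_BY_TYPE = { sample: 1, population: 0 }.freeze

def variance_by_type(values, type: :sample)
  ddof = DDOF_BY_TYPE.fetch(type) do
    raise ArgumentError, "unknown type: #{type.inspect}"
  end
  flat = values.flatten.map(&:to_f)
  n = flat.size
  raise ArgumentError, "need more than #{ddof} values" if n <= ddof

  mean = flat.sum / n
  flat.sum { |x| (x - mean)**2 } / (n - ddof)
end

# The two estimates differ by the factor n / (n - 1), which tends to 1
# as n grows, so the choice matters less for large samples.
[4, 100, 10_000].each do |n|
  data = (1..n).to_a
  ratio = variance_by_type(data, type: :sample) /
          variance_by_type(data, type: :population)
  puts "n=#{n}: sample/population = #{ratio.round(6)}"
end
```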

I wish for Numo to become a better alternative to numpy, rather than to only emulate it.

[EDIT] Oh looks like Sonots-san already answered that, sorry for the double reply.