Series.mean with skipna=False gives a result that is hard to understand
Steps/Code to reproduce bug
In [1]: import cudf

In [2]: s = cudf.Series([None, 0, 1, 1, None, 2, 3, 3, 4, 4])
In [3]: s.mean(False)
Out[3]: 2.25
Expected behavior
With skipna=False I would expect either of two things:
nulls are interpreted as zeros and the denominator is the full length of 10, giving a result of 1.8
nulls are interpreted as NaNs, giving a result of NaN (this is what pandas does)
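For comparison, the second expectation is what pandas produces on the equivalent data (a minimal sketch; pandas stores the `None` entries as NaN in a float64 series):

```python
import math

import pandas as pd

s = pd.Series([None, 0, 1, 1, None, 2, 3, 3, 4, 4])

# With skipna=False, a single missing value poisons the result.
result = s.mean(skipna=False)
print(math.isnan(result))  # True -- NaN propagates through the mean
```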
But it appears that in this case:
null values are kept (but interpreted as zero) in the numerator (i.e. all 10 values contribute to the sum)
null values are discarded in the denominator (i.e. dividing by 8 instead of 10)
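The arithmetic behind the observed 2.25 can be reproduced directly in plain Python (a sketch of the hypothesized behavior, not cuDF's actual implementation):

```python
values = [None, 0, 1, 1, None, 2, 3, 3, 4, 4]

# Numerator: nulls contribute zero, so all 10 values are "used".
total = sum(v if v is not None else 0 for v in values)  # 18

# Denominator: only the non-null values are counted.
non_null = sum(1 for v in values if v is not None)      # 8

print(total / non_null)     # 2.25 -- matches the observed result
print(total / len(values))  # 1.8  -- what "nulls as zeros" would give
```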
Alternatively, another way to look at this is to note that in the presence of nulls, mean gives the same result regardless of the value of skipna:
In [5]: s.mean(skipna=True)
Out[5]: 2.25
which seems incorrect.
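In pandas, by contrast, the two skipna settings give different results on the same data when nulls are present (a sketch for comparison):

```python
import pandas as pd

s = pd.Series([None, 0, 1, 1, None, 2, 3, 3, 4, 4])

print(s.mean(skipna=True))   # 2.25 -- nulls dropped from both sum and count
print(s.mean(skipna=False))  # nan  -- any null makes the mean undefined
```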
Is this intended/expected behavior? Is it documented somewhere? I looked but could not find anything to describe what skipna=False does in precise, exact terms.