h2oai / datatable

A Python package for manipulating 2-dimensional tabular data structures
https://datatable.readthedocs.io
Mozilla Public License 2.0

`dt.countna()` returns wrong results for grouped columns #3441

Closed by oleksiyskononenko 1 year ago

oleksiyskononenko commented 1 year ago
>>> from datatable import dt, f
>>> DT = dt.Frame([None])
>>> DT[:, dt.countna(f.C0), dt.by(f.C0)]
   |   C0     C1
   | void  int64
-- + ----  -----
 0 |   NA      0
[1 row x 2 columns]

The result should actually be:

   |   C0     C1
   | void  int64
-- + ----  -----
 0 |   NA      1
[1 row x 2 columns]

samukweku commented 1 year ago

@oleksiyskononenko this is resolved in #3440

oleksiyskononenko commented 1 year ago

@samukweku Then we need to add a corresponding test and update the PR description to say "Closes #3441".

oleksiyskononenko commented 1 year ago

So the issue here is that we've been returning 0 for void columns, regardless of whether we were counting missing or non-missing values: https://github.com/h2oai/datatable/blob/main/src/core/expr/head_reduce_unary.cc#L457-L460

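To make `countna()` work properly, we had to take into account that every value in a void column is NA, so the missing count for a group equals the number of rows in that group.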
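
For anyone following along, here is a rough pure-Python sketch of the logic described above. It is not the actual C++ code from `head_reduce_unary.cc`; the function names `count_reduce` and `buggy_count_reduce` and the `is_void` / `count_na` parameters are made up for illustration, and `None` stands in for NA.

```python
from typing import Optional, Sequence


def count_reduce(values: Sequence[Optional[int]], is_void: bool, count_na: bool) -> int:
    """Hypothetical stand-in for a per-group count()/countna() reducer."""
    n_rows = len(values)
    if is_void:
        # A void column holds only NAs, so every row in the group is missing:
        # countna() is the group size and count() is 0.
        return n_rows if count_na else 0
    n_missing = sum(1 for v in values if v is None)
    return n_missing if count_na else n_rows - n_missing


def buggy_count_reduce(values: Sequence[Optional[int]], is_void: bool, count_na: bool) -> int:
    """Behaviour reported in this issue: void columns always reduce to 0."""
    if is_void:
        return 0
    return count_reduce(values, is_void, count_na)


if __name__ == "__main__":
    group = [None]  # the single group from DT = dt.Frame([None])
    print("buggy countna:", buggy_count_reduce(group, is_void=True, count_na=True))
    print("fixed countna:", count_reduce(group, is_void=True, count_na=True))
```

With the one-row void group from the report, the buggy variant gives 0 for the missing count while the corrected one gives 1, which matches the expected frame above.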