Add `dt.prod()` test for a grouped column, `by()` docs adjustment

h2oai / datatable

A Python package for manipulating 2-dimensional tabular data structures

https://datatable.readthedocs.io

Mozilla Public License 2.0

1.82k stars 157 forks source link

Add `dt.prod()` test for a grouped column, `by()` docs adjustment #3396

Closed samukweku closed 1 year ago

samukweku commented 1 year ago

the fix for #3390 has been already pushed as a part of #3388, here we merely add a corresponding test;
as of #2472, f[:] excludes the groupby columns, in this PR we make the corresponding adjustments to the docs.

Closes #3390

oleksiyskononenko commented 1 year ago

So it looks that we already had a test in test_reduce.py that should caught the bug

def test_prod_grouped():
    DT = dt.Frame(A=[True, False, True, True], B=[None, None, None, 10], C=[2,3,5,0.1])
    RES = DT[:, prod(f[:]), by(f.A)]
    REF = dt.Frame(A=[False, True], B=[1, 10]/dt.int64, C=[3,1.0]/dt.float64)
    frame_integrity_check(RES)
    assert_equals(RES, REF)
    assert str(RES)

However, prod(f[:]) actually calculated products for B and C columns only... even though one may think that f[:] means all the columns, including the grouped one.

samukweku commented 1 year ago

For groupby , f[:] excludes the grouping column - the user would have to explicitly add the grouping column to the j section

oleksiyskononenko commented 1 year ago

Yes, so I am thinking if we ever documented this behavior. We probably need to adjust:

https://datatable.readthedocs.io/en/latest/api/dt/f.html

and

https://datatable.readthedocs.io/en/latest/api/dt/by.html

to mention that f[:] means different things with and without groupby.

samukweku commented 1 year ago

Was handled here : #2472 ... Not sure if it was documented in the groupby docs

samukweku commented 1 year ago

Yea, I will update the docs to reflect that

samukweku commented 1 year ago

If jis a "select-all" slice (i.e.:), then those columns will also be excluded from the list of all columns so that they will be present in the output only once.

The above seems to cover this and can be found here : https://datatable.readthedocs.io/en/latest/api/dt/by.html

oleksiyskononenko commented 1 year ago

It doesn’t refer to f[:], but only to :. At the same time, on the f-documentation page we say f[:] is all columns.

oleksiyskononenko commented 1 year ago

Anyways, don't worry about that, let me just push a minor commit to this PR and we're all set.

oleksiyskononenko commented 1 year ago

@samukweku I made some minor adjustments to this PR, see if it looks good to you.

samukweku commented 1 year ago

Looks great! Thanks @oleksiyskononenko