Closed samukweku closed 1 year ago
So it looks like we already had a test in test_reduce.py that should have caught the bug:
def test_prod_grouped():
    DT = dt.Frame(A=[True, False, True, True], B=[None, None, None, 10], C=[2, 3, 5, 0.1])
    RES = DT[:, prod(f[:]), by(f.A)]
    REF = dt.Frame(A=[False, True], B=[1, 10]/dt.int64, C=[3, 1.0]/dt.float64)
    frame_integrity_check(RES)
    assert_equals(RES, REF)
    assert str(RES)
However, prod(f[:]) actually calculated products for the B and C columns only, even though one may think that f[:] means all the columns, including the grouping one.
For groupby, f[:] excludes the grouping column; the user would have to explicitly add the grouping column to the j section.
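To make the rule concrete, here is a rough pure-Python model of what the test above exercises. It is only a sketch, not datatable's implementation: `grouped_prod` is a hypothetical helper, frames are modeled as plain dicts of lists, and missing values are assumed to be skipped (which is consistent with B being 1 for the all-None group in REF).

```python
from math import prod


def grouped_prod(columns, by_name):
    """Hypothetical model of DT[:, prod(f[:]), by(f[by_name])].

    `columns` is a dict mapping column names to equal-length lists.
    """
    keys = sorted(set(columns[by_name]))
    # Under a groupby, f[:] is modeled as "every column except the grouping one".
    selected = [name for name in columns if name != by_name]
    # The grouping column appears once, first, followed by the reduced columns.
    result = {by_name: keys}
    for name in selected:
        result[name] = [
            # Assumption: None (missing) values are skipped; an empty product is 1.
            prod(v for k, v in zip(columns[by_name], columns[name])
                 if k == key and v is not None)
            for key in keys
        ]
    return result


DT = {"A": [True, False, True, True],
      "B": [None, None, None, 10],
      "C": [2, 3, 5, 0.1]}
RES = grouped_prod(DT, "A")
print(RES)  # {'A': [False, True], 'B': [1, 10], 'C': [3, 1.0]}
```

Note that A is never passed through the reducer here; it shows up in the result only as the group key, which is the behavior being discussed.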
Yes, so I am wondering whether we ever documented this behavior. We probably need to adjust:
https://datatable.readthedocs.io/en/latest/api/dt/f.html
and
https://datatable.readthedocs.io/en/latest/api/dt/by.html
to mention that f[:] means different things with and without groupby.
This was handled in #2472 ... Not sure if it was documented in the groupby docs.
Yea, I will update the docs to reflect that
If j is a "select-all" slice (i.e. :), then those columns will also be excluded from the list of all columns so that they will be present in the output only once.
The above seems to cover this and can be found here: https://datatable.readthedocs.io/en/latest/api/dt/by.html
It doesn't refer to f[:], but only to :. At the same time, on the f-documentation page we say f[:] is all columns.
Anyways, don't worry about that, let me just push a minor commit to this PR and we're all set.
@samukweku I made some minor adjustments to this PR, see if it looks good to you.
Looks great! Thanks @oleksiyskononenko
f[:] excludes the groupby columns; in this PR we make the corresponding adjustments to the docs. Closes #3390