Open fredrikmalmfors opened 2 weeks ago
The mean() of groups consisting of nulls only is NaN instead of null when the partitioned group_by method is used.
partitioned hash aggregation
A DataFrame with 1000 or more rows triggers the partitioned aggregation method.
import polars as pl

df = pl.DataFrame(
    {"category": [1] * 1000, "value": [None] * 1000},
    schema={"category": pl.Int8, "value": pl.Float64},
)
df.group_by('category').mean()
The result is NaN (expected null):
estimated unique values: 1
run PARTITIONED HASH AGGREGATION
group_by keys are sorted; running sorted key fast path
shape: (1, 2)
┌──────────┬───────┐
│ category ┆ value │
│ ---      ┆ ---   │
│ i8       ┆ f64   │
╞══════════╪═══════╡
│ 1        ┆ NaN   │
└──────────┴───────┘
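For anyone wondering how a two-phase aggregation could end up with NaN here, the following is a minimal pure-Python sketch of one plausible mechanism. It is an assumption, not taken from the Polars source or confirmed in this report: each partition is reduced to a (sum, count) pair over its non-null values, the pairs are merged, and the final step divides. All function names are hypothetical and exist only for illustration.

```python
import math
from typing import Optional, Sequence, Tuple

# Hypothetical partial state: (sum of non-null values, count of non-null values).
Partial = Tuple[float, int]


def partial_state(values: Sequence[Optional[float]]) -> Partial:
    """Phase 1: reduce one partition to a (sum, count) pair, skipping nulls."""
    non_null = [v for v in values if v is not None]
    return (float(sum(non_null)), len(non_null))


def merge(states: Sequence[Partial]) -> Partial:
    """Phase 2: combine the per-partition pairs into one pair."""
    total = sum(s for s, _ in states)
    count = sum(c for _, c in states)
    return (float(total), count)


def finalize_naive(state: Partial) -> float:
    """Divide unconditionally, as IEEE 754 floats would: 0.0 / 0.0 is NaN."""
    total, count = state
    # Python raises ZeroDivisionError for 0.0 / 0, so the IEEE result is emulated.
    return total / count if count else math.nan


def finalize_null_aware(state: Partial) -> Optional[float]:
    """Return None (standing in for null) when the group has no non-null values."""
    total, count = state
    return total / count if count else None


# A group made only of nulls, split across two partitions.
partitions = [[None] * 500, [None] * 500]
state = merge([partial_state(p) for p in partitions])
naive = finalize_naive(state)
null_aware = finalize_null_aware(state)
```

Under these assumptions `naive` is NaN and `null_aware` is `None`, which matches the difference between the partitioned output and the expected output. Whether Polars actually computes the partitioned mean this way would need to be checked against its source.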
default hash aggregation
A DataFrame with fewer than 1000 rows uses the default hash aggregation.
import polars as pl

df = pl.DataFrame(
    {"category": [1] * 999, "value": [None] * 999},
    schema={"category": pl.Int8, "value": pl.Float64},
)
df.group_by('category').mean()
The result is null, as expected:
DATAFRAME < 1000 rows: running default HASH AGGREGATION
shape: (1, 2)
┌──────────┬───────┐
│ category ┆ value │
│ ---      ┆ ---   │
│ i8       ┆ f64   │
╞══════════╪═══════╡
│ 1        ┆ null  │
└──────────┴───────┘
streaming hash aggregation (0.20.19 and earlier)
The now-removed streaming hash aggregation produced the expected result (null):
estimated unique values: 1
run STREAMING HASH AGGREGATION
RUN STREAMING PIPELINE
[df -> primitive_group_by -> ordered_sink]
shape: (1, 2)
┌──────────┬───────┐
│ category ┆ value │
│ ---      ┆ ---   │
│ i8       ┆ f64   │
╞══════════╪═══════╡
│ 1        ┆ null  │
└──────────┴───────┘