pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs
Other
29.89k stars 1.92k forks source link

Division by zero error when row_group_size argument smaller than total number of rows #4995

Closed marsupialtail closed 2 years ago

marsupialtail commented 2 years ago

Polars version checks

Issue Description

See title

Reproducible Example

nation.tbl:
0|ALGERIA|0| haggle. carefully final deposits detect slyly agai|
1|ARGENTINA|1|al foxes promise slyly according to the regular accounts. bold requests alon|
2|BRAZIL|1|y alongside of the pending deposits. carefully special packages are about the ironic forges. slyly special |
3|CANADA|1|eas hang ironic, silent packages. slyly regular packages are furiously over the tithes. fluffily bold|
4|EGYPT|4|y above the carefully unusual theodolites. final dugouts are quickly across the furiously regular d|
5|ETHIOPIA|0|ven packages wake quickly. regu|
6|FRANCE|3|refully final requests. regular, ironi|
7|GERMANY|3|l platelets. regular accounts x-ray: unusual, regular acco|
8|INDIA|2|ss excuses cajole slyly across the packages. deposits print aroun|
9|INDONESIA|2| slyly express asymptotes. regular deposits haggle slyly. carefully ironic hockey players sleep blithely. carefull|
10|IRAN|4|efully alongside of the slyly final dependencies. |
11|IRAQ|4|nic deposits boost atop the quickly final requests? quickly regula|
12|JAPAN|2|ously. final, express gifts cajole a|
13|JORDAN|4|ic deposits are blithely about the carefully regular pa|
14|KENYA|0| pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t|
15|MOROCCO|0|rns. blithely bold courts among the closely regular packages use furiously bold platelets?|
16|MOZAMBIQUE|0|s. ironic, unusual asymptotes wake blithely r|
17|PERU|1|platelets. blithely pending dependencies use fluffily across the even pinto beans. carefully silent accoun|
18|CHINA|2|c dependencies. furiously express notornis sleep slyly regular accounts. ideas sleep. depos|
19|ROMANIA|3|ular asymptotes are about the furious multipliers. express dependencies nag above the ironically ironic account|
20|SAUDI ARABIA|4|ts. silent requests haggle. closely express packages sleep across the blithely|
21|VIETNAM|2|hely enticingly express accounts. even, final |
22|RUSSIA|3| requests against the platelets use never according to the quickly regular pint|
23|UNITED KINGDOM|3|eans boost carefully special requests. accounts are. carefull|
24|UNITED STATES|1|y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. unusual pinto be|

'''
nation_scheme = ["n_nationkey","n_name","n_regionkey","n_comment","null"]
import polars as pl
nation_ds = pl.read_csv("/home/ziheng/tpc-h/nation.tbl",new_columns=nation_scheme,sep="|",has_header = False, parse_dates = True).drop("null")
nation_ds.write_parquet("/home/ziheng/tpc-h/nation.parquet",row_group_size=100000)
'''

Expected Behavior

row_group_size should just be ignored and produce a parquet with one row group without this error

Installed Versions

``` 0.13.54```
ritchie46 commented 2 years ago

I cannot reproduce this? Have you got an MWE?

marsupialtail commented 2 years ago

I think this might have been fixed in latest version. Sorry I wasn't using latest version.