h2oai / db-benchmark

reproducible benchmark of database-like ops
https://h2oai.github.io/db-benchmark
Mozilla Public License 2.0

polars groupby NA data case failed to be read #188

Closed jangorecki closed 3 years ago

jangorecki commented 3 years ago

All three NA data cases (1e7, 1e8 and 1e9) fail to be read, with errors like the following:

Traceback (most recent call last):
  File "./polars/groupby-polars.py", line 25, in <module>
    x = pl.read_csv(src_grp, dtype={"id4":pl.Int32, "id5":pl.Int32, "id6":pl.Int32, "v1":pl.Int32, "v2":pl.Int32, "v3":pl.Float64})
  File "/home/jan/git/db-benchmark/polars/py-polars/lib/python3.6/site-packages/pypolars/functions.py", line 91, in read_csv
    use_stable_parser=use_stable_parser,
  File "/home/jan/git/db-benchmark/polars/py-polars/lib/python3.6/site-packages/pypolars/frame.py", line 162, in read_csv
    use_stable_parser,
RuntimeError: Any(Other("Other(\"Could not parse primitive type during csv parsing: Error { code: InvalidDigit, index: 2 }\") on thread line 7; on input: 33.981703"))
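
One way to rule out the data file itself, independently of polars, is to check that every non-empty field in the typed columns parses as the type declared in the `read_csv` call. The following is only a rough sketch using the Python standard library: the helper `find_type_mismatches` is hypothetical, the sample rows are made up for illustration, the column names `id1` to `id3` are assumed, and it assumes NA values are written as empty fields.

```python
import csv
import io

# Declared types, mirroring the dtype mapping in the read_csv call above.
DECLARED = {"id4": int, "id5": int, "id6": int, "v1": int, "v2": int, "v3": float}


def find_type_mismatches(lines, declared=DECLARED):
    """Yield (row_number, column, raw_value) for each non-empty field that
    does not parse as its declared type.

    Assumption: an empty field represents NA and is skipped.
    """
    reader = csv.DictReader(lines)
    for row_number, row in enumerate(reader, start=1):
        for column, parse in declared.items():
            raw = row.get(column)
            if raw is None or raw == "":
                continue  # treated as NA
            try:
                parse(raw)
            except ValueError:
                yield row_number, column, raw


# Made-up sample rows with some empty (NA) fields; not taken from the real data.
sample = io.StringIO(
    "id1,id2,id3,id4,id5,id6,v1,v2,v3\n"
    "id016,,id0000042,,15,3,5,11,33.981703\n"
    "id039,id087,,56,,7,,2,71.141995\n"
)
print(list(find_type_mismatches(sample)))
```

If a check like this reports no mismatches on the real file, the values themselves are consistent with the declared types, and the failure is more likely in how the reader assigns types to columns.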
jangorecki commented 3 years ago

@ritchie46 I am using 0.6.1 and am still getting:

Traceback (most recent call last):
  File "./polars/groupby-polars.py", line 25, in <module>
    x = pl.read_csv(src_grp, dtype={"id4":pl.Int32, "id5":pl.Int32, "id6":pl.Int32, "v1":pl.Int32, "v2":pl.Int32, "v3":pl.Float64})
  File "/home/jan/git/db-benchmark/polars/py-polars/lib/python3.6/site-packages/pypolars/functions.py", line 93, in read_csv
    use_stable_parser=use_stable_parser,
  File "/home/jan/git/db-benchmark/polars/py-polars/lib/python3.6/site-packages/pypolars/frame.py", line 163, in read_csv
    use_stable_parser,
RuntimeError: Any(Other("Other(\"Could not parse primitive type during csv parsing: Error { code: InvalidDigit, index: 2 }\") on thread line 7; on input: 71.141995"))
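
In both errors, `index: 2` is the position of the decimal point in the reported input (`33.981703` and `71.141995`). That is consistent with a float-valued field being parsed as an integer type, though the thread does not state the root cause. A minimal pure-Python sketch of that reading follows; the helper `first_invalid_digit` is hypothetical and is not part of polars.

```python
def first_invalid_digit(token):
    """Return the index of the first character that prevents `token` from
    being read as a base-10 integer, or None if it is a valid integer."""
    # Allow one optional leading sign.
    start = 1 if token and token[0] in "+-" else 0
    for i in range(start, len(token)):
        if not ("0" <= token[i] <= "9"):
            return i
    # An empty string or a bare sign has no digits, so it is invalid too.
    return None if len(token) > start else start


# The two inputs reported in the error messages.
for token in ("33.981703", "71.141995"):
    # Both are valid floats, but an integer parser stops at the "." character.
    print(token, first_invalid_digit(token), float(token))
```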
ritchie46 commented 3 years ago

Hmm.. :thinking: I will figure this out and come back to you.

ritchie46 commented 3 years ago

OK, luckily it was an easy fix. Just to make sure, I regenerated the data, and I can now confirm that I had a successful run on:

with release 0.6.2

jangorecki commented 3 years ago

Looking at the logs of the currently running 0.6.2 benchmark, this looks to be resolved.