Open DeflateAwning opened 1 month ago
Yes, I think we need a Hypothesis test for this one: generate different data types, nested types, and file formats, and check that we can round-trip them.
Pinging @stinodego, as he is working on this right now.
I'll add these when https://github.com/pola-rs/polars/pull/16062 is merged.
Looks like that one's merged now! Curious if there's any progress on this otherwise?
Description
Related to Issue #16109 (very broken parquet files).
Can we please add "unit" tests (or rather integration tests) like this for every reader/writer (e.g., read/write_parquet, read/write_ndjson, etc.)? Ideally they'll run >10 times each with >10 different random generations, and perhaps a few different structures (some datetimes, etc.). The non-deterministic failures in the write_parquet function could have been caught with this test, and it's so basic to implement and so useful in checking that the entire write-to-read path works properly.
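To make the request concrete, here is a minimal sketch of the kind of round-trip check I mean. It uses only the Python standard library, so the `write_ndjson` and `read_ndjson` functions below are hypothetical stand-ins for the real reader/writer pair, not the polars API; `random_rows` and `check_round_trip` are likewise names made up for illustration. The idea is just: seeded random data, write, read back, compare, and repeat the same write several times so that non-deterministic failures have a chance to show up.

```python
import json
import random
import tempfile
from datetime import datetime, timedelta
from pathlib import Path


def random_rows(seed: int, n_rows: int = 50) -> list:
    """Build a reproducible table with ints, nullable floats, strings and timestamps."""
    rng = random.Random(seed)
    base = datetime(2020, 1, 1)
    return [
        {
            "id": rng.randint(-10**9, 10**9),
            "value": rng.choice([None, rng.uniform(-1e6, 1e6)]),
            "label": "".join(rng.choices("abc xyz", k=rng.randint(0, 8))),
            # Timestamps are stored as ISO strings so that JSON can carry them.
            "ts": (base + timedelta(seconds=rng.randint(0, 10**8))).isoformat(),
        }
        for _ in range(n_rows)
    ]


def write_ndjson(rows, path):
    """Stand-in writer (assumption): one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


def read_ndjson(path):
    """Stand-in reader (assumption): parse one JSON object per line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]


def check_round_trip(write, read, seeds=range(11), repeats=11):
    """Write and re-read every generated table `repeats` times; return the number of checks."""
    with tempfile.TemporaryDirectory() as tmp:
        for seed in seeds:
            rows = random_rows(seed)
            for attempt in range(repeats):
                path = Path(tmp) / f"seed{seed}_run{attempt}.ndjson"
                write(rows, path)
                assert read(path) == rows, (
                    f"round-trip mismatch (seed={seed}, run={attempt})"
                )
    return len(seeds) * repeats


if __name__ == "__main__":
    print(check_round_trip(write_ndjson, read_ndjson), "round trips passed")
```

Because the harness only takes a `write` and a `read` callable, the same loop could in principle be pointed at each reader/writer pair in turn, with the row generator swapped for one that builds real DataFrames. Printing the seed in the failure message keeps any mismatch reproducible.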