Closed AnIrishDuck closed 1 year ago
Patch coverage has no change and project coverage change: -0.01
Comparison is base (8ee5ad8) 83.51% compared to head (1e624c0) 83.50%.
> it's a very complex subject and I'm not shocked that it may need more work.
Me neither. I don't understand this well enough to fix atm, so any help is great. I am working on a separate parquet implementation, but progress is slow.
This uses `sample-test` and `sample-arrow2`, which we built specifically for this purpose. See the README for why we felt `quickcheck` and `proptest` were unsuitable.

I'm not sure if we'd rather have the libraries be optional `[dependencies]` instead of `[dev-dependencies]` (which cannot be optional). I figured this should be behind a feature flag though.

When run exhaustively (see the commented `TODO` lines), this appears to unearth more errors in the parquet IO code. Issues appear to trigger with nesting and nullable fields in combination. Some examples:

My prior experience with the def/rep level encoder obviously leads me to suspect that code. I know it was recently rewritten, but it's a very complex subject and I'm not shocked that it may need more work.
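
For anyone less familiar with why nesting and nullability interact badly, here is a minimal, self-contained sketch of Dremel-style definition/repetition levels for a single nullable list column with nullable items. It is not the encoder in this repository: the `Row` alias, the `Levels` struct, and the `encode`/`decode` functions are hypothetical names for illustration, and it assumes the standard three-level Parquet list layout (maximum definition level 3, maximum repetition level 1).

```rust
/// One row of a nullable list column whose items are also nullable.
/// (Hypothetical type for illustration only.)
type Row = Option<Vec<Option<i32>>>;

/// Levels and values for the assumed layout:
/// optional group (LIST) { repeated group list { optional int32 element } }
#[derive(Debug, Default, PartialEq)]
struct Levels {
    def: Vec<u8>,
    rep: Vec<u8>,
    values: Vec<i32>,
}

/// Flatten rows into definition levels, repetition levels, and values.
fn encode(rows: &[Row]) -> Levels {
    let mut out = Levels::default();
    for row in rows {
        match row {
            // Null list: nothing on the path is defined.
            None => {
                out.def.push(0);
                out.rep.push(0);
            }
            // Empty list: the list is defined but has no entries.
            Some(items) if items.is_empty() => {
                out.def.push(1);
                out.rep.push(0);
            }
            Some(items) => {
                for (i, item) in items.iter().enumerate() {
                    // rep = 0 starts a new row, rep = 1 continues the list.
                    out.rep.push(if i == 0 { 0 } else { 1 });
                    match item {
                        // Entry exists but the item is null.
                        None => out.def.push(2),
                        // Fully defined item: only these store a value.
                        Some(v) => {
                            out.def.push(3);
                            out.values.push(*v);
                        }
                    }
                }
            }
        }
    }
    out
}

/// Rebuild rows from levels; the inverse of `encode` for this layout.
fn decode(levels: &Levels) -> Vec<Row> {
    let mut rows: Vec<Row> = Vec::new();
    let mut values = levels.values.iter();
    for (&def, &rep) in levels.def.iter().zip(&levels.rep) {
        if rep == 0 {
            // A new row begins: null if def == 0, otherwise a list.
            rows.push(if def == 0 { None } else { Some(Vec::new()) });
        }
        if def >= 2 {
            let item = if def == 3 {
                Some(*values.next().expect("one value per def == 3"))
            } else {
                None
            };
            if let Some(Some(items)) = rows.last_mut() {
                items.push(item);
            }
        }
    }
    rows
}
```

Encoding `[None, Some(vec![]), Some(vec![None]), Some(vec![Some(1), None, Some(2)])]` gives definition levels `[0, 1, 2, 3, 2, 3]`, repetition levels `[0, 0, 0, 0, 1, 1]`, and values `[1, 2]`. A null list, an empty list, and a list holding one null item differ only in the definition level, which is the kind of distinction that is easy to get wrong once more nesting is involved.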
Let me know how I can help. In particular, the shrinking behavior in `sample-arrow2` is suboptimal due to chained resampling and some implementation hacks that can probably be improved. I can definitely assist if you're playing around with this and are having trouble shrinking back to useful exemplars.

Setting the chunk length to a small value appears to be generating good counterexamples for now.
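
To make "shrinking back to useful exemplars" concrete, here is a rough, generic sketch of greedy shrinking by element removal. It does not use or reflect the `sample-test` or `sample-arrow2` APIs: `shrink` and `hypothetical_bug` are made-up names, and the "bug" is a stand-in predicate, not one of the failures observed here.

```rust
/// Greedily shrink a failing input: try dropping one element at a time
/// and keep any smaller candidate for which the property still fails.
/// Repeats until a full pass removes nothing.
fn shrink<T: Clone>(mut input: Vec<T>, fails: impl Fn(&[T]) -> bool) -> Vec<T> {
    assert!(fails(input.as_slice()), "shrinking needs a failing input");
    loop {
        let before = input.len();
        let mut i = 0;
        while i < input.len() {
            let mut candidate = input.clone();
            candidate.remove(i);
            if fails(candidate.as_slice()) {
                // Still failing: keep the smaller input, retry this index.
                input = candidate;
            } else {
                // This element is needed to reproduce the failure.
                i += 1;
            }
        }
        if input.len() == before {
            return input;
        }
    }
}

/// Stand-in for a failing round-trip check (purely hypothetical):
/// "fails" when a chunk has at least two rows and contains a null.
fn hypothetical_bug(rows: &[Option<i32>]) -> bool {
    rows.len() >= 2 && rows.iter().any(|r| r.is_none())
}
```

With this stand-in, `shrink(vec![Some(1), None, Some(2), Some(3), None], hypothetical_bug)` reduces the input to `[Some(3), None]`. Generating small chunks in the first place has a similar effect, since the failing inputs start out close to minimal.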