jqnatividad / qsv

Blazing-fast Data-Wrangling toolkit
https://qsv.dathere.com
The Unlicense
2.52k stars 71 forks source link

strip_suffix sometimes failing in pipeline #2216

Closed tmtmtmtm closed 4 weeks ago

tmtmtmtm commented 1 month ago

Since updating from 0.134 to 0.136, I'm getting problems with qsv apply operations strip_suffix against long CSVs in a pipeline.

The shortest case I've been able to boil it down to is the following:

data.csv

If I run qsv apply operations strip_suffix file --comparand '-file' data.csv everything is fine.

But if that CSV is being produced from another command, it fails:

> qsv fmt data.csv | qsv apply operations strip_suffix file --comparand '-file'         
e,f,g,h,file
Error reading file: CSV error: record 2026 (line: 2027, byte: 65533): found record with 1 fields, but the previous record has 5 fields
tmtmtmtm commented 1 month ago

After submitting this, I noticed that the 65533 was quite suspicious, and ran a few other tests. It looks like this always fails with a number just under 65535, so I'm guessing that reading the next chunk is hitting some sort of 16-bit boundary there, and thinking the line ends mid-field.

(Full version string is qsv 0.136.0-mimalloc-apply;fetch;foreach;geocode;Luau 0.640;to;polars-0.43.1-ee9bafb;self_update-8-8;12.80 GiB-1.37 GiB-0 B-16.00 GiB (aarch64-apple-darwin compiled with Rust 1.81) prebuilt)

jqnatividad commented 1 month ago

Thanks @tmtmtmtm for the detailed report, complete with a test case! I can reproduce it and currently investigating.

jqnatividad commented 1 month ago

I'm glad to report that bug is fixed on master.

I'm still trying to find the root cause, but it may have been introduced when I created the optimal_batch_size helper, which I've since refactored.

Can you compile from source and confirm the bug has been fixed on master on your end?

jqnatividad commented 1 month ago

Hi @tmtmtmtm , can you check if it still happens with 0.137.0?

jqnatividad commented 4 weeks ago

I'm glad to report that your initial test case now works as expected in 0.137.0 @tmtmtmtm !

Closing this issue.

tmtmtmtm commented 3 weeks ago

Thanks @jqnatividad - this is working fine on 0.137.0 for me too.

(BTW I've never been able to compile qsv from source: I end up in a twisty maze of dependency problems every time, that my rust-fu is entirely too poor to resolve. So mostly I need to wait for new releases to be able to take advantage of fixes / new features etc.)