Open ayushdg opened 1 week ago
On further investigation, this bug occurs because the size of the device buffer required to store the uncompressed data is under-estimated. Proposed solution: (i) get an estimate of the uncompressed buffer size (falling back to a heuristic if computing such an estimate is expensive) and (ii) use the realloc-and-retry logic from #16687 if the estimate falls short. We can extend this logic to multi-source compressed inputs as well.
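To make the two-step proposal concrete, here is a minimal host-side sketch in Python. It is not the libcudf code and not the logic from #16687: a plain `bytes` buffer stands in for the device buffer, `zlib` stands in for the GPU decompressor, and the function names and the 4x fallback ratio are hypothetical. It assumes the cheap estimate comes from the gzip ISIZE trailer, which only covers the last member and wraps at 4 GiB, so it can under-estimate in the way described above.

```python
import gzip
import struct
import zlib
from typing import Optional, Tuple


def estimate_uncompressed_size(compressed: bytes, fallback_ratio: int = 4) -> int:
    """Cheap estimate: gzip ISIZE trailer, else a fixed-ratio heuristic (assumed 4x)."""
    if len(compressed) >= 18 and compressed[:2] == b"\x1f\x8b":
        # ISIZE is the uncompressed length mod 2**32 of the LAST member only.
        (isize,) = struct.unpack("<I", compressed[-4:])
        if isize > 0:
            return isize
    return max(1, len(compressed) * fallback_ratio)


def inflate_bounded(compressed: bytes, capacity: int) -> Optional[bytes]:
    """Decompress into at most `capacity` bytes; return None if it does not fit."""
    out = bytearray()
    data = compressed
    while data:  # one pass per gzip member
        d = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
        room = capacity - len(out) + 1  # one spare byte to detect overflow
        out += d.decompress(data, room)
        if len(out) > capacity:
            return None  # estimate fell short
        if not d.eof:
            raise ValueError("truncated gzip stream")
        data = d.unused_data  # bytes belonging to the next member, if any
    return bytes(out)


def decompress_with_retry(compressed: bytes, max_attempts: int = 32) -> Tuple[bytes, int]:
    """Start from the estimate; double the buffer and retry while it is too small."""
    capacity = estimate_uncompressed_size(compressed)
    for _ in range(max_attempts):
        result = inflate_bounded(compressed, capacity)
        if result is not None:
            return result, capacity
        capacity *= 2  # "realloc" step: grow and try again
    raise MemoryError("output did not fit after %d attempts" % max_attempts)


# A two-member gzip file: ISIZE reports only the 10 bytes of the last member.
payload = gzip.compress(b"a" * 10000) + gzip.compress(b"b" * 10)
data, final_capacity = decompress_with_retry(payload)
```

In this example the initial estimate is far too small, and the retry loop grows the buffer until the full output fits. A real implementation would likely grow more aggressively or refine the estimate rather than double from a tiny starting point.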
Describe the bug
cudf.read_json fails on a specific file in my dataset.
Steps/Code to reproduce bug
Expected behavior
Environment overview (please complete the following information)
docker pull & docker run commands used
Environment details
cudf 24.08 and 24.12 (nightly). I haven't checked 24.10, but given that 24.08 and 24.12 both fail, I suspect the issue applies there as well.
Additional context
Data here: 2022-33_1303_en_all.json.gz