Open dagardner-nv opened 6 months ago
Debugging update: On further investigation using a C++ repro with the same input as the Python repro above, both the lines=True and lines=False cases result in the tokens StructBegin StructMemberBegin FieldBegin FieldEnd StringBegin. When lines=False, the exception is thrown after the device_json_column is constructed.
I think we need to add an additional condition for invalid lines in the JSONL case.
Apart from the Error Token test that we have in the node tree algorithms, we also need to verify complete node levels, i.e. ensure that Begin tokens have matching End tokens. Such a test can be achieved with device-side prefix sum operations on the token list, and should resolve this bug.
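To make the proposed check concrete, here is a minimal host-side sketch in plain Python. It is not the cudf implementation: the helper names (nesting_delta, tokens_are_balanced) are hypothetical, the token names are taken from the list quoted above, and classifying tokens by their Begin/End suffix is an assumption made for illustration. A real fix would presumably run the equivalent scan on the device.

```python
from itertools import accumulate


def nesting_delta(token: str) -> int:
    """Map a token name to its effect on nesting depth.

    Assumption for this sketch: tokens ending in "Begin" open a level
    and tokens ending in "End" close one.
    """
    if token.endswith("Begin"):
        return 1
    if token.endswith("End"):
        return -1
    return 0


def tokens_are_balanced(tokens) -> bool:
    """Return True if every Begin token is closed by a later End token.

    An inclusive prefix sum over the +1/-1 deltas gives the nesting depth
    after each token. The stream is balanced when the depth never goes
    negative and finishes at zero.
    """
    depths = list(accumulate(nesting_delta(t) for t in tokens))
    if not depths:
        return True
    return min(depths) >= 0 and depths[-1] == 0


# Token stream reported in the comment above (input cut off mid-string).
truncated = ["StructBegin", "StructMemberBegin", "FieldBegin", "FieldEnd", "StringBegin"]

# The same stream with the missing End tokens appended, for comparison.
complete = truncated + ["StringEnd", "StructMemberEnd", "StructEnd"]

print(tokens_are_balanced(truncated))
print(tokens_are_balanced(complete))
```

Note that this only checks depth, not that each End token has the same kind as the Begin token it closes; for JSONL input the same check would have to be applied per line rather than over the whole stream.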
Describe the bug
cudf.read_json doesn't raise an exception when parsing invalid JSON when lines=True and engine='cudf'. Instead it returns a single-row DataFrame with an empty string value.
Setting lines=False raises a RuntimeError (should be a ValueError). Alternatively, setting engine='pandas' raises a ValueError.
Steps/Code to reproduce bug
Expected behavior
A raised ValueError, although any exception is better than
Environment overview (please complete the following information)
Observed in versions 24.04.01 and 24.02.02