Status: Open. pcmoritz opened this issue 1 month ago.
Makes sense, although in which practical cases would pyarrow's JSON reader fail here while a native Python JSONL fallback would still work? @pcmoritz
I also wonder whether we could use pandas `read_json` followed by a conversion to pyarrow as the fallback, instead of implementing a generic fallback in native Python.
@pcmoritz Friendly ping for input.
What happened + What you expected to happen
See the repro below. I would have expected the fallback to parse the file as JSONL (and then fail because the file doesn't have the expected format).
Versions / Dependencies
Ray 2.38.0
Reproduction script
First create a file like

as `data.jsonl` and then run

You will find an error like
This is because the Python fallback doesn't support JSONL: https://github.com/ray-project/ray/blob/002908ff57e3d64c5fa580d264f7389f26167340/python/ray/data/_internal/datasource/json_datasource.py#L108
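To illustrate the behavior the report expects, here is a minimal sketch of a pure-Python fallback that first tries to parse the whole buffer as one JSON document and, if that fails, retries it as JSON Lines (one JSON value per non-empty line). The helper name `parse_json_or_jsonl` is hypothetical; this is not Ray's implementation and does not reproduce the code at the linked line. It only shows the parse order under the assumption that every row must be a JSON object.

```python
import io
import json
from typing import Any, Dict, List


def parse_json_or_jsonl(buffer: io.BytesIO) -> List[Dict[str, Any]]:
    """Hypothetical fallback: parse as a single JSON document, else as JSONL."""
    text = buffer.read().decode("utf-8")
    try:
        # A single JSON document: either one object or an array of objects.
        parsed: Any = json.loads(text)
    except json.JSONDecodeError:
        # Not a single document; retry as JSON Lines. A malformed line raises
        # json.JSONDecodeError, surfacing the real format problem to the user.
        parsed = [json.loads(line) for line in text.splitlines() if line.strip()]

    # Normalize to a list of records and reject anything that is not row-shaped.
    records = [parsed] if isinstance(parsed, dict) else parsed
    if not isinstance(records, list) or not all(isinstance(r, dict) for r in records):
        raise ValueError("expected a JSON object per row")
    return records
```

With this order, a well-formed JSONL file is parsed row by row, while a file that is neither valid JSON nor valid JSONL still fails, but with an error that points at the offending line rather than at the whole-document parse.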
Issue Severity
Medium: It is a significant difficulty but I can work around it.