Open lithomas1 opened 4 months ago
Compute-sanitizer:
========= COMPUTE-SANITIZER
========= Program hit cudaErrorInvalidValue (error 1) due to "invalid argument" on CUDA API call to cudaMemcpyAsync.
========= Saved host backtrace up to driver entry point at error
========= Host Frame: [0x445b06]
========= in /usr/lib/x86_64-linux-gnu/libcuda.so.1
========= Host Frame:cudaMemcpyAsync [0x6dabf]
========= in /home/coder/.conda/envs/rapids/lib/libcudart.so.12
========= Host Frame:cudf::io::json::detail::ingest_raw_input(cudf::device_span<char, 18446744073709551615ul>, cudf::host_span<std::unique_ptr<cudf::io::datasource, std::default_delete<cudf::io::datasource> >, 18446744073709551615ul>, cudf::io::compression_type, unsigned long, unsigned long, rmm::cuda_stream_view) [0x1decfdf]
========= in /home/coder/cudf/cpp/build/conda/cuda-12.2/release/libcudf.so
========= Host Frame:cudf::io::json::detail::get_record_range_raw_input(cudf::host_span<std::unique_ptr<cudf::io::datasource, std::default_delete<cudf::io::datasource> >, 18446744073709551615ul>, cudf::io::json_reader_options const&, rmm::cuda_stream_view) [0x1dee514]
========= in /home/coder/cudf/cpp/build/conda/cuda-12.2/release/libcudf.so
========= Host Frame:cudf::io::json::detail::read_batch(cudf::host_span<std::unique_ptr<cudf::io::datasource, std::default_delete<cudf::io::datasource> >, 18446744073709551615ul>, cudf::io::json_reader_options const&, rmm::cuda_stream_view, cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible>) [0x1deeb75]
========= in /home/coder/cudf/cpp/build/conda/cuda-12.2/release/libcudf.so
========= Host Frame:cudf::io::json::detail::read_json(cudf::host_span<std::unique_ptr<cudf::io::datasource, std::default_delete<cudf::io::datasource> >, 18446744073709551615ul>, cudf::io::json_reader_options const&, rmm::cuda_stream_view, cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible>) [0x1df030a]
========= in /home/coder/cudf/cpp/build/conda/cuda-12.2/release/libcudf.so
========= Host Frame:cudf::io::read_json(cudf::io::json_reader_options, rmm::cuda_stream_view, cuda::mr::__4::basic_resource_ref<(cuda::mr::__4::_AllocType)1, cuda::mr::__4::device_accessible>) [0x1d2de18]
========= in /home/coder/cudf/cpp/build/conda/cuda-12.2/release/libcudf.so
Thank you @lithomas1 for sharing this issue. We haven't done much testing with compressed JSON inputs. There could be a straightforward solution here, and we will take a closer look as soon as we can.
Describe the bug
The libcudf JSON reader is "crashing" (not sure if it's technically a crash, but I'm getting a CUDA error).
Steps/Code to reproduce bug
Expected behavior
Successful read, like with pandas.
Environment details
My cudf is the latest cudf (from main).
Additional context
I think the issue might be with the specific data values (they are all integers, even the string/floating columns). I'm pretty sure libcudf can write all the data types (even the nested struct/list ones).
baddf.json.gz
Also, if you decompress the file by hand, you are able to read it with cudf.
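For anyone hitting the same error, the manual decompression workaround could look roughly like the sketch below. It is not the original reproducer (that was not preserved in this thread). The helper names `decompress_gzip` and `read_json_uncompressed` are made up for illustration, and the sketch assumes `cudf.read_json` accepts an in-memory file-like buffer. Reader options such as `lines=True` depend on how the file was written, so they are passed through rather than guessed.

```python
import gzip
import io


def decompress_gzip(path):
    """Decompress a gzip file on the host and return its bytes as a buffer."""
    with gzip.open(path, "rb") as f:
        return io.BytesIO(f.read())


def read_json_uncompressed(path, **reader_kwargs):
    """Read a gzipped JSON file with cudf after decompressing it by hand.

    Assumption: cudf is installed and cudf.read_json accepts a file-like
    buffer. reader_kwargs are forwarded unchanged to cudf.read_json.
    """
    import cudf  # imported lazily so the gzip helper works without a GPU

    return cudf.read_json(decompress_gzip(path), **reader_kwargs)
```

With the attachment from this issue, that would be called as `read_json_uncompressed("baddf.json.gz")`, plus whatever reader options the file needs. This sidesteps the compressed-input path in the reader, so it is only a workaround until the underlying bug is fixed.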