rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[BUG] cudf.read_json error from nested data is unclear #5371

Closed vinayakbhadage closed 2 years ago

vinayakbhadage commented 4 years ago

**Describe the bug**
cuDF fails to read a JSON file written by pandas.

**Steps/Code to reproduce bug**

  1. Create a JSON file with pandas, then read it back with cuDF:
    
    import cudf

    # pdf is a pandas DataFrame (its definition is not included in this report)
    path_df = "./test_df_1.json"
    orient = "records"
    compression = None
    pdf.to_json(path_df, orient=orient, compression=compression, index=True)

    got_df = cudf.read_json(path_df, orient=orient, compression=compression)


**Expected behavior**
`cudf.read_json` should return a DataFrame. Instead, it raises the following error:

`NotImplementedError: struct<altId: string, audiences: list<item: string>, body: string, firstCreated: string, headline: string, id: string, instancesOf: list<item: string>, language: string, mimeType: string, provider: string, pubStatus: string, subjects: list<item: string>, takeSequence: int64, urgency: int64, versionCreated: string>`
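The report does not show how `pdf` was built. As a minimal sketch, the snippet below writes a records-oriented JSON file with the same kind of nesting that the error message describes. The field names `id`, `headline`, `urgency`, `audiences` and `subjects` come from the error above; the values and the `write_records` helper are invented for illustration, and only the standard library is used so the file can be produced without pandas.

```python
import json
import os
import tempfile

# Hypothetical records: field names are taken from the error message,
# the values are made up for illustration only.
records = [
    {"id": "a1", "headline": "Example", "urgency": 3,
     "audiences": ["NP:X"], "subjects": ["M:1", "M:2"]},
    {"id": "a2", "headline": "Another", "urgency": 1,
     "audiences": [], "subjects": ["M:3"]},
]


def write_records(rows, path):
    """Write a list of dicts as one JSON array, the layout of orient="records"."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(rows, fh)
    return path


path = write_records(records, os.path.join(tempfile.mkdtemp(), "test_df_1.json"))

# With cuDF installed, the failing call from this report would then be:
#     cudf.read_json(path, orient="records", compression=None)
```

The `audiences` and `subjects` fields hold lists, which is the nested structure that appears as `list<item: string>` in the error.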

**Environment overview (please complete the following information)**
 - Environment location: Azure Kubernetes Service (AKS)
 - Method of cuDF install: Docker + helm chart
  https://github.com/rapidsai/helm-chart

**Environment details**
The RAPIDS Helm chart is installed on Azure AKS using the rapidsai Docker image.
`cudf.__version__` is `'0.13.0'`.

kkraus14 commented 4 years ago

It looks like you have nested data in your JSON which is not yet supported in cuDF. We should improve the error messaging here though.
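To illustrate what a clearer message could look like, here is a rough sketch of a pre-check that a caller might run on the JSON text before handing it to cuDF. It is not cuDF's implementation. The function names `nested_fields` and `check_flat` and the wording of the message are assumptions made for this example.

```python
import json


def nested_fields(records):
    """Map each field that holds a list or dict to the name of that type."""
    found = {}
    for row in records:
        if not isinstance(row, dict):
            continue  # skip anything that is not a record object
        for key, value in row.items():
            if isinstance(value, (list, dict)):
                found.setdefault(key, type(value).__name__)
    return found


def check_flat(json_text):
    """Return the parsed records, or raise a descriptive error if any are nested."""
    records = json.loads(json_text)
    if isinstance(records, dict):
        records = [records]  # treat a single object as one record
    nested = nested_fields(records)
    if nested:
        cols = ", ".join(f"{k} ({t})" for k, t in sorted(nested.items()))
        raise NotImplementedError(
            f"Nested JSON is not supported; nested columns: {cols}"
        )
    return records
```

For a record such as `{"id": "x", "subjects": ["a"]}`, this raises `NotImplementedError` and names `subjects (list)`, instead of printing the full struct type.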

github-actions[bot] commented 3 years ago

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

github-actions[bot] commented 3 years ago

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

benfred commented 3 years ago

See also https://github.com/rapidsai/cudf/issues/2362

GregoryKimball commented 2 years ago

Closing in favor of #8827