rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[FEA] Improve escape character and quotation character parsing in JSON and CSV readers. #6305

Open rgsl888prabhu opened 4 years ago

rgsl888prabhu commented 4 years ago

Is your feature request related to a problem? Please describe. As of now, the CSV and JSON readers post-process occurrences of the escape character and quotation character only after the complete string has been parsed. https://github.com/rapidsai/cudf/blob/76e2e155ce6fe2194a2bb41aeca93b48a39a55c2/cpp/src/io/csv/reader_impl.cu#L375

Describe the solution you'd like We might be able to skip or keep those characters while copying the data, instead of in a separate pass afterwards.
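
A minimal host-side sketch of the idea, in Python for readability. This is not cuDF code: the function name `copy_field_unescaped`, its parameters, and the exact escaping rules (backslash escapes plus doubled quotes inside a quoted field) are assumptions made for illustration. It shows a single pass that drops the surrounding quotes and resolves escapes while the field is being copied, so no second pass over the copied string is needed.

```python
def copy_field_unescaped(src: str, quotechar: str = '"', escapechar: str = "\\") -> str:
    """Copy one CSV field, resolving quotes and escapes during the copy.

    Hypothetical helper for illustration only; not part of cuDF.
    """
    out = []
    start, end = 0, len(src)

    # A field is treated as quoted when it both begins and ends with quotechar.
    quoted = end >= 2 and src[0] == quotechar and src[-1] == quotechar
    if quoted:
        start, end = 1, end - 1  # leave the surrounding quotes out of the copy

    i = start
    while i < end:
        ch = src[i]
        if ch == escapechar and i + 1 < end:
            # Escape character: skip it and copy the next character literally.
            out.append(src[i + 1])
            i += 2
        elif quoted and ch == quotechar and i + 1 < end and src[i + 1] == quotechar:
            # Doubled quote inside a quoted field: copy a single quote.
            out.append(quotechar)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(copy_field_unescaped('"a""b"'))          # a"b
print(copy_field_unescaped('"say \\"hi\\""'))  # say "hi"
```

On the GPU the same logic would presumably run per field inside the copy kernel, with the output size computed up front; that part is not sketched here.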

harrism commented 4 years ago

Can you clarify the title of this issue to indicate which cuIO reader and writer?

github-actions[bot] commented 3 years ago

This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

karthikeyann commented 2 weeks ago

This issue is no longer relevant for the JSON reader. JSON reader string processing now handles escape characters and UTF-8 decoding properly. If the CSV reader does not have this issue, this issue can be closed.