Open miguelusque opened 2 years ago
This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.
This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
If this is still an issue, @miguelusque, can you post an example file here?
Describe the bug
Hi, I have noticed a difference in performance when reading a jsonl file with cudf and with dask_cudf.
In both cases, I will be using only 1 GPU.
I have the following files (see details below):
Please find below the execution times when I run them on a DGX-1 V100 (16 GB):
The contents of the scripts are as follows:
json_cudf.py
and
jsonl_dask_cudf.py
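For readers without access to the two scripts, a comparison of this kind could be timed roughly as sketched below. This is only an illustration under stated assumptions, not the scripts used for this report: `time_reader`, `read_jsonl_stdlib`, and the `data.jsonl` path are hypothetical names, and the cudf and dask_cudf calls appear only as comments because they need a GPU with RAPIDS installed.

```python
import json
import time
from typing import Any, Callable


def time_reader(read_fn: Callable[[str], Any], path: str, repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds of read_fn(path) over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        result = read_fn(path)
        # Dask collections are lazy: force the read so both timings cover real work.
        if hasattr(result, "compute"):
            result = result.compute()
        best = min(best, time.perf_counter() - start)
    return best


def read_jsonl_stdlib(path: str) -> list:
    """CPU baseline (hypothetical helper): parse one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# On a machine with RAPIDS, the two readers being compared could be passed in
# the same way (assumed usage, not executed here):
#   import cudf, dask_cudf
#   time_reader(lambda p: cudf.read_json(p, lines=True), "data.jsonl")
#   time_reader(lambda p: dask_cudf.read_json(p, lines=True), "data.jsonl")
```

Taking the best of several runs reduces the influence of one-off costs such as GPU context creation on the first call, which can otherwise dominate a single measurement.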
Steps/Code to reproduce bug
Hi @shwina, as discussed in the Slack channel, I will send you an email with the link to the dataset used. Thanks!
Expected behavior
I would not expect such a large difference in performance between the two.
Environment overview (please complete the following information)
DGX-A100, CUDA 11.5, RAPIDS 22.04