Open rgsl888prabhu opened 4 years ago
This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
This still fails, but differently now: the failure is in dtype detection for the timestamp type. The signature of read_json has also changed subtly, and dtype now needs to be a dict:
>>> import cudf
>>> import pandas as pd
>>> pdf = pd.DataFrame({"a":[45461150050, 55414521000, 4544624522000, 4546345758000, 45445254600]}, dtype='datetime64[ms]')
>>> buffer = pdf.to_json(compression='infer', lines=True, orient="records")
>>> buffer
'{"a":45461}\n{"a":55414}\n{"a":4544624}\n{"a":4546345}\n{"a":45445}'
>>> df = cudf.read_json(buffer, compression='infer', lines=True, orient="records", dtype={"a": 'timestamp[ms]'})
>>> df
...
File ~/.conda/envs/rapids/lib/python3.10/site-packages/pandas/core/dtypes/common.py:1645, in pandas_dtype(dtype)
1640 with warnings.catch_warnings():
1641 # GH#51523 - Series.astype(np.integer) doesn't show
1642 # numpy deprecation warning of np.integer
1643 # Hence enabling DeprecationWarning
1644 warnings.simplefilter("always", DeprecationWarning)
-> 1645 npdtype = np.dtype(dtype)
1646 except SyntaxError as err:
1647 # np.dtype uses `eval` which can raise SyntaxError
1648 raise TypeError(f"data type '{dtype}' not understood") from err
TypeError: data type 'timestamp[ms]' not understood
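The traceback above ends in np.dtype, which does not recognize the string 'timestamp[ms]'; NumPy's spelling for millisecond timestamps is 'datetime64[ms]'. The following is a minimal sketch that checks this with NumPy and pandas only, so it does not need cudf or a GPU. The helper is_numpy_dtype_name is a hypothetical name introduced here for illustration. Whether cudf.read_json then parses the column correctly when given 'datetime64[ms]' is not verified by this snippet.

```python
import numpy as np
import pandas as pd


def is_numpy_dtype_name(name: str) -> bool:
    """Hypothetical helper: return True if NumPy accepts `name` as a dtype string."""
    try:
        np.dtype(name)
    except (TypeError, SyntaxError):
        # TypeError is what the traceback above shows; SyntaxError is the
        # other failure mode that pandas_dtype guards against.
        return False
    return True


# The spelling used in the read_json call above.
print(is_numpy_dtype_name("timestamp[ms]"))

# NumPy's spelling for a millisecond-resolution timestamp.
print(is_numpy_dtype_name("datetime64[ms]"))

# pandas resolves dtype strings through the pandas_dtype function seen in the traceback.
print(pd.api.types.pandas_dtype("datetime64[ms]"))
```

If the first check is false and the second is true, the TypeError comes from the dtype string itself rather than from the JSON payload.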
Describe the bug
cudf.read_json fails to parse datetime64-typed columns correctly when the expected dtype is provided.
Steps/Code to reproduce bug
If dtype isn't specified and we cast the resulting int64 column, we get the expected result.
Expected behavior
cudf.read_json should handle the dtype argument.
Environment overview (please complete the following information)