rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[BUG] cudf.to_pandas doesn't handle datetime64[ms] correctly #3206

Closed: aucahuasi closed this issue 4 years ago

aucahuasi commented 4 years ago

Describe the bug
If a cuDF DataFrame has a column of type datetime64[ms] and I call to_pandas(), the resulting pandas column has the wrong time resolution: it always seems to come back as datetime64[ns].

Steps/Code to reproduce bug

  1. Load a CSV file that contains a date64 column.
  2. Print the cuDF DataFrame's dtypes: the column type is datetime64[ms] (note the ms: milliseconds).
  3. Call to_pandas() and check the resulting DataFrame: the datetime column is there, but its type is datetime64[ns] (nanoseconds).

Here is the script that reproduces the issue:

import numpy as np
import pandas as pd
import cudf

file_path = "/opt/tpch_tables/orders.csv"

column_names = ['o_orderkey', 'o_custkey', 'o_orderstatus', 'o_totalprice', 'o_orderdate', 'o_orderpriority', 'o_clerk', 'o_shippriority', 'o_comment']

# o_orderdate is read as date64, which cudf reports as datetime64[ms]
data_types = ["int64", "int32", "str", "float64", "date64", "str", "str", "str", "str"]

gdf = cudf.read_csv(file_path, delimiter='|', names=column_names, dtype=data_types)

print("Input CUDF:")
print(gdf.dtypes)

df = gdf.to_pandas()

print("------------------------------------")

print("Output pandas DataFrame with bad datetime unit:")
print(df.dtypes)
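For readers without a GPU or the TPC-H file, the unit change can be illustrated in plain NumPy. This sketch does not call cudf; it assumes, per the steps above, that the cudf column is datetime64[ms] and the pandas column is datetime64[ns], and shows that both encode the same instant as an int64 count since the Unix epoch, with the nanosecond count 1,000,000 times larger.

```python
import numpy as np

# One order date at millisecond resolution, as cudf reports for a date64 column
ms = np.array(["1996-01-02"], dtype="datetime64[ms]")

# The nanosecond-resolution equivalent, as seen in the DataFrame from to_pandas()
ns = ms.astype("datetime64[ns]")

print(ms.dtype, ns.dtype)  # datetime64[ms] datetime64[ns]

# Both arrays store int64 offsets from 1970-01-01; only the unit differs
ms_ticks = int(ms.view("int64")[0])
ns_ticks = int(ns.view("int64")[0])
print(ms_ticks, ns_ticks)  # 820540800000 820540800000000000

# Same instant, with the tick count scaled by 1,000,000 (ms -> ns)
assert ns_ticks == ms_ticks * 1_000_000
assert (ms == ns).all()
```

In other words, the values are not corrupted by the conversion; what changes is the dtype's unit, which is what the report is about.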

Expected behavior
to_pandas() should return a DataFrame whose datetime columns keep their original time resolution (e.g. a datetime64[ms] column stays datetime64[ms]).


Environment details
Output of the cudf/print_env.sh script:


     ***git***
     commit ccf3df32bcbceff903bf91aa96f6f5332a39f3a6 (grafted, HEAD -> branch-0.10, origin/branch-0.10)
     Author: Ray Douglass <3107146+raydouglass@users.noreply.github.com>
     Date:   Tue Oct 22 10:49:01 2019 -0400

     REL Merge pull request #3140 from revans2/rmm-0.10-fixes

     [REVIEW] Init RMM with no pooling at the beginning
     ***git submodules***
     -b165e1fb11eeea64ccf95053e40f2424312599cc thirdparty/cub
     -63f644be44201467e3938d59ed9d89cc8725c35d thirdparty/jitify

     ***OS Information***
     DISTRIB_ID=Ubuntu
     DISTRIB_RELEASE=16.04
     DISTRIB_CODENAME=xenial
     DISTRIB_DESCRIPTION="Ubuntu 16.04.4 LTS"
     NAME="Ubuntu"
     VERSION="16.04.4 LTS (Xenial Xerus)"
     ID=ubuntu
     ID_LIKE=debian
     PRETTY_NAME="Ubuntu 16.04.4 LTS"
     VERSION_ID="16.04"
     HOME_URL="http://www.ubuntu.com/"
     SUPPORT_URL="http://help.ubuntu.com/"
     BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
     VERSION_CODENAME=xenial
     UBUNTU_CODENAME=xenial
     Linux pctabz 4.13.0-36-generic #40~16.04.1-Ubuntu SMP Fri Feb 16 23:25:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

     ***GPU Information***
     Wed Oct 23 18:28:11 2019
     +-----------------------------------------------------------------------------+
     | NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
     |-------------------------------+----------------------+----------------------+
     | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
     | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
     |===============================+======================+======================|
     |   0  GeForce GTX 105...  Off  | 00000000:01:00.0 Off |                  N/A |
     | N/A   47C    P8    N/A /  N/A |   1079MiB /  4042MiB |      0%      Default |
     +-------------------------------+----------------------+----------------------+

     +-----------------------------------------------------------------------------+
     | Processes:                                                       GPU Memory |
     |  GPU       PID   Type   Process name                             Usage      |
     |=============================================================================|
     |    0     17253      C   ...aconda/conda/envs/develop/bin/python3.7   135MiB |
     |    0     17277      C   ...aconda/conda/envs/develop/bin/python3.7   135MiB |
     |    0     21272      C   python                                       169MiB |
     |    0     32276      C   ./testing-libgdf                             313MiB |
     |    0     32531      C   ./testing-libgdf                             317MiB |
     +-----------------------------------------------------------------------------+

     ***CPU***
     Architecture:          x86_64
     CPU op-mode(s):        32-bit, 64-bit
     Byte Order:            Little Endian
     CPU(s):                8
     On-line CPU(s) list:   0-7
     Thread(s) per core:    2
     Core(s) per socket:    4
     Socket(s):             1
     NUMA node(s):          1
     Vendor ID:             GenuineIntel
     CPU family:            6
     Model:                 158
     Model name:            Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
     Stepping:              9
     CPU MHz:               2800.000
     CPU max MHz:           3800,0000
     CPU min MHz:           800,0000
     BogoMIPS:              5616.00
     Virtualization:        VT-x
     L1d cache:             32K
     L1i cache:             32K
     L2 cache:              256K
     L3 cache:              6144K
     NUMA node0 CPU(s):     0-7
     Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti retpoline intel_pt rsb_ctxsw tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp

     ***CMake***
     /home/percy/Applications/anaconda/conda/envs/develop/bin/cmake
     cmake version 3.15.4

     CMake suite maintained and supported by Kitware (kitware.com/cmake).

     ***g++***
     /usr/bin/g++
     g++ (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
     Copyright (C) 2015 Free Software Foundation, Inc.
     This is free software; see the source for copying conditions.  There is NO
     warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

     ***nvcc***

     ***Python***
     /home/percy/Applications/anaconda/conda/envs/develop/bin/python
     Python 3.7.4

     ***Environment Variables***
     PATH                            : /home/percy/Applications/anaconda/conda/envs/develop/bin:/home/percy/Applications/anaconda/conda/condabin:/home/percy/Applications/gcloud/google-cloud-sdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin:/home/percy/Applications/docker-compose/current:/home/percy/Applications/kubectl:/home/percy/Applications/minikube:/home/percy/Applications/ctop:/home/percy/Applications/anaconda/conda/bin
     LD_LIBRARY_PATH                 :
     NUMBAPRO_NVVM                   :
     NUMBAPRO_LIBDEVICE              :
     CONDA_PREFIX                    : /home/percy/Applications/anaconda/conda/envs/develop
     PYTHON_PATH                     :

     ***conda packages***
     /home/percy/Applications/anaconda/conda/condabin/conda
     # packages in environment at /home/percy/Applications/anaconda/conda/envs/develop:
     #
     # Name                    Version                   Build  Channel
     _libgcc_mutex             0.1                        main
     arrow-cpp                 0.14.1           py37h5ac5442_4    conda-forge
     blazingdb-protocol        1.0                       dev_0    
     blazingsql-cli            1.0                       dev_0    
     blazingsql-toolchain      0.4.4                         0    blazingsql
     bokeh                     1.3.4                    py37_0    conda-forge
     boost-cpp                 1.70.0               h8e57a91_2    conda-forge
     brotli                    1.0.7             he1b5a44_1000    conda-forge
     bzip2                     1.0.8                h516909a_1    conda-forge
     c-ares                    1.15.0            h516909a_1001    conda-forge
     ca-certificates           2019.9.11            hecc5488_0    conda-forge
     certifi                   2019.9.11                py37_0    conda-forge
     chardet                   3.0.4                    pypi_0    pypi
     click                     7.0                        py_0    conda-forge
     cloudpickle               1.2.2                      py_0    conda-forge
     cmake                     3.15.4               hf94ab9c_0    conda-forge
     cppzmq                    4.4.1                hc9558a2_0    conda-forge
     cudatoolkit               10.0.130                      0
     cudf                      0.10.0                   py37_0    rapidsai/label/cuda10.0
     curl                      7.65.3               hbc83047_0
     cython                    0.29.13          py37he1b5a44_0    conda-forge
     cytoolz                   0.10.0           py37h516909a_0    conda-forge
     dask                      2.6.0                      py_0    conda-forge
     dask-core                 2.6.0                      py_0    conda-forge
     dask-cudf                 0.10.0                   py37_0    rapidsai/label/cuda10.0
     distributed               2.6.0                      py_0    conda-forge
     dlpack                    0.2                  he1b5a44_1    conda-forge
     double-conversion         3.1.5                he1b5a44_1    conda-forge
     et-xmlfile                1.0.1                    pypi_0    pypi
     expat                     2.2.5             he1b5a44_1004    conda-forge
     fastavro                  0.22.5           py37h516909a_0    conda-forge
     flatbuffers               1.11                     pypi_0    pypi
     freetype                  2.10.0               he983fc9_1    conda-forge
     fsspec                    0.5.2                      py_0    conda-forge
     gettext                   0.19.8.1          hc5be6a0_1002    conda-forge
     gflags                    2.2.2             he1b5a44_1001    conda-forge
     gitdb2                    2.0.6                    pypi_0    pypi
     gitpython                 3.0.3                    pypi_0    pypi
     glog                      0.4.0                he1b5a44_1    conda-forge
     gmock                     1.10.0                        0    conda-forge
     grpc-cpp                  1.23.0               h18db393_0    conda-forge
     gtest                     1.10.0               hc9558a2_0    conda-forge
     heapdict                  1.0.1                      py_0    conda-forge
     icu                       64.2                 he1b5a44_1    conda-forge
     idna                      2.8                      pypi_0    pypi
     jdcal                     1.4.1                    pypi_0    pypi
     jinja2                    2.10.3                     py_0    conda-forge
     jpeg                      9c                h14c3975_1001    conda-forge
     krb5                      1.16.1               h173b8e3_7
     libblas                   3.8.0               14_openblas    conda-forge
     libcblas                  3.8.0               14_openblas    conda-forge
     libcudf                   0.10.0               cuda10.0_0    rapidsai/label/cuda10.0
     libcurl                   7.65.3               h20c2e04_0
     libedit                   3.1.20181209         hc058e9b_0
     libevent                  2.1.10               h72c5cf5_0    conda-forge
     libffi                    3.2.1                hd88cf55_4
     libgcc-ng                 9.1.0                hdf63c60_0
     libgcrypt                 1.8.4             hf484d3e_1000    conda-forge
     libgfortran-ng            7.3.0                hdf63c60_2    conda-forge
     libgpg-error              1.36                 he1b5a44_0    conda-forge
     libgsasl                  1.8.0                         2    conda-forge
     liblapack                 3.8.0               14_openblas    conda-forge
     libntlm                   1.4               h14c3975_1002    conda-forge
     libnvstrings              0.10.0               cuda10.0_0    rapidsai/label/cuda10.0
     libopenblas               0.3.7                h6e990d7_2    conda-forge
     libpng                    1.6.37               hed695b0_0    conda-forge
     libprotobuf               3.8.0                h8b12597_0    conda-forge
     librmm                    0.10.0               cuda10.0_0    rapidsai/label/cuda10.0
     libsodium                 1.0.17               h516909a_0    conda-forge
     libssh2                   1.8.2                h22169c7_2    conda-forge
     libstdcxx-ng              9.1.0                hdf63c60_0
     libtiff                   4.0.10            h57b8799_1003    conda-forge
     libuv                     1.33.1               h516909a_0    conda-forge
     llvmlite                  0.29.0           py37hfd453ef_1    conda-forge
     locket                    0.2.0                      py_2    conda-forge
     lz4-c                     1.8.3             he1b5a44_1001    conda-forge
     markupsafe                1.1.1            py37h14c3975_0    conda-forge
     maven                     3.6.0                         0    conda-forge
     msgpack-python            0.6.2            py37hc9558a2_0    conda-forge
     ncurses                   6.1                  he6710b0_1
     numba                     0.45.1           py37hb3f55d8_0    conda-forge
     numpy                     1.17.2           py37h95a1406_0    conda-forge
     nvstrings                 0.10.0                   py37_0    rapidsai/label/cuda10.0
     olefile                   0.46                       py_0    conda-forge
     openjdk                   8.0.192           h14c3975_1003    conda-forge
     openpyxl                  3.0.0                    pypi_0    pypi
     openssl                   1.1.1c               h516909a_0    conda-forge
     packaging                 19.2                       py_0    conda-forge
     pandas                    0.24.2           py37hb3f55d8_0    conda-forge
     parquet-cpp               1.5.1                         2    conda-forge
     partd                     1.0.0                      py_0    conda-forge
     pillow                    6.2.0            py37h34e0f95_0
     pip                       19.2.3                   py37_0
     psutil                    5.6.3            py37h516909a_0    conda-forge
     py4j                      0.10.7                     py_1    conda-forge
     pyarrow                   0.14.1           py37h8b68381_2    conda-forge
     pyblazing                 0.1                       dev_0    
     pydrill                   0.3.4                    pypi_0    pypi
     pymysql                   0.9.3                    pypi_0    pypi
     pynvml                    8.0.3                    pypi_0    pypi
     pyparsing                 2.4.2                      py_0    conda-forge
     pyspark                   2.4.3                      py_0    conda-forge
     python                    3.7.4                h265db76_1
     python-dateutil           2.8.0                      py_0    conda-forge
     pytz                      2019.3                     py_0    conda-forge
     pyyaml                    5.1.2            py37h516909a_0    conda-forge
     rapidjson                 1.1.0             he1b5a44_1002    conda-forge
     re2                       2019.09.01           he1b5a44_0    conda-forge
     readline                  7.0                  h7b6447c_5
     requests                  2.22.0                   pypi_0    pypi
     rhash                     1.3.6             h14c3975_1001    conda-forge
     rmm                       0.10.0                   py37_0    rapidsai/label/cuda10.0
     setuptools                41.4.0                   py37_0
     six                       1.12.0                py37_1000    conda-forge
     smmap2                    2.0.5                    pypi_0    pypi
     snappy                    1.1.7             he1b5a44_1002    conda-forge
     sortedcontainers          2.1.0                      py_0    conda-forge
     sqlite                    3.30.0               h7b6447c_0
     tblib                     1.4.0                      py_0    conda-forge
     thrift-cpp                0.12.0            hf3afdfd_1004    conda-forge
     tk                        8.6.8                hbc83047_0
     toolz                     0.10.0                     py_0    conda-forge
     tornado                   6.0.3            py37h516909a_0    conda-forge
     uriparser                 0.9.3                he1b5a44_1    conda-forge
     urllib3                   1.25.6                   pypi_0    pypi
     wheel                     0.33.6                   py37_0
     xz                        5.2.4                h14c3975_4
     yaml                      0.1.7             h14c3975_1001    conda-forge
     zeromq                    4.3.2                he1b5a44_2    conda-forge
     zict                      1.0.0                      py_0    conda-forge
     zlib                      1.2.11               h7b6447c_3
     zstd                      1.4.0                h3b9ef0a_0    conda-forge

Additional context
This issue also affects dask-cudf when a Dask workload processes a cuDF DataFrame that has a datetime column.

kkraus14 commented 4 years ago

This is not a bug: it is a limitation of pandas, which only supports datetime64[ns].
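Given that limitation, one user-side workaround is to recast the column after to_pandas() in NumPy, where millisecond resolution is supported. The helper `to_millisecond_array` below is hypothetical (it is not a cudf or pandas API); it accepts any array-like of datetime64 values, such as a pandas Series obtained from to_pandas(), and returns a NumPy datetime64[ms] array. Assigning that array back into a pandas 0.24 DataFrame would coerce it to datetime64[ns] again, so the result has to be kept as a NumPy array.

```python
import numpy as np


def to_millisecond_array(values):
    """Hypothetical helper: return datetime64 values as a NumPy datetime64[ms] array.

    Accepts any array-like of datetime64 values (NumPy array, pandas Series, ...).
    Any sub-millisecond part is discarded by the cast.
    """
    arr = np.asarray(values)
    if not np.issubdtype(arr.dtype, np.datetime64):
        raise TypeError(f"expected datetime64 values, got {arr.dtype}")
    return arr.astype("datetime64[ms]")


# Simulate the nanosecond column returned by to_pandas()
ns_column = np.array(["1995-03-15T00:00:00.123", "1996-01-02"], dtype="datetime64[ns]")
ms_column = to_millisecond_array(ns_column)
print(ms_column.dtype)  # datetime64[ms]
```

Going through `np.asarray` keeps the helper independent of the pandas version: it works whether the Series holds datetime64[ns] (as in pandas 0.24) or already holds another unit.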