rapidsai / cuDataShader

Apache License 2.0

[BUG] nyc_taxi.csv dataset does not exist in provided data folder #1

Closed: taureandyernv closed this issue 4 years ago

taureandyernv commented 5 years ago

Describe the bug

I just cloned the repo, and the nyc_taxi.csv dataset that cudatashader.ipynb reads does not exist in the data folder.

Steps/Code to reproduce bug

import cudf
import pandas as pd

pdf = pd.read_csv('data/nyc_taxi.csv')

Output:

FileNotFoundError: [Errno 2] File b'data/nyc_taxi.csv' does not exist: b'data/nyc_taxi.csv'

Expected behavior

A successful read of data/nyc_taxi.csv.

exactlyallan commented 5 years ago

The .csv is 1.5 GB, but you can find it in the /data folder of datashader (installed via conda install datashader). I'm open to suggestions on how to make it easier to include.

taureandyernv commented 5 years ago

Maybe an S3 bucket, with a !wget that downloads the file into the appropriate folder, would work?

exactlyallan commented 5 years ago

Do we have S3 buckets available for this? Maybe a hacky temporary solution is to use Google Drive?

taureandyernv commented 5 years ago

A gift to you :) https://colab.research.google.com/drive/1bFIBg54zS9RmU58VwjJMAaqJ1xP27BXj. I linked directly to the taxi data's S3 bucket and modified the Colab so that it runs everything all the way through.

link: https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2015-01.csv

wget command: !wget -O data/nyc_taxi.csv https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2015-01.csv
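For anyone who would rather not shell out to wget, the same download step could be sketched in plain Python roughly as follows. This is only an illustration, not something from the repo: the helper name `ensure_dataset` and its `fetch` parameter are hypothetical, and the URL is the one quoted in this thread, which may have moved or been taken down since.

```python
import os
import urllib.request

# URL quoted in this thread; assumption: it may no longer be served.
DATA_URL = "https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2015-01.csv"
# Path the notebook reads from, relative to the repo root.
DATA_PATH = os.path.join("data", "nyc_taxi.csv")


def ensure_dataset(path=DATA_PATH, url=DATA_URL, fetch=urllib.request.urlretrieve):
    """Download `url` to `path` unless the file already exists.

    Hypothetical helper for illustration. Returns True if a download
    happened and False if the file was already present.
    """
    if os.path.exists(path):
        return False

    # Create the target folder (e.g. data/) if it is missing.
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)

    # Download to a temporary name first so an interrupted transfer
    # does not leave a truncated nyc_taxi.csv behind.
    tmp_path = path + ".part"
    fetch(url, tmp_path)
    os.replace(tmp_path, path)
    return True
```

Calling `ensure_dataset()` in a notebook cell before `pd.read_csv('data/nyc_taxi.csv')` would skip the large download on later runs, since it only fetches when the file is missing.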

Maybe you can make that a "Try Datashader" demo link in the README or something.