ropensci / bikedata

:bike: Extract data from public hire bicycle systems
https://docs.ropensci.org/bikedata

Unable to store NYC data from 2018 and 2019 #96

Closed szymanskir closed 4 years ago

szymanskir commented 4 years ago

I am having trouble populating the SQLite database with NYC bike data from 2018 and 2019.

library(bikedata)

temp <- tempdir()
dl_bikedata(city = "nyc", data_dir = temp, dates = 201808)
store_bikedata(bikedb = file.path(temp, "bikedb.sqlite"), data_dir = temp)

This results in the following error:

Creating sqlite3 database
Unzipping raw data files ...
reading file 1/2: /tmp/RtmpaSbJke/201808-citibike-tripdata.csv
Error in rcpp_import_to_trip_table(bikedb, flists$flist_csv, ci, header_file_name(),  : 
  basic_string::_M_construct null not valid

I investigated the issue, and it seems that the file from August 2018 is the first one in which some missing station IDs are marked as NULL, e.g.:

"tripduration","starttime","stoptime","start station id","start station name","start station latitude","start station longitude","end station id","end station name","end station latitude","end station longitude","bikeid","usertype","birth year","gender"
5689,"2018-08-31 23:57:30.7940","2018-09-01 01:32:20.5120",NULL,NULL,40.866,-73.884,NULL,NULL,40.854,-73.911,34616,"Customer",1995,2

When I manually removed the lines containing NULL values, no error occurred :)
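
For anyone who wants to reproduce the manual cleanup, here is a minimal sketch in base R. `drop_null_rows` is a hypothetical helper of my own, not part of bikedata, and the demo rows are made up for illustration. It assumes NULL only ever appears as a whole, unquoted field, as in the row quoted above. It only cleans a CSV file; how the cleaned file is then passed back to `store_bikedata` is not covered here.

```r
# Hypothetical helper (not part of bikedata): drop CSV rows that contain
# an unquoted NULL field, mirroring the manual cleanup described above.
drop_null_rows <- function(csv_in, csv_out = csv_in) {
  lines <- readLines(csv_in)
  # Match NULL only when it is a whole, unquoted field
  has_null <- grepl("(^|,)NULL(,|$)", lines)
  writeLines(lines[!has_null], csv_out)
  invisible(sum(has_null))  # number of rows removed
}

# Self-contained demo on a small temporary file with made-up rows
demo <- tempfile(fileext = ".csv")
writeLines(c(
  '"tripduration","start station id","end station id"',
  '5689,NULL,NULL',
  '300,72,79'
), demo)
n_removed <- drop_null_rows(demo)
remaining <- readLines(demo)
```

Note that this throws away the affected trips entirely, which may or may not be acceptable depending on the analysis.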