ropensci / bikedata

:bike: Extract data from public hire bicycle systems
https://docs.ropensci.org/bikedata

Guadalajara data breaks with 2021-06 data #106

Closed: RichardBean closed this issue 2 years ago

RichardBean commented 2 years ago

download_bikedata("gu", data_dir = tempdir(), dates = NULL, quiet = FALSE)
store_bikedata('bikedb',"gu")

reading file 79/85: C:\Users.....\AppData\Local\Temp.../datos_abiertos_2021_06.csv 0s
Error in rcpp_import_to_trip_table(bikedb, flists$flist_csv, ci, header_file_name(), : basic_string::_M_construct null not valid

mpadge commented 2 years ago

Thanks @RichardBean, I also get exactly that error. Hope to fix soon.

mpadge commented 2 years ago

It's because they've changed the format of the files: up to 2021-05 the fields were unquoted, and from 2021-06 onwards all fields are enclosed in double quotation marks. Should be easy to fix.
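To see which of the downloaded files use the newer quoted format, something like the following could be used. This is only a rough sketch: `is_fully_quoted` is a hypothetical helper, not part of bikedata, and the two small files are made up to stand in for the old and new formats (they do not reproduce the real column layout).

```r
# Hypothetical helper (not part of bikedata): report whether every field on
# a file's first data line is wrapped in double quotes.
is_fully_quoted <- function (path) {
    x <- readLines (path, n = 2L, warn = FALSE)
    if (length (x) == 0L) {
        return (FALSE) # empty file: nothing to check
    }
    line <- x [length (x)] # second line if present, otherwise the header
    fields <- strsplit (line, ",", fixed = TRUE) [[1]]
    all (grepl ('^".*"$', fields))
}

# Two made-up files standing in for the old (unquoted) and new (quoted) formats
old_fmt <- tempfile (fileext = ".csv")
new_fmt <- tempfile (fileext = ".csv")
writeLines (c ("a,b,c", "1,2,3"), old_fmt)
writeLines (c ('"a","b","c"', '"1","2","3"'), new_fmt)

is_fully_quoted (old_fmt)
is_fully_quoted (new_fmt)
```

The check splits naively on commas, so it assumes no field contains a comma of its own.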

mpadge commented 2 years ago

@RichardBean This should now be fixed. There are built-in mechanisms in the C++ code to deal with these kinds of cases, but they make reading the files much slower. Instead, I implemented a workaround that re-formats the files as soon as they're downloaded, so that reading into the database remains as fast as possible. That means you will likely have to re-download the files, or else you could copy these lines and run them on your local versions of the files:

https://github.com/ropensci/bikedata/blob/330faa4f5060b5aae232fde9a88936a91b67bdcf/R/dl-bikedata.R#L158-L168
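For anyone who would rather not re-download, the general idea of re-formatting local files could be sketched roughly as below. This is not a copy of the linked lines, just an illustration under the assumption that removing every double quote is enough; `strip_csv_quotes` is a hypothetical name, and the demonstration file has made-up contents.

```r
# Hypothetical helper (not the package's own code): remove all double quotes
# from the given CSV files, rewriting each file in place.
strip_csv_quotes <- function (files) {
    for (f in files) {
        x <- readLines (f, warn = FALSE)
        # Only rewrite files that actually contain quotes
        if (any (grepl ('"', x, fixed = TRUE))) {
            writeLines (gsub ('"', "", x, fixed = TRUE), f)
        }
    }
    invisible (files)
}

# Demonstration on a temporary file with made-up contents
f <- tempfile (fileext = ".csv")
writeLines (c ('"id","start","end"', '"1","10","20"'), f)
strip_csv_quotes (f)
readLines (f)
```

On real data this might be applied to something like `list.files (data_dir, pattern = "^datos_abiertos_.*\\.csv$", full.names = TRUE)`, where `data_dir` is wherever your files live. Note that stripping quotes this bluntly would break any field that contains a comma inside the quotes, so it is worth keeping a backup of the originals.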

Here's a reproducible example of what you should now see:

library (bikedata)
#> Data for London, U.K. powered by TfL Open Data:
#>   Contains OS data Ⓒ Crown copyright and database rights 2016
#> Data for New York City provided and owned by:
#>   NYC Bike Share, LLC and Jersey City Bike Share, LLC ("Bikeshare")
#>   see https://www.citibikenyc.com/data-sharing-policy
#> Data for Washington DC (Captialbikeshare), Chiago (Divvybikes) and Boston (Hubway)
#>   provided and owned by Motivate International Inc.
#>   see https://www.capitalbikeshare.com/data-license-agreement
#>   and https://www.divvybikes.com/data-license-agreement
#>   and https://www.thehubway.com/data-license-agreement
#> Nice Ride Minnesota license  https://assets.niceridemn.com/data-license-agreement.html
packageVersion ("bikedata")
#> [1] '0.2.5.43'

city <- 'gu'
data_dir <- "/<path>/<to>/<gu-data>"
dl_bikedata (city = city, data_dir = data_dir, dates = 2021, quiet = FALSE)
#> All data files already exist
bikedb <- file.path (tempdir (), "bikedb.sqlite")

store_bikedata (bikedb = bikedb, data_dir = data_dir, quiet = FALSE)
#> Creating sqlite3 database
#> Unzipping raw data files ...
#> reading file 1/12: /data/data/bikes/gu/datos_abiertos_2021_01.csv
#> reading file 2/12: /data/data/bikes/gu/datos_abiertos_2021_02.csv
#> reading file 3/12: /data/data/bikes/gu/datos_abiertos_2021_03.csv
#> reading file 4/12: /data/data/bikes/gu/datos_abiertos_2021_04.csv
#> reading file 5/12: /data/data/bikes/gu/datos_abiertos_2021_05.csv
#> reading file 6/12: /data/data/bikes/gu/datos_abiertos_2021_06.csv
#> reading file 7/12: /data/data/bikes/gu/datos_abiertos_2021_07.csv
#> reading file 8/12: /data/data/bikes/gu/datos_abiertos_2021_08.csv
#> reading file 9/12: /data/data/bikes/gu/datos_abiertos_2021_09.csv
#> reading file 10/12: /data/data/bikes/gu/datos_abiertos_2021_10.csv
#> reading file 11/12: /data/data/bikes/gu/datos_abiertos_2021_11.csv
#> reading file 12/12: /data/data/bikes/gu/datos_abiertos_2021_12.csv
#> Total trips read = 3,184,410
#> [1] 3184410

Created on 2022-02-01 by the reprex package (v2.0.1.9000)