ropensci / bikedata

:bike: Extract data from public hire bicycle systems
https://docs.ropensci.org/bikedata

sketch for sf-bay-area #57

Closed: tbuckl closed this 6 years ago

tbuckl commented 6 years ago

added the lines needed to fetch sf trip data and load it into the db. still working on stations.

with these changes, you can do the following:

dl_bikedata (city = 'sf', data_dir = data_dir)
bikedb <- file.path (data_dir, 'sfdb')
store_bikedata (data_dir = data_dir, city = 'sf', bikedb = bikedb)
index_bikedata_db (bikedb = bikedb)

but not the following:

tm <- bike_tripmat(bikedb = bikedb, city='sf')

which failed because the station data weren't in there.

so i added a function to fetch stations, deleted the database, ran the same commands as above, and got this error message:

Error in rcpp_import_stn_df(bikedb, sdf, city = "sf") : 
  Unable to insert stations for sf

which seems to be thrown here or here

debugging C++ is not something i've done before, though i am excited to give it a shot!

i tried throwing zErrMsg instead, and it seemed to complain about character escaping, maybe? i can't quite tell.

 Error in rcpp_import_stn_df(bikedb, sf_stns, "sf") : 
  near "Farrell": syntax error 
3.
stop(structure(list(message = "near \"Farrell\": syntax error", 
    call = rcpp_import_stn_df(bikedb, sf_stns, "sf"), cppstack = NULL), .Names = c("message", 
"call", "cppstack"), class = c("std::runtime_error", "C++Error", 
"error", "condition"))) 
2.
rcpp_import_stn_df(bikedb, sf_stns, "sf") at store-bikedata.R#158
1.
store_bikedata(data_dir = data_dir, city = "sf", bikedb = bikedb) 

so i tried throwing the records with O'Farrell out and then running again.

sorry i couldn't be more helpful. i'll give it a try again sometime this week.

tbuckl commented 6 years ago

@mpadge i just opened this PR to see the changes, so i'm gonna close it out.

mpadge commented 6 years ago

Thanks so much @tibbl35! That's fantastic work! Your PR is pretty much there; the only real problem, which you already identified, is the stations. The trip data for SF, as with other NABSA systems, include the station data in the raw trip data files, so they don't need a separate station table to be read from anywhere. Instead, station reading is linked in C++-world via the import_to_station_table function, using the stationqry set up during read_one_line_nabsa. The stations are then written right at the end of rcpp_import_to_trip_table, after the trip files have been read.

In short: all that seems to be missing is the first stage listed in the current wiki, modifying the src/ files. Because these files should be structured identically to both "ph" and "la", I think all you'll need to do is extend any mention of those to include "sf" as well. That should only be in src/sqlite3db-add-data.h. You could just change that and re-open the same PR if you want.

Also please feel free to update the wiki if you thought any step was unclear, wrong, inaccurate, whatever. Thanks again for the sterling help here!

tbuckl commented 6 years ago

thanks @mpadge, this is helpful. i'll give it a shot in the coming days and reissue the PR with the working version.

tbuckl commented 6 years ago

@mpadge looks like the SF bay area data do not include 'trip_id', which philadelphia and LA do.

i gather i need to change something in read_one_line_nabsa, but i'm not sure what.

i'll try to take a look over the next few days but thought you or others might find the info useful anyway.

mpadge commented 6 years ago

hey @tibbl35, I've implemented some pretty big changes, because the Boston Hubway system up and changed their data structure quite significantly. This means at the least that you'll have to merge all of those changes into your branch. It'd be great if you were still keen to help get SF in - please just let me know here, and I'll happily guide the process.

The file structure is identical to the current Boston format, including a change between 2017 and 2018. However, C++ code will still have to be specifically written for SF, because Boston now (very annoyingly) bundles their station data separately to match with their older data files. SF do not (yet) do this, making it easier. So the C++ code will be a combination of current Boston code plus generic NABSA code (inserting station data on the go as the file is read). Let me know, and I'll assign you to the official issue, and add you as a package co-author as incentive for you. (Simple rule: If you write new functionality, you are an author.)

mpadge commented 6 years ago

@tibbl35 FYI: Here's an all-in-one commit to add data for Minneapolis/St Paul

tbuckl commented 6 years ago

@mpadge definitely still interested. i'll give it a shot in the next 2 days and we'll know one way or the other whether i can help by end of day tomorrow.