ropensci / bikedata

:bike: Extract data from public hire bicycle systems
https://docs.ropensci.org/bikedata

Vignette with example of uses #5

Closed: richardellison closed this issue 7 years ago

richardellison commented 7 years ago

I think it may be worth adding a vignette with some examples of possible analyses that can be done with the data, either making use of other packages or using more standard examples (possibly involving some spatial queries).

On that theme, as promised, below is the code used to generate this image. The sum_network_links function is in ropensci/stplanr#185, and the create_index argument of the store_bikedata function is in #3.

[Image: nycitibikeexample]

library(rgdal)
library(stplanr)
library(rgeos)
library(dplyr)
library(RSQLite)
library(tmap)
library(bikedata)

# Download and read in the New York State street layer (the NYC TIGER data
# could potentially be used instead). This is used to build the network later.
download.file("http://gis.ny.gov/gisdata/fileserver/?DSID=932&file=streets_shp.zip",
              destfile = "~/Downloads/streets_shp.zip")
unzip("~/Downloads/streets_shp.zip", exdir = "~/Downloads")
nystreets <- readOGR("~/Downloads/Streets_shp","StreetSegment")

# Create a directory to store the data files and then download Citibike data for
# October 2016 to December 2016
dir.create("~/Downloads/citibikedata")
dl_bikedata(data_dir = "~/Downloads/citibikedata/",
            dates = c("201610","201611","201612"))

# Store downloaded data into a database
store_bikedata("~/Downloads/citibikedata/", "citifq2016", create_index = TRUE)

# Connect to the database
dbcon <- dbConnect(SQLite(), "citifq2016")

# Retrieve stations from the database and create a SpatialPointsDataFrame 
# from the result.
nycstations <- dbGetQuery(dbcon, "SELECT * FROM stations")
nycstations$geom <- NULL
nycstations <- SpatialPointsDataFrame(coords = nycstations[,c('longitude','latitude')], 
                                      proj4string = CRS("+init=epsg:4326"), 
                                      data = nycstations)

# Reproject to the same projection as the streets layer
# (the network nycnet has not been built yet, so use nystreets directly)
nycstations <- spTransform(nycstations, nystreets@proj4string)

# Clip the streets layer to the area around the stations then 
# remove the full New York State dataset.
nycstreets <- gclip(nystreets, bbox(gBuffer(gEnvelope(nycstations, byid = FALSE),
                                            byid = FALSE, width = 1000)))
rm(nystreets)

# Create a new network with the length parameter as the default weight
nycnet <- SpatialLinesNetwork(sl = nycstreets)

# Find the closest node to each station
nycstations@data$nodeid <- stplanr::find_network_nodes(
  nycnet, 
  nycstations@coords[,1], 
  nycstations@coords[,2]
)

# Query the database to count the number of trips between each pair of stations.
routetrips <- dbGetQuery(dbcon, "SELECT start_station_id, end_station_id, 
                         COUNT(*) as numtrips
                         FROM trips 
                         WHERE start_station_id <> end_station_id
                         GROUP BY start_station_id, end_station_id")

# Join the routetrips table to the nycstations layer to match the Node IDs
routetrips <- routetrips %>% 
  inner_join(
    nycstations@data %>%
      select(start_station_id = id, startnodeid = nodeid)
  ) %>%
  inner_join(
    nycstations@data %>%
      select(end_station_id = id, endnodeid = nodeid)
  ) %>%
  select(
    startnodeid,
    endnodeid,
    numtrips
  )

# Run the sum_network_links function to aggregate the number of trips
# on each part of the network.
# Note that since the default weight (length) has not been changed,
# this is the simple shortest path.
nycbicycleusage <- sum_network_links(nycnet, routetrips)

# Download and read in some layers to set the geographic context
download.file("https://www2.census.gov/geo/tiger/TIGER2016/AREAWATER/tl_2016_36061_areawater.zip",
              destfile = "~/Downloads/citibikedata/nycountyareawater.zip")
download.file("https://www2.census.gov/geo/tiger/TIGER2016/AREAWATER/tl_2016_34017_areawater.zip",
              destfile = "~/Downloads/citibikedata/njcountyareawater.zip")
unzip("~/Downloads/citibikedata/nycountyareawater.zip", exdir = "~/Downloads/citibikedata/")
unzip("~/Downloads/citibikedata/njcountyareawater.zip", exdir = "~/Downloads/citibikedata/")
nywater <- readOGR("~/Downloads/citibikedata","tl_2016_36061_areawater")
njwater <- readOGR("~/Downloads/citibikedata","tl_2016_34017_areawater")
nywater <- spTransform(nywater, nycbicycleusage@proj4string)
njwater <- spTransform(njwater, nycbicycleusage@proj4string)

# Plot the water and routes layers
nycmap <- tm_shape(nywater, is.master = FALSE) +
  tm_fill(col = "#000011") +
  tm_shape(njwater, is.master = FALSE) +
  tm_fill(col = "#000011") +
  tm_shape(nycbicycleusage, is.master = TRUE) +
  tm_lines(col = "numtrips",
           lwd = "numtrips",
           title.col = "Number of trips",
           breaks = c(0, 20000, 40000, 60000, 80000, 100000, Inf),
           legend.lwd.show = FALSE,
           scale = 2) +
  tm_layout(
    bg.color = "black",
    legend.position = c("right", "bottom"),
    legend.bg.color = "white",
    legend.bg.alpha = 0.5
  )

# Display the map, then save it to a PNG file
nycmap
save_tmap(nycmap, filename = "citibikeexample.png")
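
To make the aggregation step more concrete, here is a toy, base-R sketch of the idea the script relies on: snap each station to its nearest network node, count trips per station pair (as the GROUP BY query does), route each pair along the shortest path, and add the pair's trip count to every link on that path. Everything in it is hypothetical: the five-node network, the stations A, B and C, the trip records, and the helper shortest_path_edges() (a plain Dijkstra search). It is an illustration under those assumptions, not stplanr's actual sum_network_links implementation, which works on real street geometry.

```r
# Toy sketch (hypothetical data): aggregate station-to-station trip counts
# onto network links via shortest paths. Not stplanr's implementation.

# Hypothetical street network: undirected links with lengths.
# Node ids are assumed to be 1..n so they can index vectors directly.
nodes <- data.frame(id = 1:5, x = c(0, 1, 2, 3, 3), y = c(0, 0, 0, 0, 1))
edges <- data.frame(from   = c(1, 2, 3, 1, 4),
                    to     = c(2, 3, 4, 4, 5),
                    length = c(1, 1, 1, 5, 1))

# Hypothetical stations, snapped to the nearest node (squared Euclidean distance)
stations <- data.frame(id = c("A", "B", "C"),
                       x = c(0.1, 2.1, 3.2), y = c(0.1, -0.1, 0.9),
                       stringsAsFactors = FALSE)
nearest_node <- function(px, py) {
  nodes$id[which.min((nodes$x - px)^2 + (nodes$y - py)^2)]
}
stations$nodeid <- mapply(nearest_node, stations$x, stations$y)

# Hypothetical trip records: drop round trips, then count trips per
# station pair, mirroring the GROUP BY query in the script
trips <- data.frame(start = c("A", "A", "A", "B", "C", "A"),
                    end   = c("B", "B", "C", "C", "A", "A"),
                    stringsAsFactors = FALSE)
trips <- trips[trips$start != trips$end, ]
od <- aggregate(list(numtrips = rep(1, nrow(trips))),
                by = list(start = trips$start, end = trips$end), FUN = sum)

# Hypothetical helper: plain Dijkstra search returning the indices of the
# edges on the shortest path from src to dst (assumes dst is reachable
# and src != dst)
shortest_path_edges <- function(src, dst, edges, n) {
  dist <- rep(Inf, n)
  prev_edge <- rep(NA_integer_, n)
  visited <- rep(FALSE, n)
  dist[src] <- 0
  repeat {
    cand <- which(!visited & is.finite(dist))
    if (length(cand) == 0) break
    u <- cand[which.min(dist[cand])]
    if (u == dst) break
    visited[u] <- TRUE
    # Relax every link touching u (links are undirected)
    for (e in which(edges$from == u | edges$to == u)) {
      v <- if (edges$from[e] == u) edges$to[e] else edges$from[e]
      alt <- dist[u] + edges$length[e]
      if (alt < dist[v]) {
        dist[v] <- alt
        prev_edge[v] <- e
      }
    }
  }
  # Walk back from dst to src, collecting edge indices
  path <- integer(0)
  v <- dst
  while (v != src) {
    e <- prev_edge[v]
    path <- c(e, path)
    v <- if (edges$from[e] == v) edges$to[e] else edges$from[e]
  }
  path
}

# Add each pair's trip count to every link on its shortest path
edges$numtrips <- 0
for (i in seq_len(nrow(od))) {
  s <- stations$nodeid[stations$id == od$start[i]]
  d <- stations$nodeid[stations$id == od$end[i]]
  p <- shortest_path_edges(s, d, edges, nrow(nodes))
  edges$numtrips[p] <- edges$numtrips[p] + od$numtrips[i]
}
edges
```

With these toy lengths, the long 1-4 link never lies on a shortest path, so it keeps a count of zero while the links along 1-2-3-4-5 collect all the trips. This is the effect the script's comment refers to when it notes that the default length weight gives the simple shortest path; different weights would shift trips onto different links.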
mpadge commented 7 years ago

That's awesome Richard! For someone who said they wouldn't have much time, you've sure made a cracking start! I'll get straight back onto this tomorrow (including your PR). Thanks for the great work! I'll likely put incorporation into an actual vignette on hold until osmdata appears on CRAN. Once that's done, street lines can be obtained from there in a flash, with just two lines of code.

mpadge commented 7 years ago

@richardellison just a heads-up: I hope to finish the package this week. Once I do, I would appreciate it if you could give it a half-decent look over and suggest any changes or improvements. I'll let you know when it's ready; will you be able to devote some time to it later this week or early next week? We should then be able to move fairly rapidly on to an rOpenSci submission.

richardellison commented 7 years ago

Sounds reasonable to me. I will try to find time to have a look at it once you're done, although this coming week will be rather busy.

Richard


mpadge commented 7 years ago

@richardellison heads-up part 2: In terms of functionality, it's pretty much done now. Data can be loaded for all cities of which I am currently aware. Feel free to inspect, play, and try to break it whenever you can find some time. The rest is just polishing, finishing the vignette, extending tests, and #11 (I'm pretty sure #6 is also now sorted, but will confirm before closing).

And note that I've reverted to slightly slower C++ routines for London (search for "london" in read_city_files.h). London is the only city that bucks the otherwise consistent pattern of strictly comma-delimited files.
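
The thread does not say exactly how the London files differ. Assuming the irregularity is quoted fields (for example station names) that themselves contain commas, the base-R sketch below shows why splitting on every comma breaks on such records while a quote-aware parser does not. The sample lines are invented for illustration, not taken from real London data, and this is not the package's C++ routine.

```r
# Hypothetical sample records (not real London data): the first data row has
# a station name containing a comma, protected by double quotes
lines <- c(
  'Rental Id,Duration,StartStation Name',
  '1,300,"Hyde Park Corner, Hyde Park"',
  '2,420,Waterloo Station 3'
)

# Naive splitting on every comma assumes strictly comma-delimited fields
naive <- strsplit(lines[-1], ",", fixed = TRUE)
lengths(naive)  # 4 fields for the quoted record, 3 for the other

# A quote-aware parser keeps the embedded comma inside the field
parsed <- read.csv(text = lines, stringsAsFactors = FALSE)
parsed$StartStation.Name
```

The trade-off is the one described above: a parser that tracks quote state for every character does more work than one that treats every comma as a delimiter, which is a plausible reason a London-specific routine would be slower.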

Any suggestions, modifications, improvements much appreciated. Thanks in advance!

mpadge commented 7 years ago

@richardellison with much gratitude for the above code: the vignette now has an equivalent version using osmdata, which makes it considerably more compact. It's a great finish to the vignette and to the package, thanks! A nicer HTML version will be online soon enough, or just run make from the vignette directory and it'll open a version in your browser of choice.

Note also that I did eventually succumb to bundling sqlite3, which of course immediately alleviated many problems I was fighting against. Still feels a bit like cheating, but hey, it makes life easier. Submission is now definitely, officially imminent: CRAN first, then rOpenSci as soon as it's online. I'll ping you in the latter submission.

mpadge commented 7 years ago

@richardellison the vignette with the graphic is up via pkgdown here. Submitted to CRAN today; rOpenSci as soon as it's up.

richardellison commented 7 years ago

Excellent, well done and sorry for not being more involved lately. Good to have the package on CRAN.