ropensci / software-review

rOpenSci Software Peer Review.
291 stars 104 forks source link

gtfsr package #55

Closed dantonnoriega closed 7 years ago

dantonnoriega commented 8 years ago
  1. What does this package do? (explain in 50 words or less)

    Currently, the package facilitates the import of GTFS data (from url or local path) to create gtfs objects, validates file structure of gtfs objects, and provides functions to easily plot stops, routes, or networks.

    Future versions will build validation and modeling capabilities, and options to layer other data.

  2. Paste the full DESCRIPTION file inside a code block below.
Package: gtfsr
Type: Package
Title: Working with GTFS (General Transit Feed Specification) feeds in R
Version: 0.1.0
Authors@R: as.person(c(
    "Elaine McVey <elaine.mcvey@transloc.com> [aut, cre]",
    "Danton Noriega-Goodwin <danton@transloc.com> [aut]"
  ))
Description: Provides API wrappers for popular public GTFS feed sharing sites, reads feed data into a gtfs data object, validates data quality, provides convenience functions for common tasks.
License: GPL
LazyData: TRUE
Imports:
    dplyr,
    readr,
    httr,
    magrittr,
    stringr,
    assertthat,
    leaflet,
    sp,
    rgeos,
    scales
Suggests:
    testthat,
    knitr,
    rmarkdown
RoxygenNote: 5.0.1
VignetteBuilder: knitr
  1. URL for the package (the development repository, not a stylized html page)

    https://github.com/ropenscilabs/gtfsr

  2. What data source(s) does it work with (if applicable)?

    It is designed to work with any text files following the GTFS format. It does not come with any accompanying data.

  3. Who is the target audience?

    Transportation Economists, Transportation Consultants and Researchers, Urban Planners, Open Data Enthusiasts

  4. Are there other R packages that accomplish the same thing? If so, what is different about yours?

    None

  5. Check the box next to each policy below, confirming that you agree. These are mandatory.
    • [x] This package does not violate the Terms of Service of any service it interacts with.
    • [x] The repository has continuous integration with Travis CI and/or another service
    • [x] The package contains a vignette
    • [x] The package contains a reasonably complete README with instructions for installing the development version.
    • [x] The package contains unit tests
    • [x] The package only exports functions to the NAMESPACE that are intended for end users
  6. Do you agree to follow the rOpenSci packaging guidelines? These aren't mandatory, but we strongly suggest you follow them. If you disagree with anything, please explain.
    • [NO] Are there any package dependencies not on CRAN?
    • [YES] Do you intend for this package to go on CRAN?
    • [NO] Does the package have a CRAN accepted license?
    • [YES] Did R CMD check (or devtools::check()) produce any errors or warnings? If so paste them below.
R CMD check results
0 errors | 0 warnings | 1 note 
checking R code for possible problems ... NOTE
map_gtfs_agency_routes: warning in get_agency_stops(gtfs_obj, agency_id
  = agency): partial argument match of 'agency_id' to 'agency_ids'
map_gtfs_route_shape: warning in get_agency_stops(gtfs_obj, agency_id =
  agencies): partial argument match of 'agency_id' to 'agency_ids'
filter_feedlist: no visible binding for global variable ‘.’
get_agency_stops: no visible binding for global variable ‘agency_id’
get_agency_stops: no visible binding for global variable ‘route_id’
get_agency_stops: no visible binding for global variable ‘trip_id’
get_agency_stops: no visible binding for global variable ‘stop_id’
... 62 lines ...
  ‘file_provided_status’
validate_vars_provided: no visible binding for global variable
  ‘field_provided_status’
validate_vars_provided: no visible binding for global variable
  ‘validation_status’
Undefined global functions or variables:
  . agency_id color field field_provided_status field_spec
  file_provided_status file_spec n opacity popups prob_subset route_id
  route_short_name service_id shape_id shape_pt_lat shape_pt_lon
  shape_pt_sequence spec stop_id stop_lat stop_lon stop_name trip_id
  validation_details validation_status
  1. Please add explanations below for any exceptions to the above: NA
  2. If this is a resubmission following rejection, please explain the change in circumstances. NA
sckott commented 8 years ago

Passes all editors' checks, is a good fit

Seeking reviewers...

sckott commented 8 years ago

Reviewers: @Robinlovelace Due date: 2016-07-18

sckott commented 8 years ago

@Robinlovelace Due date: 2016-07-18 - hey there, it's been 37 days, please get your review in soon, thanks :smiley_cat: (ropensci-bot)

Robinlovelace commented 8 years ago

Thanks for the reminder and opportunity - here is my review of the package:

Introduction

GTFS is becoming the worldwide standard for sharing information about public transport systems. This makes it hugely important for the transport industry, public sector organisations trying to direct such systems in socially beneficial directions, and for researchers who have a range of motivations including the need to transition away from fossil fuels to save the world.

In any case, GTFS data is largely out of reach for the humble citizen like you and me. Googling "software gtfs data" reveals a cottage industry of GTFS editors and viewers, but little in the way of open source software for accessing this vital new file format. A GIS StackExchange question further shows the lack of cohesion around a single open source product (the World Bank provides a more up-to-date list of tools that can interact with GTFS data).

R is a powerful language that provides an emerging set of transport tools via packages such as stplanr, so rOpenSci is the ideal place for the development of a new package to make GTFS available to the world, in the spirit of citizen science.

Basic functionality

The package is installed and loaded easily with:

devtools::install_github('ropenscilabs/gtfsr')
library(gtfsr)

The most common thing that someone will want to use the package for, I imagine, is read in a GTFS file. This can be done easily and, based on a few tests of feeds from transit.land, reliably:

# this one failed for some reason
u = "https://www.bjcta.org/wp-content/uploads/2016/05/BJCTA_GTFS_0516.zip"
# the example in the README works
u = "http://www.co.fairbanks.ak.us/transportation/MACSDocuments/GTFS.zip"
# as does this giant one from Rio de Janeiro
u = "http://dadosabertos2.rio.rj.gov.br/dadoaberto/google-transit/google_transit.zip"
# as does this one from the USA
u = "http://www.bart.gov/dev/schedules/google_transit.zip"
gtfs_obj = import_gtfs(u)

The next logical thing you would want to do is plot the result. (Suggestion: make simply loading and plotting GTFS data appear earlier and more prominently in the README.)

This can be done easily with the sensible but clunkily-named map_gtfs_agency_routes(). Suggestion: rename this verbosely name function or create a new generic plotting function called simply map_gtfs():

map_gtfs_agency_routes(gtfs_obj)
map_gtfs_agency_routes(gtfs_obj, include_stops = TRUE)

I think in most cases users will want to see the stops. Suggestion: make include_stops = TRUE the default option in this function and the yet-to-be created generic map_gtfs().

Accessing GTFS feeds

A useful feature of the package, that goes above and beyond other GTFS software, is its ability to search for public GTFS feeds, using the transitfeeds.com API key. The guidance on how to get a key in the README is good. However, advice for storing the key could be better: the set_api_key() key works fine for one session but it seems the key must be entered afresh every for every new R session. Suggestion: recommend setting the key in .Renviron, as documented in the httr vignette, e.g. with the line GTFS_API_KEY=XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX. Then it could be automatically retrieved with each new session by adding something like the following lines to get_api_key():

if(grepl("[[:alnum:]]{8}\\-[[:alnum:]]{4}\\-[[:alnum:]]{4}\\-[[:alnum:]]{4}\\-[[:alnum:]]{12}", Sys.getenv("GTFS_API_KEY")))
  gtfs_api_key$set(Sys.getenv("GTFS_API_KEY"))

We use a similar technique in stplanr with cyclestreet_pat, which we should probably generalise to other API keys...

In any case, the most important thing is that users can quickly view valid gtfs feeds:

feedlist_df = filter_feedlist(get_feedlist()) # errors:
# Error in function_list[[k]](value) : could not find function "mutate"

Note that the above code errors, probably because the function mutate is missing a dplyr:: prefix in the definition of get_feedlist. (Suggestion: Fix this issue).

After dplyr has been loaded, the function works:

library(dplyr)
feedlist_df = filter_feedlist(get_feedlist())

As per the documentation, you can then find feeds from a specific country, e.g. Spain:

u_spain = filter(feedlist_df, grepl("Spain", loc_t)) %>% 
  select(url_d)
# get all gtfs feeds from spain
gtfs_spain = import_gtfs(u_spain)

Of the 3 imported GTFS feeds, only 1 can be plotted:

map_gtfs_agency_routes(gtfs_spain[[1]]) # fails - not a gtfs object
map_gtfs_agency_routes(gtfs_spain[[2]]) # works
map_gtfs_agency_routes(gtfs_spain[[3]]) # fails - no route_id

I think that the fact that only 1 of these Spanish GTFS datasets works says more about the variable quality of GTFS data than the package. However, such testing on a wide range of international examples could help further refine the package and make it more accepting of diverse feed formats. Suggestion: test gtfsr on a wide range of datasets, perhaps using continuous integration, although not sure how this would work with its reliance on external datasets - any ideas?

Data consistency issues aside (which are clearly not the fault of gtfsr), the package clearly does a great job of making GTFS data more accessible to the masses.

Compatibility with spatial classes

To increase the utility of GTFS objects, I think it would be fantastic if the package included functions to export them as spatial objects. That would expose them to the power of R's impressive GIS capabilities. Suggestion: explore options for converting gtfs data classes into spatial objects such as SpatialLinesDataFrame.

Note: stplanr has a basic function for this: gtfs2sldf().

Documentation

The package's vignette is currently the same as the README. This is useful for now but in the long-run the vignettes should be self-standing. Suggestion: make the README shorter and self-standing, make the vignette longer with more links to existing software and documentation for understanding and working with GTFS data.

Minor issues

The first thing I noticed when cloning this repo was its massive size (44 MB). This is predominently due to previously deleted files in the .git repo. It won't cause issues when installing it but will be a deterrent to potential developers. Suggestion: removing the culprits, e.g. using the BFG repo cleaner.

Robinlovelace commented 8 years ago

An additional comment, more one for the longer term, is how can this interact with routing? Are there plans in future versions of gtfsr to allow journey planning, e.g. via an interface to something like OpenTripPlanner? This could help the GTFS data be used for analysis of spatio-temporal accessibility.

sckott commented 8 years ago

Thanks for the review @Robinlovelace !

sckott commented 8 years ago

@ultinomics review is in, let us know if you need any help

dantonnoriega commented 8 years ago

Will do! I plan to go through the edits during the first week of September.

On Aug 17, 2016 3:02 PM, "Scott Chamberlain" notifications@github.com wrote:

@ultinomics https://github.com/ultinomics review is in, let us know if you need any help

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/onboarding/issues/55#issuecomment-240513439, or mute the thread https://github.com/notifications/unsubscribe-auth/AFaR2d-SgPKf5R2alC5VJkhoPrh4KwSaks5qg1rBgaJpZM4I-MZq .

sckott commented 8 years ago

okay

dantonnoriega commented 7 years ago

Update: I will be working on this during the last week of September during TransLoc internal "dev days". Thanks everyone and sorry for dragging my feet!

sckott commented 7 years ago

thanks for the update

dantonnoriega commented 7 years ago

I've gone through most of the review (thanks, @Robinlovelace!) but still need to update the README and vignette and learn how to save the API key to .Renviron. I've also gone through most all the issues that I considered bugs. You can see what I have to work through still on the issues page.

Otherwise, the most pressing suggestions have been implemented and are currently on a branch called map-update! Once I update the README and vignette, I will merge and push to master. I'll work on the API storing thing separately since it is not critical to getting the package to work.

Let me know what's next! I really pumped about the new, updated mapping feature but would be great to get folks testing it!

Robinlovelace commented 7 years ago

Great work - not had a chance to test it but looking forward to that.

sckott commented 7 years ago

@ultinomics thanks for the changes. I'll try it out the new map stuff.

Let me know what's next!

In addition to readme and vignette changes ... The API key change thing is important to get done as it's a security issue. Users of this pkg will possibly unintentionally leave their API key in an R script and then put their code up on the web (e.g., on github) for all the world to see. If they store as an env var, that won't happen

dantonnoriega commented 7 years ago

I'll get in it! Good point!

On Oct 6, 2016 3:32 PM, "Scott Chamberlain" notifications@github.com wrote:

@ultinomics https://github.com/ultinomics thanks for the changes. I'll try it out the new map stuff.

Let me know what's next!

In addition to readme and vignette changes ... The API key change thing is important to get done as it's a security issue. Users of this pkg will possibly unintentionally leave their API key in an R script and then put their code up on the web (e.g., on github) for all the world to see. If they store as an env var, that won't happen

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/onboarding/issues/55#issuecomment-252065293, or mute the thread https://github.com/notifications/unsubscribe-auth/AFaR2UH8-kk4El-g3a_wiBjYHyikQuO8ks5qxUzggaJpZM4I-MZq .

sckott commented 7 years ago

@ultinomics i was going to test out but the map-update branch isn't there, was it merged to master?

dantonnoriega commented 7 years ago

@sckott it was! did a few edits the day after but forgot to update ya.

sckott commented 7 years ago

Noticed a few more things:

sckott commented 7 years ago

@ultinomics any update on your work on this package?

dantonnoriega commented 7 years ago

It is on my to do list for after Dec 2nd. Unfortunately, the last few weeks have been consumed by PhD work.

I’m reserving the package as my “fun thing to do over winter break”. I plan on tackling the API key problem next.

On Nov 26, 2016, at 6:28 PM, Scott Chamberlain notifications@github.com wrote:

@ultinomics https://github.com/ultinomics any update on your work on this package?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/onboarding/issues/55#issuecomment-263092244, or mute the thread https://github.com/notifications/unsubscribe-auth/AFaR2Rni4pkpzB9MQ445biSlfEj3IFA6ks5rCMCzgaJpZM4I-MZq.

dantonnoriega commented 7 years ago

@sckott My bad on close/comment.

Alright! I added the API .Renviron functionality and the ?gtfsr manual file and the warnings thing.

Anything else I can do? I know I need a lot more examples still but I hope it's good enough.

sckott commented 7 years ago

no worries

sckott commented 7 years ago

@Robinlovelace any other thoughts at this point?

I'm having a last look now

Robinlovelace commented 7 years ago

No more thoughts from me Scott (although I must confess that this is partly due to lack of time) - it looks like a well-documented and useful package! Any ideas on integration with stplanr - lots of talk there about routing, sf and all sorts at the moment.

Great work @ultinomics.

sckott commented 7 years ago

@ultinomics No more things, although do make sure to add more examples - perhaps open an issue https://github.com/ropenscilabs/gtfsr/issues to remind yourself to do that,

approved

dantonnoriega commented 7 years ago

Hey @sckott, I was wondering what the next steps are. The on boarding process says "Once your package is approved, we will provide further instructions on transferring your repository to the rOpenSci repository" but no sure if I ever received those instructions. If I did, my apologies!

sckott commented 7 years ago

Hi @ultinomics -

dantonnoriega commented 7 years ago

Looping in @eamcvey.

Transfer away. I can make the edits pretty quickly to redirect. Also, happy to do a blog post but likely will do it in March, if thats ok.

Thanks, @sckott!

sckott commented 7 years ago

sure, no rush, i think we have a backlog anyway for guest blog posts

sckott commented 7 years ago

Okay, you're transferred https://github.com/ropensci/gtfsr