Closed dantonnoriega closed 7 years ago
Passes all editors' checks, is a good fit
Seeking reviewers...
Reviewers: @Robinlovelace Due date: 2016-07-18
@Robinlovelace Due date: 2016-07-18 - hey there, it's been 37 days, please get your review in soon, thanks :smiley_cat: (ropensci-bot)
Thanks for the reminder and opportunity - here is my review of the package:
GTFS is becoming the worldwide standard for sharing information about public transport systems. This makes it hugely important for the transport industry, public sector organisations trying to direct such systems in socially beneficial directions, and for researchers who have a range of motivations including the need to transition away from fossil fuels to save the world.
In any case, GTFS data is largely out of reach for the humble citizen like you and me. Googling "software gtfs data" reveals a cottage industry of GTFS editors and viewers, but little in the way of open source software for accessing this vital new file format. A GIS StackExchange question further shows the lack of cohesion around a single open source product (the World Bank provides a more up-to-date list of tools that can interact with GTFS data).
R is a powerful language that provides an emerging set of transport tools via packages such as stplanr, so rOpenSci is the ideal place for the development of a new package to make GTFS available to the world, in the spirit of citizen science.
The package is installed and loaded easily with:
devtools::install_github('ropenscilabs/gtfsr')
library(gtfsr)
The most common thing that someone will want to use the package for, I imagine, is read in a GTFS file. This can be done easily and, based on a few tests of feeds from transit.land, reliably:
# this one failed for some reason
u = "https://www.bjcta.org/wp-content/uploads/2016/05/BJCTA_GTFS_0516.zip"
# the example in the README works
u = "http://www.co.fairbanks.ak.us/transportation/MACSDocuments/GTFS.zip"
# as does this giant one from Rio de Janeiro
u = "http://dadosabertos2.rio.rj.gov.br/dadoaberto/google-transit/google_transit.zip"
# as does this one from the USA
u = "http://www.bart.gov/dev/schedules/google_transit.zip"
gtfs_obj = import_gtfs(u)
The next logical thing you would want to do is plot the result. (Suggestion: make simply loading and plotting GTFS data appear earlier and more prominently in the README.)
This can be done easily with the sensible but clunkily-named map_gtfs_agency_routes()
. Suggestion: rename this verbosely name function or create a new generic plotting function called simply map_gtfs()
:
map_gtfs_agency_routes(gtfs_obj)
map_gtfs_agency_routes(gtfs_obj, include_stops = TRUE)
I think in most cases users will want to see the stops. Suggestion: make include_stops = TRUE
the default option in this function and the yet-to-be created generic map_gtfs()
.
A useful feature of the package, that goes above and beyond other GTFS software, is its ability to search for public GTFS feeds, using the transitfeeds.com API key. The guidance on how to get a key in the README is good. However, advice for storing the key could be better: the set_api_key()
key works fine for one session but it seems the key must be entered afresh every for every new R session. Suggestion: recommend setting the key in .Renviron
, as documented in the httr vignette, e.g. with the line GTFS_API_KEY=XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
. Then it could be automatically retrieved with each new session by adding something like the following lines to get_api_key()
:
if(grepl("[[:alnum:]]{8}\\-[[:alnum:]]{4}\\-[[:alnum:]]{4}\\-[[:alnum:]]{4}\\-[[:alnum:]]{12}", Sys.getenv("GTFS_API_KEY")))
gtfs_api_key$set(Sys.getenv("GTFS_API_KEY"))
We use a similar technique in stplanr with cyclestreet_pat
, which we should probably generalise to other API keys...
In any case, the most important thing is that users can quickly view valid gtfs feeds:
feedlist_df = filter_feedlist(get_feedlist()) # errors:
# Error in function_list[[k]](value) : could not find function "mutate"
Note that the above code errors, probably because the function mutate
is missing a dplyr::
prefix in the definition of get_feedlist. (Suggestion: Fix this issue).
After dplyr has been loaded, the function works:
library(dplyr)
feedlist_df = filter_feedlist(get_feedlist())
As per the documentation, you can then find feeds from a specific country, e.g. Spain:
u_spain = filter(feedlist_df, grepl("Spain", loc_t)) %>%
select(url_d)
# get all gtfs feeds from spain
gtfs_spain = import_gtfs(u_spain)
Of the 3 imported GTFS feeds, only 1 can be plotted:
map_gtfs_agency_routes(gtfs_spain[[1]]) # fails - not a gtfs object
map_gtfs_agency_routes(gtfs_spain[[2]]) # works
map_gtfs_agency_routes(gtfs_spain[[3]]) # fails - no route_id
I think that the fact that only 1 of these Spanish GTFS datasets works says more about the variable quality of GTFS data than the package. However, such testing on a wide range of international examples could help further refine the package and make it more accepting of diverse feed formats. Suggestion: test gtfsr on a wide range of datasets, perhaps using continuous integration, although not sure how this would work with its reliance on external datasets - any ideas?
Data consistency issues aside (which are clearly not the fault of gtfsr), the package clearly does a great job of making GTFS data more accessible to the masses.
To increase the utility of GTFS objects, I think it would be fantastic if the package included functions to export them as spatial objects. That would expose them to the power of R's impressive GIS capabilities. Suggestion: explore options for converting gtfs
data classes into spatial objects such as SpatialLinesDataFrame
.
Note: stplanr has a basic function for this: gtfs2sldf().
The package's vignette is currently the same as the README. This is useful for now but in the long-run the vignettes should be self-standing. Suggestion: make the README shorter and self-standing, make the vignette longer with more links to existing software and documentation for understanding and working with GTFS data.
The first thing I noticed when cloning this repo was its massive size (44 MB). This is predominently due to previously deleted files in the .git repo. It won't cause issues when installing it but will be a deterrent to potential developers. Suggestion: removing the culprits, e.g. using the BFG repo cleaner.
An additional comment, more one for the longer term, is how can this interact with routing? Are there plans in future versions of gtfsr to allow journey planning, e.g. via an interface to something like OpenTripPlanner? This could help the GTFS data be used for analysis of spatio-temporal accessibility.
Thanks for the review @Robinlovelace !
@ultinomics review is in, let us know if you need any help
Will do! I plan to go through the edits during the first week of September.
On Aug 17, 2016 3:02 PM, "Scott Chamberlain" notifications@github.com wrote:
@ultinomics https://github.com/ultinomics review is in, let us know if you need any help
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/onboarding/issues/55#issuecomment-240513439, or mute the thread https://github.com/notifications/unsubscribe-auth/AFaR2d-SgPKf5R2alC5VJkhoPrh4KwSaks5qg1rBgaJpZM4I-MZq .
okay
Update: I will be working on this during the last week of September during TransLoc internal "dev days". Thanks everyone and sorry for dragging my feet!
thanks for the update
I've gone through most of the review (thanks, @Robinlovelace!) but still need to update the README and vignette and learn how to save the API key to .Renviron. I've also gone through most all the issues that I considered bugs. You can see what I have to work through still on the issues page.
Otherwise, the most pressing suggestions have been implemented and are currently on a branch called map-update
! Once I update the README and vignette, I will merge and push to master
. I'll work on the API storing thing separately since it is not critical to getting the package to work.
Let me know what's next! I really pumped about the new, updated mapping feature but would be great to get folks testing it!
Great work - not had a chance to test it but looking forward to that.
@ultinomics thanks for the changes. I'll try it out the new map stuff.
Let me know what's next!
In addition to readme and vignette changes ... The API key change thing is important to get done as it's a security issue. Users of this pkg will possibly unintentionally leave their API key in an R script and then put their code up on the web (e.g., on github) for all the world to see. If they store as an env var, that won't happen
I'll get in it! Good point!
On Oct 6, 2016 3:32 PM, "Scott Chamberlain" notifications@github.com wrote:
@ultinomics https://github.com/ultinomics thanks for the changes. I'll try it out the new map stuff.
Let me know what's next!
In addition to readme and vignette changes ... The API key change thing is important to get done as it's a security issue. Users of this pkg will possibly unintentionally leave their API key in an R script and then put their code up on the web (e.g., on github) for all the world to see. If they store as an env var, that won't happen
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/onboarding/issues/55#issuecomment-252065293, or mute the thread https://github.com/notifications/unsubscribe-auth/AFaR2UH8-kk4El-g3a_wiBjYHyikQuO8ks5qxUzggaJpZM4I-MZq .
@ultinomics i was going to test out but the map-update
branch isn't there, was it merged to master?
@sckott it was! did a few edits the day after but forgot to update ya.
Noticed a few more things:
?gtfsr
and ?gtfsr-package
they get a high level manual file that explains the package, etc.mutate()
is called in get_feedlist()
but mutate
is not imported or namespaced, looks like must dplyr fxns you namespaced, and may just have missed a fewno visible binding for global variable
, which seem to relate to lazy eval usage with dplyr/magrittr. You can do the globalVariables
trick like https://github.com/ropensci/rnoaa/blob/master/R/globals.R or perhaps use dplyr
's underscore fxn versions and quote variable names and such. Thoughts on what we recommend as best practice here @noamross @ultinomics any update on your work on this package?
It is on my to do list for after Dec 2nd. Unfortunately, the last few weeks have been consumed by PhD work.
I’m reserving the package as my “fun thing to do over winter break”. I plan on tackling the API key problem next.
On Nov 26, 2016, at 6:28 PM, Scott Chamberlain notifications@github.com wrote:
@ultinomics https://github.com/ultinomics any update on your work on this package?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/onboarding/issues/55#issuecomment-263092244, or mute the thread https://github.com/notifications/unsubscribe-auth/AFaR2Rni4pkpzB9MQ445biSlfEj3IFA6ks5rCMCzgaJpZM4I-MZq.
@sckott My bad on close/comment.
Alright! I added the API .Renviron
functionality and the ?gtfsr
manual file and the warnings thing.
Anything else I can do? I know I need a lot more examples still but I hope it's good enough.
no worries
@Robinlovelace any other thoughts at this point?
I'm having a last look now
No more thoughts from me Scott (although I must confess that this is partly due to lack of time) - it looks like a well-documented and useful package! Any ideas on integration with stplanr - lots of talk there about routing, sf and all sorts at the moment.
Great work @ultinomics.
@ultinomics No more things, although do make sure to add more examples - perhaps open an issue https://github.com/ropenscilabs/gtfsr/issues to remind yourself to do that,
approved
Hey @sckott, I was wondering what the next steps are. The on boarding process says "Once your package is approved, we will provide further instructions on transferring your repository to the rOpenSci repository" but no sure if I ever received those instructions. If I did, my apologies!
Hi @ultinomics -
ropenscilabs/gtfsr
, they'll get redirected automatically to ropensci/gtfsr
) - Okay for me to transfer to ropensci now?Looping in @eamcvey.
Transfer away. I can make the edits pretty quickly to redirect. Also, happy to do a blog post but likely will do it in March, if thats ok.
Thanks, @sckott!
sure, no rush, i think we have a backlog anyway for guest blog posts
Okay, you're transferred https://github.com/ropensci/gtfsr
What does this package do? (explain in 50 words or less)
Currently, the package facilitates the import of GTFS data (from url or local path) to create
gtfs
objects, validates file structure ofgtfs
objects, and provides functions to easily plot stops, routes, or networks.Future versions will build validation and modeling capabilities, and options to layer other data.
URL for the package (the development repository, not a stylized html page)
https://github.com/ropenscilabs/gtfsr
What data source(s) does it work with (if applicable)?
It is designed to work with any text files following the GTFS format. It does not come with any accompanying data.
Who is the target audience?
Transportation Economists, Transportation Consultants and Researchers, Urban Planners, Open Data Enthusiasts
Are there other R packages that accomplish the same thing? If so, what is different about yours?
None
R CMD check
(ordevtools::check()
) produce any errors or warnings? If so paste them below.