ropensci / software-review

rOpenSci Software Peer Review.
292 stars 104 forks source link

bikedata #116

Closed mpadge closed 7 years ago

mpadge commented 7 years ago

Summary

Download and aggregate data from all public hire bicycle systems which provide open data, currently including Santander Cycles in London, U.K., and from the U.S.A., citibike in New York City NY, Divvy in Chicago IL, Capital Bikeshare in Washington DC, Hubway in Boston MA, and Metro in Los Angeles LA.

Package: bikedata
Title: Download and Aggregate Data from Public Hire Bicycle Systems
Version: 0.0.1
Authors@R: c(
    person("Mark", "Padgham", email = "mark.padgham@email.com", role = c("aut", "cre")),
    person("Richard", "Ellison", role = "aut"),
    person(family = "SQLite Consortium", role = "ctb", 
        comment = "Authors of included SQLite code"))
Description: Download and aggregate data from all public hire bicycle systems
    which provide open data, currently including Santander Cycles in London,
    U.K., and from the U.S.A., citibike in New York City NY, Divvy in Chicago
    IL, Capital Bikeshare in Washington DC, Hubway in Boston MA, and Metro in
    Los Angeles LA.
License: GPL-3
Depends:
    R (>= 3.0)
Imports:
    dplyr,
    httr,
    lubridate,
    methods,
    Rcpp,
    RSQLite,
    reshape2,
    tibble,
    xml2
Suggests:
    knitr,
    rmarkdown,
    roxygen2,
    testthat,
    covr
LinkingTo: 
    BH,
    Rcpp
VignetteBuilder: knitr
SystemRequirements: C++11
NeedsCompilation: yes
URL: https://github.com/mpadge/bikedata
BugReports: https://github.com/mpadge/bikedata/issues
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.0.1
Remotes: jimhester/covr

https://github.com/mpadge/bikedata

Data retrieval

All those interested in analysing urban mobility - urban and general transport modellers and planners, urban scientists. Anyone wanting to make cool data visualisations of movement through some of the euro-american world's biggest cities (like this prominent example).

No current R packages enable importing of public hire bicycle data. @ramnathv has his visualising bike share repo, but that's not an R package. I've heard rumours of a package in development for live station feeds along these lines, but even if and when that appears, it would focus on real-time data (primarily bicycle availability), whereas bikedata focusses exclusively on archiving and aggregating historical data.

Requirements

Confirm each of the following by checking the box. This package:

Publication options

Package will be archived as soon as the ropensci review process has been completed, in accordance with JOSS policies.

Detail

@ramnathv because of his prominent prior work on analysing public hire bike systems


Note that goodpractice flags only:

  1. Test coverage, which will be extended; and
  2. Large installed size, for explanation of which see cran-comments.md
maelle commented 7 years ago

@mpadge short question even before the real editor checks, why do you use the development version of covr?

mpadge commented 7 years ago

That's not part of the CRAN submission (obviously), and is used because of Jim Hester's help here - see his extended commit message at top. It's temporary and will be resolved once covr is updated on CRAN.

maelle commented 7 years ago

Nice, thanks!

maelle commented 7 years ago

Thanks for your submission @mpadge! I'm currently looking for reviewers.

Editor checks:


Editor comments


Reviewers: @eamcvey @chucheria Due date: 2017-06-25

mpadge commented 7 years ago

Thanks @maelle, the warnings are indeed my fault - sorry about that, and hopefully fixed already. I've no idea why sqlite3/sqlite3.o would not be recognised because your machine built it in the first place. First let's clarify that the warnings are gone, and hope that that fixes the .o issue...

maelle commented 7 years ago

Thanks @eamcvey @chucheria for accepting to review this package! 😺 As a reminder here are the review template and the reviewing guidelines. Your reviews are due on the 2017-06-25.

@mpadge can you confirm that/when you've changed the cleanup process? (related to what @jeroen said "The cleanup script should usually only clean files generated by configure. Files generated by make should be cleaned via a clean section in Makevars" / or use cleanup.win as mentioned by Jim Hester)

mpadge commented 7 years ago

Thanks @maelle, and thanks in advance @eamcvey and @chucheria for agreeing to review - exciting to hear what you might think of it! I've done the Jim Hester cleanup.win option. Jeroen is nevertheless correct that cleanup should only relate to configure-generated files, and I will definitely transition to a single Makevars soon. The whole build is nevertheless clean from here on for all OS's.

maelle commented 7 years ago

@chucheria @eamcvey friendly reminder that your reviews are due in 10 days, on the 2017-06-25. :wink:

chucheria commented 7 years ago

Package Review

Documentation

The package includes all the following forms of documentation:

Functionality

Estimated hours spent reviewing:

4h


Review Comments

It is a great package overall, it removes the necessity of have knowledge of many bike data APIs/file system of open data, which is a great time and effort saver. It will make more sense as more bike open data is available inside the package.

Vignette has example uses and is very complete.

I played with the functions a bit, it is a simple package and it allows you to use API information locally.

General code comments

I found some errors executing the code.

I downloaded NY data from 2016-01 to 2016-05 once I changed to Los Angeles city, it told me that all data already exist.

dl_bikedata (city = 'New York City USA', dates = 201601:201605)
    Downloading 201605-citibike-tripdata.zip
    Downloading JC-201601-citibike-tripdata.csv.zip
    Downloading JC-201602-citibike-tripdata.csv.zip
    Downloading JC-201603-citibike-tripdata.csv.zip
    Downloading JC-201604-citibike-tripdata.csv.zip
    Downloading JC-201605-citibike-tripdata.csv.zip

dl_bikedata(city = 'la', dates = 201601:201605)
    All data files already exist

Once I downloaded NY data, I wanted to store it with store_bikedata:

ny <- store_bikedata (city = 'nyc', bikedb = 'bikedb', dates = 201601:201605)
    Downloading data for nyc
    All data files already exist
    Adding data to sqlite3 database
    Unzipping raw data files ...
     Show Traceback

     Rerun with Debug
     Error in unzip(f, list = TRUE) :
      zip file '/var/folders/zc/yccgnjm528j4x_665t_rz0km0000gn/T//RtmpU6AJg6/201603-citibike-tripdata.zip' cannot be opened

If I do the store_bikedata from the start everything runs great.

Installation:

macOS Sierra:

Installation in macOS was smooth. A warning popped up in installation and in every test: Warning: Installed Rcpp (0.12.11) different from Rcpp used to build dplyr (0.12.10). Please reinstall dplyr to avoid random crashes or undefined behavior. I reinstalled dplyr and the warning dissapeared.

Documentation:

Scripts

Scripts are named with a hyphen instead of snake_case, something that rOpenSci recommends and packages seem to follow that pattern. Functions and variables are well named and explained.

chucheria commented 7 years ago

Please, tell me if I didn't explained well any of the comments or errors.

maelle commented 7 years ago

Many thanks for your review @chucheria ! :smile_cat:

sckott commented 7 years ago

@eamcvey your review was due a few weeks back, will you be able to get it done soon?

mpadge commented 7 years ago

A general FYI: I had to upload a new version to CRAN because changes in the LA data were causing failures. I also addressed the first issue that @chucheria encountered so the package now gives the more informative message:

> dl_bikedata(city = 'la', dates = 201601:201605)
There are no la files for those dates

I was unable to repeat the second error, but note that tempdir() seems to have a double forward slash - .../T//RtmpU6AJg6/ instead of presumably .../T/RtmpU6AJg6/. If that were the problem, I would have no idea of either ultimate cause or solution?

maelle commented 7 years ago

thanks for the update @mpadge !

eamcvey commented 7 years ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Paper (for packages co-submitting to JOSS)

The package contains a paper.md with:

  • [x] A short summary describing the high-level functionality of the software
  • [x] Authors: A list of authors with their affiliations
  • [x] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [x] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 5


Review Comments

This is a nice package that made it easy to access the data as promised. The linked article showing analysis of this type of data is helpful to make clear what can be done with it. I was able to work through the vignette successfully and look at some additional things on my own.

Some specific issues I noticed:

maelle commented 7 years ago

Thanks a lot for your review @eamcvey ! :smile_cat:

mpadge commented 7 years ago

Thanks both @chucheria and @eamcvey for your reviews! :smiling_imp: I should have addressed all issues raised sometime next week, and will respond accordingly.

mpadge commented 7 years ago

Package changes in response to reviews

  1. As already mentioned, the previously mis-leading message that "All data files already exist" has been replaced with "There are no [city] files for that date."
  2. Repo now includes both CONTRIBUTING.md and CONDUCT.md - thanks to both reviewers for pointing out this absence on my part.
  3. Both reviewers noted dplyr (or dbplyr) issues. These arose in response to the upgrade to dplyr_0.7, which led to interesting situations for many package developers. bikedata has resolved all such issues by dropping the dplyr dependency.
  4. Extensive example now provided for bike_db_totals() - thanks! An example was previously included in bike_demographic_data(), but just one single line that was presunably easy to overlook. The reviewer's comment nevertheless inspired me to provide a much more extensive example, during which I also relealised that there were some formatting issues for demographic data from some cities as stored in the database. These have all been rectified.

Changes not yet implemented

~~1. I was unable to reproduce the problem encountered by @chucheria, but have already noted that I have previously encountered strange behaviour when running store_bikedata() multiple times. I suspect that the main cause of such problems is index creation, which may only be performed once for a database, and any attempt to create indices that already exist will throw an error. Current bikedata issue#38 should largely resolve these issues.~~ - now done!

Responses to other comments not related to package changes

  1. There was always NEWS.md file which I presume the first reviewer simply overlooked.
  2. The package uses GPL-3 which is a copyright license and therefore requires no LICENSE file. (LICENSE files are also rejected by CRAN for GPL-3 packages: they only apply to non-copyright licenses such as MIT).
  3. Yeah, i know file names are hyphenated rather than underscored, but this is done because there are systems which always interpret underscores as non-alpha-numeric characters and therefore produce inconsistent highlighting of function names; underscores avoid this problem. (That comment only applies to those who use vim, emacs, or other non-RStudio interfaces.)
  4. Yes, I do use function (x) rather than function(x), but now explain this in CONTRIBUTING.md: "Words of text are separated by whitespace, and so code words should be too." Let's just say I'm pre-empting physchological analyses of R-coding style akin to what has been done for R colour palettes. Spaces will one day reign ...
  5. Regarding authorship, the ropensci authorship guide merely describes the requirements for DESCRIPTION, to which this package accords, so I think current status is okay in that regard. I'll happily modify if required or desired.
  6. Finally, note that vignette() will not automatically load the vignette when the package is psuedo-loaded with devtools::load_all(). I nevertheless checked with a "proper" locall load (R CMD install <tarball>) and was then able to open vignette() as expected.

Thanks again @eamcvey and @chucheria for helpful insight into improving the package!

maelle commented 7 years ago

Awesome work @mpadge! When do you think you'll be able to solve the issue mentioned in "Changes not yet implemented"?

I agree reg. the authors list.

@eamcvey and @chucheria are you happy with the changes?

mpadge commented 7 years ago

Thanks @maelle! That change is issue#38, which I will definitely resolve within the coming week. I'll let you know as soon as it's done. :bike: :biking_man: :biking_woman: :bike:

maelle commented 7 years ago

@mpadge For info you can already add the rOpenSci badge to the README of bikedata, for now it'll show "under review2" and after approval "peer-reviewed".

[![](http://badges.ropensci.org/116_status.svg)](https://github.com/ropensci/onboarding/issues/116)

mpadge commented 7 years ago

@maelle done!

chucheria commented 7 years ago

Thanks for your comments @mpadge. I'm sorry I overlooked the NEWS file! So embarrasing... I didn't know you don't need a license, anyway if CRAN doesn't admit it better this way, I guess I'm used to do the LICENSE and gitignore first thing.

About the error you commented you couldn't reproduce, I'll try to do the same thing as soon as I have some time in case it appears again an open an issue with more info in case it appears, does this sound good?

Thank you both @mpadge and @maelle! This has been fun!

mpadge commented 7 years ago

@maelle issue#38 now done. This should resolve previous errors. Note that these are system dependent because of the different ways the bundled SQLite3 code is compiled, so it is unfortunately not possible to systematically reproduce the errors. Nevertheless, I suspect most of them arose through somehow trying to create indexes on a database that already had indexes in it. The latest modifications will robustly prevent this ever happening, and so I am pretty confident will resolve these kinds of errors. I have taken the liberty of striking through the relevant bit of my previously responses ("Changes not yet implemented")

maelle commented 7 years ago

@mpadge ok, thanks a lot! The package looks good to me but I'll wait for @chucheria to have had time to look at the error again before approving. :-)

mpadge commented 7 years ago

absolutely - it's really important to ensure it really has been solved!

chucheria commented 7 years ago

No error, the package seems to be solid. Thanks for the patience!

maelle commented 7 years ago

@mpadge sorry for the delay! The package is now approved, great work, and thanks @eamcvey + @chucheria for your reviews!

To-dos:

Welcome aboard! We'd also love a blog post about your package, either a short-form intro to it (https://ropensci.org/tech-notes/) or long-form post with more narrative about its development. ((https://ropensci.org/blog/). Let me know if you are interested.

mpadge commented 7 years ago

Thanks @maelle for all of the help and guidance getting the package through to this exciting last stage! You list of TODOs:

Finally, re: blog post: absolutely interested! Yes please!

maelle commented 7 years ago

You mean you didn't have to submit it? :thinking: Having @karthik as an editor of the R package surely helped. :wink:

In my experience you need to submit it, choose Karthik as JOSS editor.

Reg. the blog post let me tag @stefaniebutland who'll get in touch with you.

mpadge commented 7 years ago

okay, i've submitted it now, and put Karthik as the editor:

ping @richardellison, so you know where we're at here. new publication pending!

maelle commented 7 years ago

I think we can close this now (I told Karthik about the JOSS submission, I check there regularly, and the blog post discussion doesn't need this to be open). Your review badge will get green! 😁

mpadge commented 7 years ago

YIPPEE!!! 🚲 🚲 🚲 I didn't realise until I just discovered the ropensci issue on this that i should have but didn't actually request the explicit approval of @chucheria and @eamcvey for me to insert those rev roles in the DESCRIPTION. May I kindly request retrospective approval now? Please!

maelle commented 7 years ago

Oops and I forgot to remind it to you.

chucheria commented 7 years ago

Thanks for adding me! You have my very happy approval @mpadge !

stefaniebutland commented 7 years ago

@mpadge (I'm responding later than I had hoped) Great to hear you're interested in contributing a blog post about this package. I hope it gets more eyes on it and generates some enthusiasm and input.

For rOpenSci blog post you could do either a short post for the Developer Blog that has a technical focus, or do a post for the main blog whose audience is broader.

Main blog post would include more narrative about your motivation for creating the package, unmet need, how-to use, good to end with a thank you to package reviewers with links to their GitHub or Twitter, point readers to issues and what you think is next to improve the package and invite people to open or address an issue etc.

Deadlines:

Practical instructions:

Which type of post are you thinking of?

I had been thinking of inviting you to contribute a post about osmdata, so if you're interested you could do both as developer blog style, or one dev and one main.

Did I miss anything?

mpadge commented 7 years ago

thanks @stefaniebutland, and no worries about delayed response. I'm absolutely keen to get these packages known in any and all ways possible. Absolutely happy for at least one to be a main blog entry, for which my preference would be bikedata, with osmdata then shucked off to a dev blog post. If that's okay with you, i'll happily get cracking on a PR for main blog post on bikedata