ropensci / bikedata

:bike: Extract data from public hire bicycle systems
https://docs.ropensci.org/bikedata

Cannot reproduce NY example #77

Closed Robinlovelace closed 6 years ago

Robinlovelace commented 6 years ago

Not sure if I'm missing something, but I'm trying to reproduce this example provided by @richardellison and it's failing on my computer, as follows:

library(bikedata)
#> Data for London, U.K. powered by TfL Open Data:
#>   Contains OS data Ⓒ Crown copyright and database rights 2016
#> Data for New York City provided and owned by:
#>   NYC Bike Share, LLC and Jersey City Bike Share, LLC ("Bikeshare")
#>   see https://www.citibikenyc.com/data-sharing-policy
#> Data for Washington DC (Captialbikeshare), Chiago (Divvybikes) and Boston (Hubway)
#>   provided and owned by Motivate International Inc.
#>   see https://www.capitalbikeshare.com/data-license-agreement
#>   and https://www.divvybikes.com/data-license-agreement
#>   and https://www.thehubway.com/data-license-agreement
#> Nice Ride Minnesota license  https://www.niceridemn.org/data_license
dl_bikedata(city = "ny", data_dir = "bikedata-ny", dates = "201610")
#> Downloading 201610-citibike-tripdata.zip
#> Error in curl::curl_fetch_disk(url, x$path, handle = handle): Failed to open file bikedata-ny/201610-citibike-tripdata.zip.
store_bikedata(city = "ny", bikedb = "bikedb", quiet = FALSE, dates = "201610")
#> Checking data for ny
#> Creating sqlite3 database
#> Unzipping raw data files ...
#> All data already in database; no new data added
#> [1] 0
stns <- bike_stations("bikedb", city = "ny")
ntrips <- bike_tripmat("bikedb", city = "ny", long = TRUE)
#> Error in check_city_arg(bikedb, city): city ny not represented in database

Session info

``` r
devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value
#>  version  R version 3.4.4 (2018-03-15)
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en_GB:en
#>  collate  en_GB.UTF-8
#>  tz       Europe/London
#>  date     2018-05-26
#> Packages -----------------------------------------------------------------
#>  package      * version    date       source
#>  backports      1.1.2      2017-12-13 CRAN (R 3.4.3)
#>  base         * 3.4.4      2018-03-16 local
#>  bikedata     * 0.2.0      2018-04-27 CRAN (R 3.4.4)
#>  bit            1.1-12     2014-04-09 CRAN (R 3.4.1)
#>  bit64          0.9-7      2017-05-08 CRAN (R 3.4.1)
#>  blob           1.1.1      2018-03-25 CRAN (R 3.4.4)
#>  cellranger     1.1.0      2016-07-27 cran (@1.1.0)
#>  compiler       3.4.4      2018-03-16 local
#>  curl           3.2        2018-03-28 CRAN (R 3.4.4)
#>  datasets     * 3.4.4      2018-03-16 local
#>  DBI            1.0.0      2018-05-02 cran (@1.0.0)
#>  devtools       1.13.5     2018-02-18 CRAN (R 3.4.4)
#>  digest         0.6.15     2018-01-28 CRAN (R 3.4.3)
#>  dodgr          0.1.0.099  2018-05-24 Github (ATFutures/dodgr@cee4016)
#>  evaluate       0.10.1     2017-06-24 CRAN (R 3.4.1)
#>  formatR        1.5        2017-04-25 CRAN (R 3.4.1)
#>  graphics     * 3.4.4      2018-03-16 local
#>  grDevices    * 3.4.4      2018-03-16 local
#>  grid           3.4.4      2018-03-16 local
#>  htmltools      0.3.6      2017-04-28 cran (@0.3.6)
#>  httr           1.3.1      2017-08-20 cran (@1.3.1)
#>  igraph         1.2.1      2018-03-10 cran (@1.2.1)
#>  jsonlite       1.5        2017-06-01 cran (@1.5)
#>  knitr          1.20       2018-02-20 cran (@1.20)
#>  lattice        0.20-35    2017-03-25 CRAN (R 3.3.3)
#>  lubridate      1.7.4      2018-04-11 cran (@1.7.4)
#>  magrittr       1.5        2014-11-22 CRAN (R 3.3.2)
#>  memoise        1.1.0      2017-04-21 CRAN (R 3.4.1)
#>  methods      * 3.4.4      2018-03-16 local
#>  osmdata        0.0.7.001  2018-05-24 Github (ropensci/osmdata@0545bcc)
#>  pillar         1.2.2      2018-04-26 CRAN (R 3.4.4)
#>  pkgconfig      2.0.1      2017-03-21 cran (@2.0.1)
#>  plyr           1.8.4      2016-06-08 CRAN (R 3.3.2)
#>  R6             2.2.2      2017-06-17 cran (@2.2.2)
#>  rbenchmark     1.0.0      2012-08-30 CRAN (R 3.4.3)
#>  Rcpp           0.12.17    2018-05-18 cran (@0.12.17)
#>  RcppParallel   4.4.0      2018-03-02 CRAN (R 3.4.4)
#>  readxl         1.1.0      2018-04-20 CRAN (R 3.4.4)
#>  reshape2       1.4.3      2017-12-11 CRAN (R 3.4.3)
#>  rlang          0.2.0.9001 2018-05-26 Github (r-lib/rlang@4d06438)
#>  rmarkdown      1.9        2018-03-01 CRAN (R 3.4.4)
#>  rprojroot      1.3-2      2018-01-03 CRAN (R 3.4.3)
#>  RSQLite        2.1.1      2018-05-06 CRAN (R 3.4.4)
#>  rvest          0.3.2.9000 2018-04-12 Github (hadley/rvest@9a51a5d)
#>  sp             1.2-7      2018-01-19 cran (@1.2-7)
#>  stats        * 3.4.4      2018-03-16 local
#>  stringi        1.2.2      2018-05-02 CRAN (R 3.4.4)
#>  stringr        1.3.1      2018-05-10 CRAN (R 3.4.4)
#>  tibble         1.4.2      2018-01-22 cran (@1.4.2)
#>  tools          3.4.4      2018-03-16 local
#>  utils        * 3.4.4      2018-03-16 local
#>  withr          2.1.2      2018-05-26 Github (jimhester/withr@70d6321)
#>  xml2           1.2.0      2018-01-24 CRAN (R 3.4.3)
#>  yaml           2.1.19     2018-05-01 CRAN (R 3.4.4)
```

mpadge commented 6 years ago

Ah, S3 glitches. Servers sometimes list files that are actually NULL and so not downloadable. There's a catch for these, but it's not implemented across the board. I'll let you know asap when I've fixed it.

mpadge commented 6 years ago

The download glitch occurs because the directory must first be created.

dir.create ("bikedata-ny") # will create at "."
dl_bikedata(city = "ny", data_dir = "bikedata-ny", dates = "201610") # then works

I'll modify dl_bikedata to check for directory existence and offer a "should this directory be created?" option, which should help clarify the totally confusing error messages you got.
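
A check of this kind could look roughly like the sketch below. It is only an illustration in base R: `ensure_data_dir` and its `create` argument are hypothetical names, not part of bikedata, and the sketch takes a flag rather than prompting so that it also runs non-interactively.

```r
# Hypothetical helper, not part of bikedata: make sure a download
# directory exists before any file is written into it.
ensure_data_dir <- function (data_dir, create = TRUE) {
    if (dir.exists (data_dir))
        return (invisible (data_dir))
    if (!create)
        stop ("directory ", data_dir, " does not exist")
    # recursive = TRUE also creates any missing parent directories
    dir.create (data_dir, recursive = TRUE)
    invisible (data_dir)
}

# Usage with a throwaway directory under tempdir ()
d <- file.path (tempdir (), "bikedata-ny-sketch")
ensure_data_dir (d)
dir.exists (d)
```

Calling it with `create = FALSE` on a missing directory stops with an explicit message instead of letting the download fail later with "Failed to open file".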

Next glitch simply occurs because you've not told store_bikedata where the files are. This code should work:

store_bikedata(bikedb = "bikedb", data_dir = "bikedata-ny")

Note that dates is redundant there, because reading defaults to all files in the directory, and that city is also redundant, because each city has its own file nomenclature and so the city is determined automatically. Note also that bikedb specifies the name of a file and so, without any additional directory information, is placed in tempdir(). The data_dir is, in contrast, a directory specification: it defaults to tempdir() only if not specified at all, and a bare name such as "bikedata-ny" is taken relative to the working directory, that is, as "./bikedata-ny".
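
To make the two defaults concrete, here is a small base R sketch of the paths involved. It only illustrates the rule as described above; the variable names are mine, and the exact file name that bikedata writes for the database may differ.

```r
# A bare database name, with no directory part, lands in tempdir ()
bikedb <- "bikedb"
db_path <- file.path (tempdir (), bikedb)

# A bare directory name is resolved by R relative to getwd (),
# so "bikedata-ny" means "./bikedata-ny"
data_dir <- "bikedata-ny"
dir_path <- file.path (getwd (), data_dir)

print (db_path)
print (dir_path)
```

Passing a full path for either argument avoids any ambiguity about where the files end up.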

The other options can be confirmed with this:

bike_rm_db("bikedb")
store_bikedata(bikedb = "bikedb", data_dir = "bikedata-ny", city = "ny", dates = 201610, quiet = FALSE)

which will generate the same results. Both cases should give

Total trips read = 1,603,483
[1] 1603483

And then,

> stns <- bike_stations("bikedb", city = "ny")
> ntrips <- bike_tripmat("bikedb", city = "ny", long = TRUE)
> print(stns)
# A tibble: 656 x 6
      id city  stn_id name                        longitude latitude
   <int> <chr> <chr>  <chr>                           <dbl>    <dbl>
 1     1 ny    ny116  W 17 St & 8 Ave                  40.7    -74.0
 2     2 ny    ny119  Park Ave & St Edwards St         40.7    -74.0
 3     3 ny    ny120  Lexington Ave & Classon Ave      40.7    -74.0
 4     4 ny    ny127  Barrow St & Hudson St            40.7    -74.0
 5     5 ny    ny128  MacDougal St & Prince St         40.7    -74.0
 6     6 ny    ny137  E 56 St & Madison Ave            40.8    -74.0
 7     7 ny    ny143  Clinton St & Joralemon St        40.7    -74.0
 8     8 ny    ny144  Nassau St & Navy St              40.7    -74.0
 9     9 ny    ny146  Hudson St & Reade St             40.7    -74.0
10    10 ny    ny147  Greenwich St & Warren St         40.7    -74.0
# ... with 646 more rows
> print(ntrips)
# A tibble: 430,336 x 3
   start_station_id end_station_id numtrips
   <chr>            <chr>             <dbl>
 1 ny116            ny116                47
 2 ny116            ny119                 0
 3 ny116            ny120                 0
 4 ny116            ny127                32
 5 ny116            ny128                26
 6 ny116            ny137                 3
 7 ny116            ny143                 2
 8 ny116            ny144                 0
 9 ny116            ny146                 6
10 ny116            ny147                15
# ... with 430,326 more rows
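
As a quick sanity check on those dimensions: the long-form trip matrix has one row per ordered pair of stations, including pairs with zero trips, so 656 stations give 656^2 rows. The sketch below reproduces that count in base R and builds the same layout for three of the station ids shown above; it does not touch the database.

```r
# One row per (start, end) pair, so the row count is n_stations squared
n_stations <- 656
n_pairs <- n_stations^2
print (n_pairs) # 430336

# Same layout for three stations; the first argument of expand.grid
# varies fastest, so rows are grouped by start station
ids <- c ("ny116", "ny119", "ny120")
pairs <- expand.grid (end_station_id = ids,
                      start_station_id = ids,
                      stringsAsFactors = FALSE)
pairs <- pairs [, c ("start_station_id", "end_station_id")]
print (nrow (pairs)) # 9
```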

I'll implement the warning about non-existent directories, add the ability to create them automatically, then close.

mpadge commented 6 years ago

> dl_files <- dl_bikedata (city = "ny", data_dir = "bikedata-ny", dates = "201610")
directory bikedata-ny does not exist
Should it be created (y/n)?

"y" proceeds properly, and "n" just stops (with this commit).

Robinlovelace commented 6 years ago

Awesome. Great fix, will test asap, likely Weds when back in Leeds. Currently in Brighton. Thanks loads!