ropensci / bikedata

:bike: Extract data from public hire bicycle systems
https://docs.ropensci.org/bikedata

London dates not parsed correctly in bike_daily_trips() #85

Closed ghost closed 6 years ago

ghost commented 6 years ago

It seems that dates in the London files (CSV or Excel sheets) are formatted Day / Month / Year, rather than Month / Day / Year as seen in other cities. bike_daily_trips() appears to expect the latter, which results in erroneous date parsing as well as a huge number of NAs. I haven't checked other functions, but I expect the issue to be present throughout.
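To illustrate the kind of failure described above, here is a minimal base R sketch. The sample values are made up, and this is not how bikedata itself parses dates; it only shows what happens when a day-first string is read with a month-first format.

```r
# Hypothetical sample values in a day-first layout (DD/MM/YYYY HH:MM)
london_dates <- c("31/12/2016 23:55", "05/01/2017 08:10")

# Parsed with the matching day-first format
day_first <- as.Date(london_dates, format = "%d/%m/%Y %H:%M")

# Parsed with a month-first format, as if the data were MM/DD/YYYY
month_first <- as.Date(london_dates, format = "%m/%d/%Y %H:%M")

print(day_first)    # "2016-12-31" "2017-01-05"
print(month_first)  # NA           "2017-05-01"
```

Any day greater than 12 cannot be a month, so it becomes NA, while days of 12 or less are silently read as the wrong date.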

In case it helps, after downloading the files I needed, I ran the following to read in the CSV files, change their date format, and save them back to disk:

library(tidyverse)

# Directory holding the downloaded London CSV files
dir_lo <- file.path(getwd(), "bikedata/London")
files_lo <- list.files(path = dir_lo, pattern = "\\.csv$")

# Read each file and rewrite its DD/MM/YYYY HH:MM dates as MM/DD/YYYY HH:MM:SS
all_lo_data <- files_lo %>%
    map(function(x) {
        read_csv(file.path(dir_lo, x)) %>%
            mutate_at(vars(`Start Date`, `End Date`),
                      ~ format(lubridate::dmy_hm(.), "%m/%d/%Y %H:%M:%S"))
    })

# Overwrite the original files with the reformatted data
walk2(all_lo_data, files_lo, ~ write_csv(.x, file.path(dir_lo, .y)))

mpadge commented 6 years ago

Thanks for that - the whole package recently got restructured to do more intelligent auto-parsing of dates, but I'm now reliant on people directly finding these kinds of inconsistencies. It should be relatively straightforward to fix, so I'll get on to it asap.

mpadge commented 6 years ago

Actually seems okay. Can you make sure you have the latest version:

packageVersion("bikedata")
# [1] ‘0.2.0.100’

All dates for all systems should be DD/MM/YY or DD/MM/YYYY, or else other idiosyncratic forms, but there are no systems which use MM/DD/YY(YY). Feel free to re-open if the error recurs, but if so, could you please indicate which date ranges cause the errors (because London has thousands of files by now).

mpadge commented 6 years ago

oh sorry, just noticed you got the error from bike_daily_trips(), which I can indeed reproduce. And the pattern changes (in SQL date form) from 2016-31-12 (YYYY-DD-MM) to 2017-01-01 (YYYY-MM-DD).
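One way to spot such a pattern change is to test which stored strings fail to parse as YYYY-MM-DD. The following is a minimal base R sketch on made-up values; the vector `stored` is hypothetical and does not reflect how bikedata stores or queries its dates.

```r
# Hypothetical date strings mixing YYYY-DD-MM and YYYY-MM-DD
stored <- c("2016-30-12", "2016-31-12", "2017-01-01", "2017-01-02")

# Strings that are not valid YYYY-MM-DD dates come back as NA
parsed <- as.Date(stored, format = "%Y-%m-%d")
bad <- stored[is.na(parsed)]

print(bad)  # "2016-30-12" "2016-31-12"
```

This check only catches days greater than 12. A swapped value such as 2016-05-12 still parses as a valid date, so a clean result does not prove the dates are right.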

mpadge commented 6 years ago

Sorry, misunderstood the problem myself there for a while. You were right: the dates for London weren't being processed properly at all. That PR should now fix it, and bike_daily_trips() should be free of NAs. Thanks for digging this up and helping fix this important :bug: !