Closed (tbdv closed this issue 6 years ago)
devil is in the "user_type" column starting in April, perhaps?
Thanks for that, and yeah, you're absolutely right. The problem arises because the following column ("member_birth_year") is quoted when it's empty (so just ""), but unquoted when not (so 1974, not "1974"). Fix on its way ...
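For anyone wanting to see the quirk in isolation, here is a minimal Python sketch. It is not bikedata code, just an illustration using the standard csv module. Only the column names "user_type" and "member_birth_year" come from this thread; the row values and the helper name read_birth_years are made up.

```python
import csv
import io

# Hypothetical rows: the birth year is unquoted when present (1974)
# and quoted when empty (""), as described above.
sample = (
    '"user_type","member_birth_year"\n'
    '"Subscriber",1974\n'
    '"Customer",""\n'
)


def read_birth_years(text):
    """Return member_birth_year as int, or None when the field is empty."""
    reader = csv.DictReader(io.StringIO(text))
    years = []
    for row in reader:
        raw = row["member_birth_year"]
        # The csv module strips quotes field by field, so both
        # 1974 and "" arrive here as plain strings.
        years.append(int(raw) if raw else None)
    return years


birth_years = read_birth_years(sample)
```

A parser that decides quoting field by field handles both rows the same way. One that fixes the quoting layout once per file would not.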
That commit simply forces the quotation structure of each file to be re-defined for every single line. This results in less efficient reading for SF. My timings show an increase in one sample from 3.8 to 4.5 seconds, so a bit under 20% increase in reading time. But we're still only talking a handful of seconds for all of SF, so that shouldn't be considered relevant.
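To make the trade-off concrete, here is a toy Python sketch of the two strategies. It is an assumption about the general idea, not the package's actual reader, and it ignores embedded commas and escaped quotes. The function names split_fixed and split_per_line are hypothetical.

```python
def split_fixed(line, quoted_cols):
    """Strip quotes only from columns that were quoted on the first data line."""
    fields = line.rstrip("\n").split(",")
    return [f[1:-1] if i in quoted_cols else f for i, f in enumerate(fields)]


def split_per_line(line):
    """Re-detect quoting for every field of every line (slower, but robust)."""
    fields = line.rstrip("\n").split(",")
    return [
        f[1:-1] if len(f) >= 2 and f[0] == '"' and f[-1] == '"' else f
        for f in fields
    ]


# Quoting layout learned from a first line like '"Subscriber",1974':
# only column 0 is quoted.
layout = {0}
later_line = '"Customer",""'

fixed_result = split_fixed(later_line, layout)    # empty year keeps its quotes
per_line_result = split_per_line(later_line)      # empty year becomes ''
```

The per-line version repeats the quote check for every field, which is consistent with the modest slowdown in the timings above.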
(It would of course be possible to write a custom function to avoid this, but that's precisely what the old (pre v0.2) version did and it was very difficult to keep track of all the custom routines for each city. The whole point of the latest version is to avoid the need for tailored routines for each data quirk.)
Thanks @tbdv for finding this bug!
rough note to self if nothing else.
on bikedata from CRAN i get:
following:
inspecting the headers, looks like it might be because some April and May CSVs start in the first column with a string where previously they were ints.
otoh same import fine at this commit
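A quick way to check the first-column hunch, sketched in Python with the standard csv module. The file contents below are made-up stand-ins, not the real monthly files, and the column name "first_col" and the helper first_column_is_int are hypothetical.

```python
import csv
import io


def first_column_is_int(text):
    """True if every non-header value in the first column parses as an int."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header row
    for row in reader:
        if not row:
            continue  # ignore blank lines
        try:
            int(row[0])
        except ValueError:
            return False
    return True


# Made-up stand-ins for an older file (ints) and a newer one (strings).
older_like = "first_col,user_type\n1035,Subscriber\n"
newer_like = 'first_col,user_type\n"abc",Subscriber\n'

older_ok = first_column_is_int(older_like)
newer_ok = first_column_is_int(newer_like)
```

Running the same check over each monthly file would show where the type of the first column changes, if that is indeed the cause.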
March 2018 Data
April 2018 Data
May 2018 Data