Closed mattdowle closed 4 years ago
Thanks for the heads-up Matt.
I had already picked up that something was wrong (tests started failing on my latest commits), but I had not yet managed to pinpoint the issue, so this saves me a lot of work.
I will update actel's functions to account for the new data.table functionalities and prepare a new version for CRAN.
New version is now passing github and travis tests, so I assume the issue is solved.
Great, thanks Hugo! If it passes both R CMD check actel_*.tar.gz
and TZ=UTC R CMD check actel_*.tar.gz
then yes it's solved. Alternatively, and probably better and easier (and what we do in data.table tests) is you can also use Sys.setenv(TZ="UTC")
in your tests and fread will read that latest value set in the R session. This avoids needing to run R CMD check
twice, with and without TZ=UTC. Use Sys.unsetenv("TZ")
in your tests to go back to default local time zone rather than Sys.setenv(TZ="")
because the latter works differently on Windows and Linux.
We'd like to change the default for fread's tz= from "" to "UTC". Because the current situation of a different type (POSIXct vs character) depending on the TZ environment variable is uncomfortable. We decided not to do that yet because it would break expectations compared to base R in this case of unmarked datetime. But in terms of revdep breakage, out of 874 packages using data.table directly only two would fail if we changed the default: actel and spatsoc. It's surprising it's so few, but it does make the prospect of a default change easier. And it looks like you're already made the necessary change in dev anyway.
Would you be ok if fread read unmarked datetime as UTC-POSIXct by default, instead of character as 1.13.0 is doing currently?
Hi Matt!
Thanks for the follow up, I will make sure to include that Sys.setenv line in the tests to make sure everything is working properly.
I agree with you that having a consistent method (i.e. always reading timestamps as POSIXct) would make more sense and I am quite happy to see that new tz argument in fread. actel's functions have a mandatory tz argument where the user states the timezone of the input data, so it would be very easy for me to plug that into fread's tz argument. The only reason why I didn't do it right away is because I am not sure what would happen if someone updates actel but does not update data.table (in which case fread would not be expecting a tz argument)?
To be on the safe side for now, I will be relying on the old method (i.e. read the stamps as character strings and convert them afterwards), mostly because the loadBio and loadDeployments functions are expected to be dealing with small sized tables (so computing time is not really a problem). The main reason why I switched from read.csv to fread here was because fread can handle csv files that have been "damaged" by Excel (thanks for that!).
There's one other instance where fread has to deal with timestamps in my package (function compileDetections), but here the input files are expected to be in UTC, which fits perfectly with the new version of data.table. It would even save me the trouble of using fastPOSIXct in my processXXXfile functions (e.g. here and here).
All in all, I would have no problem with fread reading datetime data as UTC by default, because I can take advantage of the new behaviour to suit my needs :)
Thanks a lot for coding the data.table package!
Many thanks again Hugo! Great. Interesting to hear the background and the the related info too.
I am not sure what would happen if someone updates actel but does not update data.table (in which case fread would not be expecting a tz argument)?
In DESCRIPTION you can put the version in the import line: "data.table (>= 1.13.0)" and that way the user will be prompted to upgrade data.table too.
Hi Hugo,
data.table v1.13.0 has just been published on CRAN. actel is impacted just when TZ=UTC, e.g. "TZ=UTC R CMD check actel_1.0.0.tar.gz". actel is still passing on CRAN, and actel didn't come up in CRAN's pre-release revdep testing. I see some
skip_on_cran()
in your tests but that skip doesn't appear on this one that's impacted. I think it's because none of the CRAN check machines sets TZ=UTC. I set TZ=UTC in my local revdep testing which is how I detected it.It's due to the potentially breaking change in this release, that
fread
now reads UTC-datetime as POSIXct directly. The datetime in your biometrics.csv, column 1, is unmarked though (i.e. no Z or UTC offset present), so it is still read as character for backwards compatibility. See note 1 in NEWS here: https://github.com/Rdatatable/data.table/blob/master/NEWS.mdHowever, when local time zone is set to UTC by user, fread reads biometrics.csv first column as POSIXct in UTC, and then the difference occurs later because your tests are very good.
I'm not sure it's necessary to make any change to
actel
, since it's passing on CRAN. But if you do want to change it to survive TZ=UTC, addingcolClasses=c("Release.date"="character")
would be the smallest change toloadBio
so that the code afterwards inloadBio
that works onRelease.date
as character, continues to work as before. Alternatively, you could use UTC in the csv files rather than local time, or you can passtz="UTC"
(a new argument tofread
in this release) to always read unmarked datetime as UTC.Sorry for the extra work this change causes you.
Best, Matt