ropensci / tidyhydat

An R package to import Water Survey of Canada hydrometric data and make it tidy
https://docs.ropensci.org/tidyhydat
Apache License 2.0
70 stars 19 forks source link

realtime_add_local_datetime converts date time incorrect #157

Closed nscott-grnland closed 3 years ago

nscott-grnland commented 3 years ago

realtime_add_local_datetime converts times for timezones incorrectly. When station_tz and tz_used are the same, Date and local_datetime are different.

Steps to reproduce the behavior:

  1. Run: rt <- realtime_dd(prov_terr_state_loc = "ON") realtime_add_local_datetime(rt, set_tz = NULL)
  2. Note where station_tz and tz_used both equal "America/Toronto", Date = 2021-05-24 05:00:00 and local_datetime = 2021-05-24 01:00:00

I expect Date and local_datetime to be the same if "America/Toronto" is being converted to "America/Toronto"

image

R version 4.0.4 (2021-02-15) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19042) tidyhydat_0.5.2

boshek commented 3 years ago

:wave: @nscott-grnland,

Thanks for the report.

So this probably could be documented better but I think this behaviour makes sense. First when one grabs data that spans multiple time zones I think the only sensible approach is to use "UTC". That is the default for tidyhydat. Eg:

library(tidyhydat)
rt <- realtime_dd(prov_terr_state_loc = "ON")
attributes(rt$Date)$tzone
[1] "UTC"

In your example there are multiple timezones across Ontario. AFAIK it isn't possible (or advisable) to have different time zones in a single column in R. So what tidyhydat does is process all to one timezone with a warning (in this case "America/Toronto"). You can see that here all the case where station_tz differs from tz_used:

rt_added[rt_added$station_tz != rt_added$tz_used,]
  Queried on: 2021-06-25 16:00:16 (UTC)
  Date range: 2021-05-25 to 2021-06-24 
# A tibble: 962,268 x 11
   STATION_NUMBER PROV_TERR_STATE_LOC Date                station_tz         
   <chr>          <chr>               <dttm>              <chr>              
 1 02AB008        ON                  2021-05-25 05:03:00 America/Thunder_Bay
 2 02AB008        ON                  2021-05-25 05:08:00 America/Thunder_Bay
 3 02AB008        ON                  2021-05-25 05:13:00 America/Thunder_Bay
 4 02AB008        ON                  2021-05-25 05:18:00 America/Thunder_Bay
 5 02AB008        ON                  2021-05-25 05:23:00 America/Thunder_Bay
 6 02AB008        ON                  2021-05-25 05:28:00 America/Thunder_Bay
 7 02AB008        ON                  2021-05-25 05:33:00 America/Thunder_Bay
 8 02AB008        ON                  2021-05-25 05:38:00 America/Thunder_Bay
 9 02AB008        ON                  2021-05-25 05:43:00 America/Thunder_Bay
10 02AB008        ON                  2021-05-25 05:48:00 America/Thunder_Bay
# ... with 962,258 more rows, and 7 more variables: local_datetime <dttm>,
#   tz_used <chr>, Parameter <chr>, Value <dbl>, Grade <chr>, Symbol <chr>,
#   Code <chr>

So have deliberately left Date untouched making it easier to work across timezones. A couple thoughts:

Any other thoughts on how the documentation could be better written?

nscott-grnland commented 3 years ago

Thanks for responding so quickly.

If I understand you correctly, Date is being shown in "UTC" and station_tz is the timezone of the station's location, not the timezone being displayed in the Date column.

Now that you have clarified this for me, I re-read the documentation and it does make sense. I think what threw me off was in the realtime_add_local_datetime section, under 'Details', it states "Date from realtime_dd is supplied in UTC". I either skimmed the documentation too quickly or interpreted this to mean the underlying date was being supplied from UTC and then converted. It might prevent others from making the same mistake as me by adding a note in this section that mentions station_tz is the station's timezone and not the Date timezone. I know station_tz is listed in the realtime_dd section of the documentation but if you are only looking at the realtime_add_local_datetime section, it can be easily overlooked.

boshek commented 3 years ago

@nscott-grnland added some text here which hopefully makes things a bit clear: https://github.com/ropensci/tidyhydat/pull/158/commits/d231b2fd6b8700fa6ab848399950ab8ad3db900d