ocaml-community / ISO8601.ml

Parser and printer for date-times in ISO8601
https://ocaml-community.github.io/ISO8601.ml
MIT License
29 stars 13 forks source link

Parsing and unparsing leads to 1h time shift? #17

Open lindig opened 4 years ago

lindig commented 4 years ago

I assume I'm doing something wrong here but I am baffled by this behaviour:

utop # ISO8601.Permissive.datetime_tz "2020-02-14T06:56:42Z";;
- : float * float option = (1581667002., Some 0.)

utop # ISO8601.Permissive.datetime_tz "2020-02-14T06:56:42Z" |> fst |> ISO8601.Permissive.string_of_datetime;;
- : string = "2020-02-14T07:56:42"

The time printed is shifted by one hour from the time that was parsed. How would I prevent that?

lindig commented 4 years ago

The timestamp 1581667002 refers to Friday, 14 February 2020 07:56:42 GMT+00:00. So it appears to me that the unparsing (printing) is correct but that the parsing of 2020-02-14T06:56:42Z = 1581667002 is not correct.

c-cube commented 4 years ago

Is that on master? I think on master there's some changes to make it correct in #5 which has been rebased, but it caused a (small) regression in toml. Could you try on #5?

lindig commented 4 years ago

This was not on master but on 0.2.6. I'll try on master.

lindig commented 4 years ago

I believe this happens on master as well. Running the tests on master (on macOS) I also see a test failure:

:ISO8601.ml $ make test
Done: 48/50 (jobs: 1).........................................................F...
==============================================================================
Failure: ISO8601:2:[UTILS]:1:[UTILS mkdate]:4:583804800.

expected: 583804800. but got: 583808400.
------------------------------------------------------------------------------
Ran: 61 tests in: 0.04 seconds.
FAILED: Cases: 61 Tried: 61 Errors: 0 Failures: 1 Skip:  0 Todo: 0 Timeouts: 0.
ISO8601_TEST alias tests/runtest (exit 1)

I tried to use the branch from https://github.com/ocaml-community/ISO8601.ml/pull/5 but it is quite out of date (using ocamlbuild). So far did not try it.

c-cube commented 4 years ago

Ah my bad, what about wip-merge-pr-5 (which is the rebase of #5)?

lindig commented 4 years ago

On branch wip-merge-pr-5 the issue no longer exists. Running make test I see some test failures.

utop # ISO8601.Permissive.datetime_tz "2020-02-14T06:56:42Z" |> fst |> ISO8601.Permissive.string_of_datetime;;
- : string = "2020-02-14T06:56:42"
c-cube commented 4 years ago

Interesting, I don't see a test error on my end. I'd like to merge this at some point but am not confident enough in its correctness. Frankly, help would be very appreciated.

undu commented 4 years ago

Tests only work for me on branch wip-merge-pr-5 when adding --profile=release to dune runtest

lindig commented 4 years ago

This looks suspicious (on master):

src/ISO8601_lexer.mll:    let offset = fst (Unix.mktime (Unix.gmtime 0. )) 

This computes an offset from the date of the epoch and applies it. For me this is

utop # 
fst (Unix.mktime (Unix.gmtime 0.));;
- : float = -3600.

I don't understand what it tries to do.

c-cube commented 4 years ago

well, me neither. I think it's different on #5 ?

undu commented 4 years ago

What's the timezone of your system set to?

val mktime : tm -> float * tm

Convert a date and time, specified by the tm argument, into a time in seconds, as returned by Unix.time. The tm_isdst, tm_wday and tm_yday fields of tm are ignored. Also return a normalized copy of the given tm record, with the tm_wday, tm_yday, and tm_isdst fields recomputed from the other fields, and the other fields normalized (so that, e.g., 40 October is changed into 9 November). The tm argument is interpreted in the local time zone.

lindig commented 4 years ago
: tmp $ timedatectl 
               Local time: Thu 2020-02-20 17:02:47 GMT
           Universal time: Thu 2020-02-20 17:02:47 UTC
                 RTC time: Thu 2020-02-20 17:02:47
                Time zone: Europe/London (GMT, +0000)
System clock synchronized: yes
              NTP service: inactive
          RTC in local TZ: no
undu commented 4 years ago

the Stdlib does a system call, don't see anything egregious at first sight.

sagotch commented 4 years ago

mktime is interpreted in local timezone, so this is meant to compute the offset between local timezone and UTC, and suppress this offset from the given date in order to get the UTC time.

On my computer, I am at GMT+1, do I have to add 3600 seconds to get the correct UTC time from mktime.

$ timedatectl
               Local time: Fri 2020-02-21 09:32:22 CET
           Universal time: Fri 2020-02-21 08:32:22 UTC
                 RTC time: Fri 2020-02-21 08:32:22
                Time zone: Europe/Paris (CET, +0100)
System clock synchronized: yes
              NTP service: active
          RTC in local TZ: no
utop # fst (Unix.mktime (Unix.gmtime 0.));;
- : float = -3600.

EDIT: In #5, I see that it has been removed. It would be nice to test the library in different time zones. It might give good results in UK but false ones in other places.
https://github.com/ocaml-community/ISO8601.ml/pull/5/files#diff-4c4ab45718b4810726f53bea3f21b0a9R6

lindig commented 4 years ago

As you will have noticed, I'm in GMT/UTC and still get the same -3600 offset.

sagotch commented 4 years ago

As you will have noticed, I'm in GMT/UTC and still get the same -3600 offset.

Yes indeed, but according to what documentation says, you should not. Unless I misunderstood the documentation.

lindig commented 4 years ago

I had asked @undu and he sees it on his system (also in GMT/UTC) as well. And given that this is coming from a system call, I am reluctant to believe that this is a bug in Unix.gmtime.

kit-ty-kate commented 4 years ago

A similar issue seems to appear during the tests of toml.5.0.0 (which uses this library) on opam-repository's CI.

- expected: title = "TOML Example"
[...]
- dob = 1979-05-27T08:32:00+00:00
[...]
- 
- but got: title = "TOML Example"
[...]
- dob = 1979-05-27T07:32:00+00:00
[...]

From what I understand of the discussion it seems to be a known issue so I'm just going to disable the tests for this package.

darrenldl commented 3 years ago

Heard about this issue from Discord and made some comments over there, thought I'd copy the gist of my observation over here.

My speculation is that line 6 let offset = fst (Unix.mktime (Unix.gmtime 0. )) in in ISO8601_lexer.mll fails to take into account whether the local time zone was observing DST at the Unix epoch time, thus resulting in off-by-one-hour "base" offset calculation. This would seem to match @lindig's experience with Europe/London time zone, which observes DST for the entirety of 1970.

A straightforward fix would be to tune said offset calculation according to whether DST was being observed at the time of Unix epoch. But this would still assume the base offset remains unchanged over the years, which might hold for contemporary periods, but I'll need to verify.

EDIT: I haven't read any of the PR's in detail, so I'm not sure if there's already a fix somewhere.

darrenldl commented 3 years ago

My above observation as it turns out was somewhat getting close, but not entirely correct.

I made some code to test if any of the time zones observe changed "base" offset, and a lot of them failed, which includes Europe/London:

let l =
  let open Timere in
  let unix_epoch = 0L in
  (* Time_zone.available_time_zones *)
  [ "Europe/London" ]
  |> List.map (fun name ->
      let tz = Time_zone.make_exn name in
      (name,
       Time_zone.Raw.to_transitions tz
       |> List.filter (fun ((start, end_exc), _entry) ->
            unix_epoch < end_exc
          )
       |> List.map (fun ((start, end_exc), Time_zone.{ is_dst; offset }) ->
            (* Printf.printf "start: %Ld, end_exc: %Ld, is_dst: %b, offset: %d\n" start end_exc is_dst offset; *)
            offset - if is_dst then 3600 else 0
          )
       |> List.sort_uniq compare
      )
     )
  |> List.filter (fun (name, offsets) -> List.length offsets > 1)

gives

[("Europe/London", [0; 3600])]

indicating two unique "base" offsets.

In other words, a "base" offset does not seem to be necessarily well defined even if we only start at Unix epoch (1970 Jan 1 00:00:00 UTC). Examining the incongruence further, we can indeed observe that Europe/London "breaks" the link between isdst, base offset, and actual offset during the period around 1970 (not always the case that actual offset = base offset + if isdst then 3600 else 0):

via zdump command (lines of interest are marked with <== on the right hand side)

$ zdump -V -c 1968,1974 Europe/London
Europe/London  Sun Feb 18 01:59:59 1968 UT = Sun Feb 18 01:59:59 1968 GMT isdst=0 gmtoff=0
Europe/London  Sun Feb 18 02:00:00 1968 UT = Sun Feb 18 03:00:00 1968 BST isdst=1 gmtoff=3600
Europe/London  Sat Oct 26 22:59:59 1968 UT = Sat Oct 26 23:59:59 1968 BST isdst=1 gmtoff=3600
Europe/London  Sat Oct 26 23:00:00 1968 UT = Sun Oct 27 00:00:00 1968 BST isdst=0 gmtoff=3600 <==
Europe/London  Sun Oct 31 01:59:59 1971 UT = Sun Oct 31 02:59:59 1971 BST isdst=0 gmtoff=3600 <==
Europe/London  Sun Oct 31 02:00:00 1971 UT = Sun Oct 31 02:00:00 1971 GMT isdst=0 gmtoff=0
Europe/London  Sun Mar 19 01:59:59 1972 UT = Sun Mar 19 01:59:59 1972 GMT isdst=0 gmtoff=0
Europe/London  Sun Mar 19 02:00:00 1972 UT = Sun Mar 19 03:00:00 1972 BST isdst=1 gmtoff=3600
Europe/London  Sun Oct 29 01:59:59 1972 UT = Sun Oct 29 02:59:59 1972 BST isdst=1 gmtoff=3600
Europe/London  Sun Oct 29 02:00:00 1972 UT = Sun Oct 29 02:00:00 1972 GMT isdst=0 gmtoff=0
Europe/London  Sun Mar 18 01:59:59 1973 UT = Sun Mar 18 01:59:59 1973 GMT isdst=0 gmtoff=0
Europe/London  Sun Mar 18 02:00:00 1973 UT = Sun Mar 18 03:00:00 1973 BST isdst=1 gmtoff=3600
Europe/London  Sun Oct 28 01:59:59 1973 UT = Sun Oct 28 02:59:59 1973 BST isdst=1 gmtoff=3600
Europe/London  Sun Oct 28 02:00:00 1973 UT = Sun Oct 28 02:00:00 1973 GMT isdst=0 gmtoff=0

via TZ=Europe/London utop

utop # Unix.mktime (Unix.gmtime 0.);;
- : float * Unix.tm =
(-3600.,
 {Unix.tm_sec = 0; tm_min = 0; tm_hour = 0; tm_mday = 1; tm_mon = 0;
  tm_year = 70; tm_wday = 4; tm_yday = 0; tm_isdst = false})

So at least for Europe/London specifically, checking for isdst in base offset calculation would not have fixed the issue.

Namely we have a case where sampling at Unix epoch does not yield a useful base offset.

I'll add that my initial observation was based on information from https://www.timeanddate.com/time/change/uk/london where it reads "DST observed all year" for 1969 and 1970, but this doesn't seem to line up properly with the isdst flag (unless I'm misinterpeting the outputs attached above).

darrenldl commented 3 years ago

A small update:

I've patched ISO8601 to use Timere on my fork at try-timere branch https://github.com/darrenldl/ISO8601.ml/tree/try-timere , and it passes all 57 tests.

Timere does bring in a lot more complexity and dependencies, so I am not entirely sure if pushing toward using Timere under the hood is necessarily a good option for all users, but it doesn't seem to be a bad tradeoff if having difficult to resolve bugs is the only other option.

lindig commented 2 years ago

The actual Unix mktime call takes 3 possible values for the tm_isdst field: 0, 1, or -1. Looking at the OCaml bindings, the bool value from Unix.mktime is ignored (but at some time in the past wasn't):

This causes the Unix implementation to guess DST, which is almost always possible except for the hour when DST comes in or out of effect.

I believe we should not guess any offset but just accept the result of Unix.mktime as the correct timestamp for the argument. Note that Unix.mktime returns an updated tm structure in addition to the timestamp.

lindig commented 2 years ago

I suspect there is some misconception about the Unix epoch at play here: On a Linux system the timestamp for Jan 01 1970 is not zero, as we (and the unit tests) expect:

lindig@pi:~ $ cat /etc/timezone 
Europe/London
lindig@pi:~ $ date -d 'Jan 01 1970 00:00:00' +%s
-3600

This agrees with what I see on macOS as well. The timestamp at the epoch is not zero.

lindig commented 2 years ago

The failure of the unit test can be easily reproduced (on macOS for me, on two systems):

TZ=Europe/London make test

There is no unit test failure for TZ=US/Eastern, TZ=US/Central, or TZ=Europe/Berlin. Is Europe/London failing universally or could this be a macOS problem?

Looking at @darrenldl's comment I agree that looking at the UTC offset at timestamp 0 is not helpful because it is wrong for London and this is coming from the underlying time zone information in the OS and not the OCaml implementation.

Europe/London  Sat Oct 26 23:00:00 1968 UT = Sun Oct 27 00:00:00 1968 BST isdst=0 gmtoff=3600
Europe/London  Sun Oct 31 01:59:59 1971 UT = Sun Oct 31 02:59:59 1971 BST isdst=0 gmtoff=3600
Europe/London  Sun Oct 31 02:00:00 1971 UT = Sun Oct 31 02:00:00 1971 GMT isdst=0 gmtoff=0