php / php-src

The PHP Interpreter
https://www.php.net

`IntlDateFormatter` returns wrong output for 1940 date in Amsterdam timezone #10898

Open jhogervorst opened 1 year ago

jhogervorst commented 1 year ago

Description

The following code:

<?php

$timezone = new DateTimeZone("Europe/Amsterdam");
$date = new DateTime("1940-01-01 00:00:00", $timezone);
var_dump($date);

$formatter = new IntlDateFormatter("en-US");
var_dump($formatter->format($date));

Resulted in this output:

object(DateTime)#2 (3) {
  ["date"]=>
  string(26) "1940-01-01 00:00:00.000000"
  ["timezone_type"]=>
  int(3)
  ["timezone"]=>
  string(16) "Europe/Amsterdam"
}
string(69) "Sunday, December 31, 1939 at 11:40:00 PM Coordinated Universal Time"

But I expected this output instead:

object(DateTime)#2 (3) {
  ["date"]=>
  string(26) "1940-01-01 00:00:00.000000"
  ["timezone_type"]=>
  int(3)
  ["timezone"]=>
  string(16) "Europe/Amsterdam"
}
string(48) "Monday, January 1, 1940 at 12:00:00 AM GMT+00:20"

Additional details

Until 16 May 1940, the Netherlands used UTC+00:20 as its time zone. It looks like IntlDateFormatter is not aware of the UTC+00:20 offset, so it outputs the date in UTC.
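The 20-minute gap between the two outputs can be checked with plain timestamp arithmetic. The snippet below is only a sketch: it hard-codes the UTC+00:20 offset (taken from the description above) instead of reading it from any timezone database, and the variable names are chosen for illustration.

```php
<?php

// Midnight on 1 January 1940 in UTC, as a Unix timestamp.
$midnightUtc = gmmktime(0, 0, 0, 1, 1, 1940);

// Assumption from the description above: Amsterdam was at UTC+00:20.
$offsetSeconds = 20 * 60;

// Local midnight in Amsterdam happens 20 minutes before midnight UTC.
$localMidnight = $midnightUtc - $offsetSeconds;

var_dump($midnightUtc);   // int(-946771200)
var_dump($localMidnight); // int(-946772400)

// Rendering that instant in UTC gives the "wrong" wall-clock time.
echo gmdate('l, F j, Y \a\t h:i:s A', $localMidnight), " UTC\n";
// Sunday, December 31, 1939 at 11:40:00 PM UTC
```

In other words, the wrong output appears to refer to the correct instant; it is only rendered without the +00:20 offset.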

The occurrence of the problem seems related to the version of the ICU library:

| Environment | Source | PHP version | ICU version | Output |
| --- | --- | --- | --- | --- |
| 3v4l.org | https://3v4l.org/UoVSs | 8.2.4 | 57.1 | ✅ As expected |
| Docker + Debian libicu | Dockerfile | 8.2.4 | 67.1 | ❌ Wrong |
| Docker + latest libicu | Dockerfile | 8.2.4 | 72.1 | ❌ Wrong |

I am not familiar enough with the ICU library to assess whether the problem originates there. Since the wrong output is present in PHP when using the latest versions (of PHP and the ICU library), I am submitting the bug here.

PHP Version

8.2.4

Operating System

No response

heiglandreas commented 1 year ago

This is indeed a bug in the ICU4C library, as the intl extension is built on top of it. The ICU library uses its own timezone database, which is a recurring problem because the results depend on the version of the underlying ICU4C library.

See this comment

A DateTimeZone. Its identifier will be extracted and an ICU timezone object will be created; the timezone will be backed by ICUʼs database, not PHPʼs.

in https://www.php.net/manual/en/intlcalendar.settimezone.php
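To see which databases are in play on a given installation, the versions can be printed side by side. This is a minimal sketch that assumes the intl extension is loaded; the labels in the array are chosen here for readability.

```php
<?php

// Requires the intl extension.
$versions = [
    'PHP timezone database' => timezone_version_get(),
    'ICU library'           => INTL_ICU_VERSION,
    'ICU data'              => INTL_ICU_DATA_VERSION,
    'ICU timezone database' => IntlTimeZone::getTZDataVersion(),
];

// Print one "label: version" line per entry.
foreach ($versions as $label => $version) {
    printf("%-23s %s\n", $label . ':', $version);
}
```

The values differ per environment, so no particular output is implied. If the PHP and ICU timezone database versions differ, DateTime and IntlDateFormatter may disagree about historical offsets.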

hormus commented 1 year ago

Set the formatter's timezone to Europe/Amsterdam explicitly:

$formatter = new IntlDateFormatter("en-US", IntlDateFormatter::FULL, IntlDateFormatter::FULL,
    'Europe/Amsterdam', IntlDateFormatter::GREGORIAN);

<?php

$timezone = new DateTimeZone("Europe/Amsterdam");
$date = new DateTime("1940-01-01 00:00:00", $timezone);
var_dump($date);

$formatter = new IntlDateFormatter("en-US",IntlDateFormatter::FULL, IntlDateFormatter::FULL,
    $timezone,IntlDateFormatter::GREGORIAN );
var_dump($formatter->format($date));

Expected result:

Monday, January 1, 1940 at 12:00:00 AM GMT+00:20

From the IntlDateFormatter::format() documentation: If a DateTime or an IntlCalendar object is passed, its timezone is not considered. The object will be formatted using the formatterʼs configured timezone. If one wants to use the timezone of the object to be formatted, IntlDateFormatter::setTimeZone() must be called before with the objectʼs timezone. Alternatively, the static function IntlDateFormatter::formatObject() may be used instead.
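The two alternatives named in that passage might look as follows. This is a sketch rather than a fix: both calls still rely on ICU's timezone database, so on an affected ICU version the result may remain wrong. The variable names are illustrative.

```php
<?php

$timezone = new DateTimeZone('Europe/Amsterdam');
$date = new DateTime('1940-01-01 00:00:00', $timezone);

// Option 1: give the formatter the object's timezone before formatting.
$formatter = new IntlDateFormatter('en-US', IntlDateFormatter::FULL, IntlDateFormatter::FULL);
$formatter->setTimeZone($date->getTimezone());
$viaSetTimeZone = $formatter->format($date);

// Option 2: formatObject() uses the timezone of the object itself.
// The array holds the date style first, then the time style.
$viaFormatObject = IntlDateFormatter::formatObject(
    $date,
    [IntlDateFormatter::FULL, IntlDateFormatter::FULL],
    'en-US'
);

var_dump($viaSetTimeZone, $viaFormatObject);
```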

jhogervorst commented 1 year ago

@hormus When running your code snippet in Docker + Debian libicu (ICU version 67.1), I’m still getting the wrong output:

Sunday, December 31, 1939 at 11:40:00 PM GMT

hormus commented 1 year ago

Hi @jhogervorst, can you try this?

<?php

$timezone = new DateTimeZone("Europe/Amsterdam");
$date = new DateTime("1940-01-01 00:00:00", $timezone);
$var = $date->format('U'); // "-946772400"; midnight UTC would be -946771200

$formatter = new IntlDateFormatter("en-US",IntlDateFormatter::FULL, IntlDateFormatter::FULL,
    $timezone,IntlDateFormatter::GREGORIAN );
var_dump($formatter->format($var), $var);

Expected result:

string(48) "Monday, January 1, 1940 at 12:00:00 AM GMT+00:20"
string(10) "-946772400"

jhogervorst commented 1 year ago

@hormus

string(46) "Sunday, December 31, 1939 at 11:40:00 PM GMT"
string(10) "-946772400"

jorgsowa commented 1 year ago

Related news explaining the difference: https://www.redhat.com/en/blog/time-zone-database-package-tzdata-news-and-updates-2022

The ICU library depends on the IANA time zone database (TZ database), so any change to that database is reflected in PHP.

Directly from TZ DB changelog:

    Finish moving to 'backzone' the location-based zones whose
    timestamps since 1970 are duplicates; adjust links accordingly.
    This change ordinarily affects only pre-1970 timestamps, and with
    the new PACKRATLIST option it does not affect any timestamps.
    In this round the affected zones are Antarctica/Vostok,
    Asia/Brunei, Asia/Kuala_Lumpur, Atlantic/Reykjavik,
    Europe/Amsterdam, Europe/Copenhagen, Europe/Luxembourg,
    Europe/Monaco, Europe/Oslo, Europe/Stockholm, Indian/Christmas,
    Indian/Cocos, Indian/Kerguelen, Indian/Mahe, Indian/Reunion,
    Pacific/Chuuk, Pacific/Funafuti, Pacific/Majuro, Pacific/Pohnpei,
    Pacific/Wake and Pacific/Wallis, and the affected links are
    Arctic/Longyearbyen, Atlantic/Jan_Mayen, Iceland, Pacific/Ponape,
    Pacific/Truk, and Pacific/Yap.
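One way to check whether a given installation is affected is to ask both databases for the offset at the instant in question. The following is a rough sketch, assuming the intl extension is loaded; `$zoneId` and the other variable names are illustrative, and the result depends on the PHP and ICU builds in use.

```php
<?php

$zoneId = 'Europe/Amsterdam';
$date = new DateTime('1940-01-01 00:00:00', new DateTimeZone($zoneId));
$timestamp = $date->getTimestamp();

// Offset according to PHP's timezone database, in seconds.
$phpOffset = $date->getOffset();

// Offset according to ICU's timezone database.
// ICU works in milliseconds; false means the timestamp is in UTC, not local time.
$icuZone = IntlTimeZone::createTimeZone($zoneId);
$icuZone->getOffset($timestamp * 1000.0, false, $rawOffset, $dstOffset);
$icuOffset = intdiv($rawOffset + $dstOffset, 1000);

printf("PHP offset: %d s\nICU offset: %d s\n", $phpOffset, $icuOffset);
echo $phpOffset === $icuOffset
    ? "Both databases agree for this instant\n"
    : "The databases disagree for this instant\n";
```

A disagreement here would suggest that the pre-1970 data for the zone differs between the two databases.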