pharo-project / pharo-vm

This is the VM used by Pharo
http://pharo.org
Other
113 stars 68 forks source link

File modification time is wrong for half the year (especially in Linux) #801

Open daniels220 opened 4 months ago

daniels220 commented 4 months ago

If I have a file with a mod date when DST is not active, and when DST is active I ask a FileReference to that file for its modificationTime, the answer I get is an hour off. For instance:

$ touch -d '2024-02-01 12:34:56pm' nondst.txt
$ stat -c '%y' nondst.txt
2024-02-01 12:34:56.000000000 -0500
$ touch -d '2023-06-01 12:34:56pm' withdst.txt
$ stat -c '%y' withdst.txt
2023-06-01 12:34:56.000000000 -0400

Then in Pharo on Unix (DST is currently active):

'nondst.txt' asFileReference modificationTime. "=> 2024-02-01T12:34:56-04:00"
'withdst.txt' asFileReference modificationTime. "=> 2023-06-01T12:34:56-04:00"

and on Windows (same files—I used a network share):

'nondst.txt' asFileReference modificationTime. "=> 2024-02-01T13:34:56-04:00"
'withdst.txt' asFileReference modificationTime. "=> 2023-06-01T12:34:56-04:00"

In other words, Linux Pharo is using the current local offset for all times, even those to which a different offset applies, in such a way that it actually changes the UTC time. Meanwhile Windows Pharo is...well, Windows has the same bug of using the current offset for everything, so in this case Pharo is doing the best it can, I suppose. At least the equivalent UTC time is right, even if the offset is wrong.

Looking at the FileAttribute plugin code...I think there's legacy cruft going on here—file attributes are the only things that use DateAndTime class>>fromInternalTime:, and the current DateAndTime implementation uses a different representation such that calling that "internal time" is no longer really accurate. Probably the old implementation predates support for timezone offsets (and thus for acknowledging the difference between UTC and any kind of local time).

At this point, I think the most important thing to do is to have the primitive return the raw UTC seconds somehow, to ensure that is always correct. The simplest would be to have it return only that, and make the DateAndTime returned by modificationTime have an offset of 0. On Linux, we could also have it return a two-element array of {UTC seconds since UNIX epoch. Offset seconds from UTC (as determined by localtime(&unixTime)->tm_gmtoff)}. But on Windows this approach would still produce the wrong offset, as Windows has more-or-less this same bug. The only fully-correct solution would be to integrate a library like Chronos and provide timezone support in the image, but I don't know what state that codebase is in or if anyone is maintaining it (or is willing to).

On a related note, I see that not all implementations will include tm_gmtoff, in which case we use a fallback based on a magically-declared variables. But this is going to have the same problem of using the same offset all year rather than the correct one for whatever time of year it is. AFAICT all implementations should include a tm_isdst member, and if we replaced daylight with localtime(&unixTime)->tm_isdst it should then be accurate (on Linux, Windows again has the same bug internally).

Rinzwind commented 1 month ago

A larger issue here is that there’s no support for working with timezones that use daylight saving time. The offset of the first DateAndTime in the following is similarly wrong (as New York uses -05:00 in February, not -04:00):

$ TZ=America/New_York "${PHARO13VM:?}" --headless Pharo13.0-173.image eval "
  #(1706808896 1685637296)
    collect: [ :integer | DateAndTime fromUnixTime: integer ]"
an Array(2024-02-01T13:34:56-04:00 2023-06-01T12:34:56-04:00)

One possibility for adding such support could be to integrate the Chronos project found on SmalltalkHub, which allows the correct offset to be determined as follows (using Pharo 5 as the last ConfigurationOfChronos, timestamped 2016-05-13, says ‘works in Pharo 5’):

$ TZ=America/New_York "${PHARO5VM:?}" --headless Pharo-50772.image eval "
  ZnClient new
    url: 'https://web.archive.org/web/20110813143607/http://www.chronos-st.org/downloads/time-zones.zip';
    downloadTo: FileLocator imageDirectory / 'time-zones.zip'.
  ZipArchive new
    readFrom: FileLocator imageDirectory / 'time-zones.zip';
    extractAllTo: FileLocator imageDirectory.
  Metacello new
    configuration: 'Chronos';
    repository: 'http://smalltalkhub.com/mc/Chronos/Chronos/main';
    load.
  SqueakEnvironmentFacade class compile: 'valueOfEnvironmentVariableAt: envVarKey
    ^ OSEnvironment current at: envVarKey'.
  ChronosSystemFacade current setChronosSystemTimeZoneFromBaseSystem.
  Stdio stdout lf.
  (#(1706808896 1685637296)
    collect: [ :integer | DateAndTime fromUnixTime: integer ])
    collect: [ :dateAndTime | dateAndTime asChronosValue asLocal asNative ]"
[…]
## Chronos: System Timezone set to America/New_York | Eastern Time
an Array(2024-02-01T12:34:56-05:00 2023-06-01T12:34:56-04:00)
daniels220 commented 1 month ago

Yes, this is true. I work on a project that does use (a small subset of) Chronos to provide that support at the application level. It could be cool to see it integrated into Pharo, but I know it's very complicated...is anyone maintaining it?

As far as this bug report, my frustration here is that I'm getting the wrong UTC time, which is particularly egregious, and trivial to fix. IIRC it's decidedly nontrivial to back-calculate what the correct UTC time should be, as well, even with a proper timezone library.

I realize I didn't mention the option of just returning file times in UTC—TBH I think that might be the best compromise at this point. I'll edit the OP as well.

Rinzwind commented 1 month ago

The opensmalltalk-vm version of FileAttributesPlugin does support a flag that lets it return times in UTC, see opensmalltalk-vm commit 151793a3a6a5.

The only other references to Chronos in the ‘pharo-project’ repositories are in Pharo issue #2458 and pull request #2497, none of the links given there seem to point to a newer version than the one found on SmalltalkHub.