haskell / time

A time library
http://hackage.haskell.org/package/time
Other
120 stars 80 forks source link

Regression in TimeZone's Read instance #128

Closed Gipphe closed 4 years ago

Gipphe commented 4 years ago

A (possible) regression was introduced in 70cbad41. With time-1.8.0.2 in Stackage lts-14.27, date strings like "2020-01-01 00:00:00" could be parsed into UTCTimes with its Read instance. With time-1.9.2, in lts-15.7, this is no longer the case.

Update:

The regression seems to be in TimeZone's Read instance. See https://github.com/haskell/time/issues/128#issuecomment-615254004.

Steps to reproduce:

Using stack repl --no-load to start ghci.

Using GHCI on commit 70cbad41 and beyond:

Prelude> :load Data.Time.Clock
*Data.Time.Clock> read "2020-01-01 00:00:00" :: UTCTime
*** Exception: Prelude.read: no parse

While on 70cbad41's parent:

Prelude> :load Data.Time.Clock
*Data.Time.Clock> read "2020-01-01 00:00:00" :: UTCTime
2020-01-01 00:00:00 UTC

Discovery:

I noticed this regression bug when I transitioned a project of mine from Stackage lts-14.27 to lts-15.7. lts-14.27's time package is at version 1.8.0.2, while lts-15.7 is at 1.9.2. This is in a project where I use persistent, persistent-odbc, HDBC-odbc and HDBC to integrate with a Microsoft SQL Server (MSSQL) database in Microsoft Azure.

Some of my tables have DATETIME columns, which I've modeled as a UTCTime in Haskell. These columns now fail to parse into UTCTime when SELECTing from the tables in question.

persistent uses the PersistField class for parsing PersistValues (persistent's version of raw database values) into a corresponding type. It uses reads :: Read a => ReadS a (where type ReadS a = a -> [(a, String)]) to parse ByteStrings into UTCTime. MSSQL outputs DATETIMEs in the format %Y-%m-%d %H:%M:%S. This format used to be Readable, but no longer is.

Temporary solution for anyone else encountering this:

Wrap UTCTime in a newtype and write the PersistField instance yourself.

Should it be fixed?

I understand from #75 that the commit was intended to fix the inconsistency of %Z. Has time thus abandoned %Y-%m-%d %H:%M:%S as a valid format for its Read instance? If not, this is an issue on persistent; or possibly persistent-odbc, HDBC-odbc or HDBC; instead of time.

Gipphe commented 4 years ago

Looking a bit further into this, it is of course because UTCTime is parsed from ZonedTime, which is parsed from LocalTime and TimeZone. TimeZone is the culprit in this situation:

On parent:

*Data.Time.LocalTime> read "" :: TimeZone
+0000

On 70cbad4 and onwards:

*Data.Time.LocalTime> read "" :: TimeZone
*** Exception: Prelude.read: no parse

In short: +0000 is no longer the default time zone when parsing a TimeZone from a String. Is this correct and intended?

AshleyYakeley commented 4 years ago

I'm thinking that the empty string is not a time-zone, so "no parse" is correct for read @TimeZone "". And if you specify a %Z for parseTimeM, a time-zone ought to be there.

Perhaps read @UTCTime should accept but not require a time-zone? Not sure of the cleanest way of doing this.

Gipphe commented 4 years ago

I'm just gonna suggest a naïve approach as a starting point then. Just for reference, the current implementation looks like this:

instance Read UTCTime where
    readsPrec n s = [(zonedTimeToUTC t, r) | (t, r) <- readsPrec n s]

then how about

import Data.Time.LocalTime.Internal.TimeZone (utc)
import Data.Time.LocalTime (localTimeToUTC)

instance Read UTCTime where
    readsPrec n s =
        [(zonedTimeToUTC t, r) | (t, r) <- readsPrec n s]
            <|> [(localTimeToUTC utc t, r) | (t, r) <- readsPrec n s  ]

utilizing the Alternative instance of [] where [] <|> [x] = [x] and [x] <|> [y] = [x].

From my understanding, ZonedTime consists of a LocalTime and TimeZone. Since TimeZone is the reason ZonedTime fails to read from strings without time zones, how about just parsing the LocalTime itself if it fails to parse when expecting a time zone? And if there is no time zone, we default to UTC, which is what Read UTCTime did before 70cbad4 (unless I'm mistaken).

Again, this is just a naïve approach based on what part of ZonedTime failed to parse.

Update

The main issue I see with the approach above is that we end up parsing LocalTime twice if it fails to parse the TimeZone for ZonedTime. Thus, how about:

import Data.Time.LocalTime.Internal.TimeZone (TimeZone, utc)
import Data.Time.LocalTime (LocalTime, localTimeToUTC)

instance Read UTCTime where
    readsPrec n s = do
        (lt, s') <- readsPrec n s
        (tz, s'') <- readsPrec n s' <|> pure (utc, s')
        pure (localTimeToUTC tz lt, s'')

Granted, I have yet to test whether this even works, but I'll try it out now.

Update 2

Can confirm it works. The following program:

{-# LANGUAGE TypeApplications #-}
import Text.Read (readMaybe)
import Data.Time.Clock (UTCTime)
import Data.Time.LocalTime (utc, localTimeToUTC)
import Control.Applicative (Alternative(..))

newtype FooUTC = FooUTC UTCTime
    deriving (Show)

instance Read FooUTC where
    readsPrec n s = do
        (lt, s') <- readsPrec n s
        (tz, s'') <- readsPrec n s' <|> pure (utc, s')
        pure (FooUTC (localTimeToUTC tz lt), s'')

main = do
  putStrLn $ "With TZ: " ++ show withTz
  putStrLn $ "Without TZ: " ++ show withoutTz
  where
    withoutTz = readMaybe @FooUTC "2020-01-01 00:00:00"
    withTz = readMaybe @FooUTC "2020-01-01 00:00:00 +0100"

produces the following output:

root@fafd4634d9a6:/time# stack runghc Test.hs
With TZ: Just (FooUTC 2019-12-31 23:00:00 UTC)
Without TZ: Just (FooUTC 2020-01-01 00:00:00 UTC)
schoettl commented 4 years ago

I have a related issue – does it have the same origin than this?

The time zone string, e.g. "CEST" is parsed successfully but the zone minutes remain 0.

I also use persistent which saves ZonedTime with the time zone string "CEST" in the database. This way, I loose the time zone information.

Example:

    -- tz <- getCurrentTimeZone
    let tz = read "CEST" :: TimeZone
    print tz
    print $ timeZoneMinutes tz

Prints:

# with tz <- getCurrentTimeZone:
CEST
120
# with let tz = read "CEST":
CEST
0
AshleyYakeley commented 4 years ago

@schoettl that's not a bug, that's intended. The time library can't know all the time-zone abbreviations. See the documentation on this.

schoettl commented 4 years ago

OK, thanks! Then it's my fault to simply derivePersistField "ZonedTime". Looks like I've to find a more clever solution with "%z" instead.

AshleyYakeley commented 4 years ago

Done, using code provided.