purescript-contrib / purescript-formatters

Formatting and printing for numeric and date/time/interval values

Parsing ISO 8601 / RFC 3339 datetime string? #44

Open Boscop opened 6 years ago

Boscop commented 6 years ago

What's the correct way to parse an ISO 8601 / RFC 3339 datetime string? This is very common in JSON communication. On the server side we are using Rust for our API and DateTime::to_rfc3339() to convert the datetimes to String for the JSON API, which can also be expressed with the format string "%+":

> %+: Same to %Y-%m-%dT%H:%M:%S%.f%:z, i.e. 0, 3, 6 or 9 fractional digits for seconds and colons in the time zone offset.

So it has a variable number of digits for the fractional seconds, depending on the timestamp in question. If it falls on a second boundary, it has 0 fractional second digits, like "1970-01-01T00:00:00+00:00". Also it has the timezone at the end.

How can I parse this ISO 8601 / RFC 3339 datetime string in my PureScript frontend?

garyb commented 6 years ago

I think at the moment writing a parser using purescript-parsing or something like that is probably your best bet, as I guess the format language we have in here at the moment isn't expressive enough for that.

safareli commented 6 years ago

@Boscop you could build multiple formats and use unformatParser like this:

myParser 
  =  try (unformatParser format1)
 <|> try (unformatParser format2)
 <|> unformatParser format3

parse str = runParser str myParser 

Note: you might want to use try. It might seem possible to order the formats in such a way that it's not needed, but actually I think you need try and it can't be avoided.

Also you can just use unformat and <|>:

parse str
  =  unformat format1 str
 <|> unformat format2 str
 <|> unformat format3 str

Boscop commented 6 years ago

@safareli Thanks. But I also need support for microseconds like "2017-11-21T05:16:29.120116+00:00" and it doesn't support that (only milliseconds): https://github.com/slamdata/purescript-formatters/blob/v3.0.0/src/Data/Formatter/DateTime.purs#L122 Would it be possible to add support for microseconds (6 digits) (and maybe nanoseconds (9 digits))? :)

Also, is there a way that I only have to parse the format string once at the first use, and then not on subsequent uses? With a lazy variable somehow?

garyb commented 6 years ago

There'll be a bit of a problem there since the DateTime representation that is being parsed/formatted is only millisecond-precise.

You could just create the format string at the top level and re-use it, then the parse cost is at startup. Lazy might well be another option. But I'd suggest constructing the format commands directly rather than using the string parsing method as another option: #22 🙂
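To illustrate the "construct the format commands directly" option, here is a minimal sketch. It assumes a version of the library in which Formatter is a List of FormatterCommand and the constructors are exported from Data.Formatter.DateTime; the module name Rfc3339Format and the value name rfc3339Seconds are made up for the example.

```purescript
module Rfc3339Format where

import Data.Formatter.DateTime (Formatter, FormatterCommand(..))
import Data.List (fromFoldable)

-- Intended equivalent of the format string "YYYY-MM-DDTHH:mm:ss+00:00".
-- It is built as a plain value, so no format string is parsed at runtime
-- and there is no Either to unwrap.
rfc3339Seconds :: Formatter
rfc3339Seconds = fromFoldable
  [ YearFull               -- YYYY
  , Placeholder "-"
  , MonthTwoDigits         -- MM
  , Placeholder "-"
  , DayOfMonthTwoDigits    -- DD
  , Placeholder "T"
  , Hours24                -- HH
  , Placeholder ":"
  , MinutesTwoDigits       -- mm
  , Placeholder ":"
  , SecondsTwoDigits       -- ss
  , Placeholder "+00:00"   -- literal offset, only matches UTC timestamps
  ]
```

A value like this can be passed straight to format or unformat, and a misspelled command becomes a compile error rather than a Left at startup.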

Boscop commented 6 years ago

@garyb But how can I make it re-use the evaluated value? I currently do this:

fmt_rfc3339 = parseFormatString "YYYY-MM-DDTHH:mm:ss+00:00"
fmt_german = parseFormatString "DD.MM.YYYY, HH:mm"

humanTime s = either id id do
  decode <- fmt_rfc3339
  encode <- fmt_german
  datetime <- unformat decode s
  pure $ format encode datetime

Is that the most efficient way to do it?


> There'll be a bit of a problem there since the DateTime representation that is being parsed/formatted is only millisecond-precise.

That's ok, it can round to the nearest millisecond.. Or even just truncate/ignore them. It should still be able to parse it though.. :)
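One way to get the "just truncate" behaviour without changing the library is to trim the extra fractional digits before the string reaches unformat. The following is only a sketch of that idea: truncateFraction is a hypothetical helper, not something purescript-formatters provides, and it relies on the regex and replace functions from purescript-strings.

```purescript
module TruncateFraction where

import Prelude

import Data.Either (Either)
import Data.String.Regex (regex, replace)
import Data.String.Regex.Flags (noFlags)

-- Hypothetical helper, not part of purescript-formatters.
-- Keeps the first three fractional digits and drops any that follow.
truncateFraction :: String -> Either String String
truncateFraction input = do
  -- A dot and three digits (captured), then one or more extra digits.
  fractionRe <- regex "(\\.\\d{3})\\d+" noFlags
  -- "$1" re-inserts the captured dot plus three digits.
  pure (replace fractionRe "$1" input)
```

With this, "2017-11-21T05:16:29.120116+00:00" becomes "2017-11-21T05:16:29.120+00:00", while strings with zero or exactly three fractional digits pass through unchanged. This truncates rather than rounds, and the zero-digit case still needs its own format.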

safareli commented 6 years ago

Yes, parseFormatString parses a format string into a Formatter value. If you are declaring the format at the top level, you can also do this, so that if the format is invalid for some reason you get an error at startup:

fmt_rfc3339 :: Formatter
fmt_rfc3339 = case parseFormatString "YYYY-MM-DDTHH:mm:ss+00:00" of
  Left err -> unsafeCrashWith $ "format must have been valid " <> show err
  Right x -> x

fmt_german :: Formatter
fmt_german = case parseFormatString "DD.MM.YYYY, HH:mm" of
  Left err -> unsafeCrashWith $ "format must have been valid " <> show err
  Right x -> x

humanTime s = either id id do
  datetime <- unformat fmt_rfc3339 s
  pure $ format fmt_german datetime

Also, as @garyb noted, you can just build these formats directly as in #22, and then you wouldn't need parseFormatString.

safareli commented 6 years ago

If your {nano,micro}seconds are at the end of the input string, and you are willing to play with parser combinators, you can use unformatParser to get the datetime and then discard the rest of the string. (runP, which is used to create the unformat function, adds an eof parser to unformatParser.)
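A minimal sketch of that suggestion follows. It assumes runParser from purescript-parsing, which does not require the whole input to be consumed. The module was called Text.Parsing.Parser in older releases and Parsing in newer ones, so the import may need adjusting. parsePrefix is a made-up name.

```purescript
module ParsePrefix where

import Data.DateTime (DateTime)
import Data.Either (Either)
import Data.Formatter.DateTime (Formatter, unformatParser)
import Text.Parsing.Parser (ParseError, runParser)

-- Hypothetical helper: parse a DateTime from the start of the input and
-- ignore whatever follows. No eof parser is appended, so leftover
-- characters such as extra fractional digits do not cause a failure.
parsePrefix :: Formatter -> String -> Either ParseError DateTime
parsePrefix fmt input = runParser input (unformatParser fmt)
```

Given a formatter for "YYYY-MM-DDTHH:mm:ss.SSS", an input such as "2019-08-07T10:16:58.055246Z" should be read up to the milliseconds and the remaining "246Z" dropped. The offset is discarded along with everything else, so this is only safe when the timestamps are known to be in UTC.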

vlatkoB commented 4 years ago

Would you accept a PR that adds formatters (UUU,MicrosecondsRounded) and (NNN,NanosecondsRounded)? Currently, I can't parse this: "2019-08-07T10:16:58.055246Z"

EDIT: Sign/constructor change to better reflect that rounding takes place