houseabsolute / DateTime-Format-ISO8601

Parses ISO8601 formats
http://metacpan.org/release/DateTime-Format-ISO8601/

String with TZ -0700 fails to parse, unless.. #7

Open autarch opened 3 years ago

autarch commented 3 years ago

Migrated from rt.cpan.org #108082 (status was 'open')

From cpan@desert-island.me.uk on 2015-10-29 14:38:23 :

From Facebook's Graph API: "2011-04-30T10:00:00-0700" fails to parse in v0.08:

perl -MDateTime::Format::ISO8601 -le'print DateTime::Format::ISO8601->new()->parse_datetime("2011-04-30T10:00:00-0700")'
Invalid date format: 2011-04-30T10:00:00-0700 at -e line 1.

If I change "length" in this section:

            {
                #YYYYMMDDThhmmss[+-]hhmm 19850412T101530+0400
                #YYYY-MM-DDThh:mm:ss[+-]hh:mm 1985-04-12T10:15:30+04:00
                length => [ qw( 20 25 ) ],
                regex  => qr/^ (\d{4}) -??  (\d\d) -?? (\d\d)
                            T (\d\d) :?? (\d\d) :?? (\d\d) ([+-] \d\d :?? \d\d) $/x,
                params => [ qw( year month day hour minute second time_zone ) ],
                postprocess => \&_normalize_offset,
            },

to:

                length => [ qw( 20 24 25 ) ],

then it parses fine.
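The reported string combines an extended-format date and time (with separators) with a basic-format offset (no colon), so it is 24 characters long rather than 20 or 25. The following is a minimal core-Perl sketch of why the extra length entry matters, assuming the parser only tries a regex on inputs whose length appears in `length`. The helper `matches_spec` is hypothetical and not part of the module; the regex is copied from the snippet above.

```perl
use strict;
use warnings;

# The regex from the quoted parser entry, copied verbatim.
my $regex = qr/^ (\d{4}) -??  (\d\d) -?? (\d\d)
                T (\d\d) :?? (\d\d) :?? (\d\d) ([+-] \d\d :?? \d\d) $/x;

# Hypothetical helper: try the regex only when the input length is listed,
# mimicking a length-keyed parser table (an assumption about the dispatch).
sub matches_spec {
    my ( $input, @lengths ) = @_;
    return 0 unless grep { $_ == length($input) } @lengths;
    return $input =~ $regex ? 1 : 0;
}

my $input = '2011-04-30T10:00:00-0700';

printf "length:           %d\n", length($input);
printf "lengths 20 25:    %s\n",
    matches_spec( $input, 20, 25 ) ? 'match' : 'no match';
printf "lengths 20 24 25: %s\n",
    matches_spec( $input, 20, 24, 25 ) ? 'match' : 'no match';
```

With the original length list the regex is never tried, although its optional separators would accept the string; with 24 added it matches.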

autarch commented 3 years ago

From jhoblitt@cpan.org on 2015-10-29 16:03:17 :

On Thu Oct 29 10:38:23 2015, JROBINSON wrote:

> From Facebook's Graph API: "2011-04-30T10:00:00-0700" fails to parse in v0.08:

Good catch. It looks like the \d{4} tz offset format somehow managed to escape being a test example.

https://github.com/jhoblitt/DateTime-Format-ISO8601/blob/master/t/02_examples.t

autarch commented 3 years ago

From fga@cpan.org (@fgabolde) on 2016-04-15 16:05:55 :

On Thu, 29 Oct 2015 12:03:17 -0400, JHOBLITT wrote:

> On Thu Oct 29 10:38:23 2015, JROBINSON wrote:
>
> > From Facebook's Graph API: "2011-04-30T10:00:00-0700" fails to parse in v0.08:
>
> Good catch. It looks like the \d{4} tz offset format somehow managed to escape being a test example.
>
> https://github.com/jhoblitt/DateTime-Format-ISO8601/blob/master/t/02_examples.t

Is this ticket still valid? Back in 2012 you had rejected a similar report (see RT #52645).

Personally I'd also really like for this format to be supported for the following reasons:

autarch commented 3 years ago

From draxil@cpan.org on 2016-07-20 15:03:06 :

I've packaged up the discussed change as a pull request if that's useful to anyone.

https://github.com/jhoblitt/DateTime-Format-ISO8601/pull/1

autarch commented 3 years ago

From chansen@cpan.org (@chansen) on 2016-07-26 07:01:01 :

On Fri, 15 Apr 2016 at 12:05:55, FGA wrote:

>   • the ISO spec you quoted (which I don't have access to) is unclear on this particular topic, really

ISO 8601:2004 is very clear that the formatting should be either consistently in basic format or consistently in extended format.

Section 4.3.2, "Complete representations", lists every valid format, and section 4.3.3, "Representations other than complete", says:

> the expression shall either be completely in basic format, in which case the minimum number of separators necessary for the required expression is used, or completely in extended format, in which case additional separators shall be used in accordance with 4.1 and 4.2

-- chansen

autarch commented 3 years ago

From srezic@cpan.org (@eserte) on 2018-08-03 15:30:35 :

On 2016-04-15 12:05:55, FGA wrote:

[...]

>   • formatting the time part with colons and the offset without is a fairly common mutation, to the point where GNU date on my machine prints this:
>
>     $ date --iso-8601=seconds
>     2016-04-15T18:03:13+0200

This seems to be fixed in later versions of GNU coreutils:

$ ssh cpansand@debian8.... 'date --iso-8601=seconds; date --version | head -1'
2018-08-03T15:29:03+0000
date (GNU coreutils) 8.23

$ ssh cpansand@debian9.... 'date --iso-8601=seconds; date --version | head -1'
2018-08-03T15:29:09+00:00
date (GNU coreutils) 8.26

autarch commented 3 years ago

From srezic@cpan.org (@eserte) on 2018-08-03 15:34:04 :

On 2016-07-26 03:01:01, CHANSEN wrote:

> On Fri, 15 Apr 2016 at 12:05:55, FGA wrote:
>
> >   • the ISO spec you quoted (which I don't have access to) is unclear on this particular topic, really
>
> ISO 8601:2004 is very clear that the formatting should be either consistently in basic format or consistently in extended format.
>
> Section 4.3.2, "Complete representations", lists every valid format, and section 4.3.3, "Representations other than complete", says:
>
> > the expression shall either be completely in basic format, in which case the minimum number of separators necessary for the required expression is used, or completely in extended format, in which case additional separators shall be used in accordance with 4.1 and 4.2

A possibility would be to provide a sloppy=>1 parameter either in the constructor or in the parse_* methods or in both.

At least it would be good to document these corner cases.
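One way such an opt-in mode could behave is sketched below in core Perl: rewrite a colon-less offset into hh:mm form only when the time part is already in extended format, then hand the result to the strict parser. The helper `lenient_offset` is hypothetical; the module does not provide it, and no `sloppy` parameter is assumed to exist.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of DateTime::Format::ISO8601: insert the
# missing colon into a trailing [+-]hhmm offset when the time part is in
# extended format (hh:mm:ss). Any other input is returned unchanged.
sub lenient_offset {
    my ($input) = @_;
    $input =~ s/(T\d\d:\d\d:\d\d)([+-]\d\d)(\d\d)\z/${1}${2}:${3}/;
    return $input;
}

for my $input (
    '2011-04-30T10:00:00-0700',     # mixed: gets rewritten
    '2011-04-30T10:00:00-07:00',    # already extended: unchanged
    '20110430T100000-0700',         # consistently basic: unchanged
) {
    printf "%-26s -> %s\n", $input, lenient_offset($input);
}
```

The normalized string could then be passed to `parse_datetime`, which keeps the default parser strict and leaves the leniency an explicit choice by the caller.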

esabol commented 3 years ago

To paraphrase Postel's Law, "Be liberal in what you accept; be strict in what you produce." [-+]hhmm is fairly common even if it's not strictly adherent to ISO8601. I don't see any downside to parsing it correctly.

autarch commented 3 years ago

> To paraphrase Postel's Law, "Be liberal in what you accept; be strict in what you produce." [-+]hhmm is fairly common even if it's not strictly adherent to ISO8601. I don't see any downside to parsing it correctly.

I can think of at least one downside. If you're using this parser to validate data that you then pass on to another system that is not as liberal, then making this parser more liberal could cause issues.

But I'm open to the idea of a "sloppy" mode as I mentioned before.

esabol commented 3 years ago

I see validation as a distinctly different operation from parsing. If your vision for this module includes supporting validation, then I propose the addition of a validate_datetime method which is always strictly adherent to the standard and which returns a boolean.
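As a rough illustration of what a strict boolean check could look like, here is a core-Perl sketch that accepts only strings that are consistently in basic or consistently in extended format. The function `is_consistent_datetime` is hypothetical, covers only complete date-times with a numeric offset, and does not range-check the fields; it is not the proposed `validate_datetime` itself.

```perl
use strict;
use warnings;

# Hypothetical strict check: true only if the whole string is in basic
# format or the whole string is in extended format. Field values (month,
# hour, and so on) are not range-checked in this sketch.
sub is_consistent_datetime {
    my ($input) = @_;
    my $basic    = qr/\A \d{4} \d\d \d\d T \d\d \d\d \d\d [+-] \d\d \d\d \z/x;
    my $extended = qr/\A \d{4} - \d\d - \d\d T \d\d : \d\d : \d\d [+-] \d\d : \d\d \z/x;
    return ( $input =~ $basic || $input =~ $extended ) ? 1 : 0;
}

for my $input (
    '19850412T101530+0400',
    '1985-04-12T10:15:30+04:00',
    '2011-04-30T10:00:00-0700',
) {
    printf "%-26s %s\n", $input,
        is_consistent_datetime($input) ? 'consistent' : 'mixed';
}
```

Keeping a check like this separate from parsing would let the parser accept the mixed form while callers that need strict conformance still have a way to ask for it.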

For the record, the docs for this module don't say anything about using it as a validator. They only mention parsing (and formatting, less prominently, now that that is supported).

My counterproposal to the sloppy option would be to have a strict option instead, but I concede that the sloppy option would be less likely to alter the functionality of existing code in a surprising manner. Either way, it would be nice if more liberal parsing were supported somehow.

Not that it matters a whole lot, but I feel "sloppy" has a negative connotation. Some alternatives: lenient, flexible, tolerant. I think I prefer "lenient", for what it's worth.