Closed billdenney closed 2 years ago
What about the vectorised use case?
In order to have this one would need a dedicated ISO8601 parser, which doesn't seem to be worth the effort for such a tiny use case.
Maybe I'm over-simplifying it, but I think that it could be a set of small functions with reasonably straightforward regular expressions and grepl calls (and therefore inherently vectorized).
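To illustrate the idea, here is a minimal sketch of one such small function. The name `is_iso8601_date()` is hypothetical and not part of lubridate. It only checks the extended calendar-date form `YYYY-MM-DD` and does not verify that the day exists in the given month.

```r
# Hypothetical helper, not part of lubridate.
# Checks only the extended calendar-date form YYYY-MM-DD.
is_iso8601_date <- function(x) {
  pattern <- "^[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])$"
  # grepl() is vectorised over x, which is where the vectorisation comes from.
  out <- grepl(pattern, x)
  # grepl() returns FALSE for missing input, so restore NA explicitly.
  out[is.na(x)] <- NA
  out
}

is_iso8601_date(c("2017-03-21", "2017-13-01", "21/03/2017", NA))
```

Because `grepl()` already works element-wise, no explicit loop is needed.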
I agree that a specialized parser would not be worthwhile.
If the grepl method doesn't seem like a good fit, no worries.
You might be right actually, but this is low priority. If you can put together a PR and a bunch of tests for it, I would be more than happy to include it in the code base.
The regular expressions become convoluted, but I am algorithmically building them in a way that makes them reasonable to review (e.g. make the year part then use that to make the date part then use that and the time part to make a whole regexp). And, I'm building many tests for each part, so that it should be understandable.
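A simplified sketch of that build-up approach might look like the following. All variable names are hypothetical and this is not the code from the work in progress. It covers only the extended calendar date with a time down to at least the minute, and leaves out time zones.

```r
# Hypothetical names; a simplified illustration of composing the pattern.
re_year  <- "(158[3-9]|159[0-9]|1[6-9][0-9]{2}|[2-9][0-9]{3})"  # 1583 to 9999
re_month <- "(0[1-9]|1[0-2])"
re_day   <- "(0[1-9]|[12][0-9]|3[01])"
re_date  <- paste0(re_year, "-", re_month, "-", re_day)

re_hour   <- "([01][0-9]|2[0-3])"
re_minute <- "([0-5][0-9])"
re_second <- "([0-5][0-9](?:[.,][0-9]+)?)"  # optional fractional seconds
re_time   <- paste0(re_hour, ":", re_minute, "(?::", re_second, ")?")

re_datetime <- paste0("^", re_date, "T", re_time, "$")

# Each part can be tested on its own before it is combined.
# perl = TRUE is used because of the non-capturing groups (?:...).
stopifnot(grepl(paste0("^", re_year, "$"), c("1583", "9999"), perl = TRUE))
stopifnot(!grepl(paste0("^", re_year, "$"), c("1582", "0999"), perl = TRUE))

grepl(re_datetime, c("2017-03-21T14:05", "2017-03-21T24:00"), perl = TRUE)
```

Keeping each fragment in its own variable means a reviewer only has to check one short pattern at a time.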
This is now a work in progress.
With a lot of work, I now have a super-regexp and the ability to generate all variants (optional second, minute, hour, day, week/month, year). The regexp itself is a beast:
(?:(158[3-9]|159[0-9]|1[6-9][0-9]{2}|[2-9][0-9]{3})(?:(?:-(0[1-9]|1[0-2])(?:-(?:(0[1-9]|[12][0-9]|3[01])(?:(?:(?:T([01][0-9]|2[0-3])|T([01][0-9]|2[0-3]):([0-5][0-9])(?::((?:[0-5][0-9])(?:[\.,][0-9]+)?))?)(?:(Z|\+00(?::00)?|[\+-]00:(?:15|30|45)|[\+-](?:0[1-9]|1[1-4])(?::(?:00|15|30|45))?))?)?)?))?|-W(0[1-9]|[1-4][0-9]|5[0-3])(?:-(?:([1-7])(?:(?:(?:T([01][0-9]|2[0-3])|T([01][0-9]|2[0-3]):([0-5][0-9])(?::((?:[0-5][0-9])(?:[\.,][0-9]+)?))?)(?:(Z|\+00(?::00)?|[\+-]00:(?:15|30|45)|[\+-](?:0[1-9]|1[1-4])(?::(?:00|15|30|45))?))?)?)?))?|(?:-(?:(00[1-9]|0[1-9][0-9]|[12][0-9]{2}|3[0-5][0-9]|36[0-6])(?:(?:(?:T([01][0-9]|2[0-3])|T([01][0-9]|2[0-3]):([0-5][0-9])(?::((?:[0-5][0-9])(?:[\.,][0-9]+)?))?)(?:(Z|\+00(?::00)?|[\+-]00:(?:15|30|45)|[\+-](?:0[1-9]|1[1-4])(?::(?:00|15|30|45))?))?)?)?))?))?)?
With this easier to review visualization.
The part that I'd prefer to be able to fix is making it so that time is only represented once. I think that look-ahead and look-behind regexps may be the right answer, but I don't understand enough about them yet to be sure that's correct.
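One possible alternative to look-around, offered only as a sketch: group the three complete-date forms (calendar, week, ordinal) in a single alternation and append the time fragment once after the group. The names below are hypothetical. The sketch drops the capture groups, the time zone, and the reduced-precision forms (year only, year-month, year-week), so it is not a drop-in replacement for the pattern above.

```r
# Hypothetical fragments; non-capturing groups only, so use perl = TRUE.
re_year     <- "(?:158[3-9]|159[0-9]|1[6-9][0-9]{2}|[2-9][0-9]{3})"
re_calendar <- "-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12][0-9]|3[01])"
re_week     <- "-W(?:0[1-9]|[1-4][0-9]|5[0-3])-[1-7]"
re_ordinal  <- "-(?:00[1-9]|0[1-9][0-9]|[12][0-9]{2}|3[0-5][0-9]|36[0-6])"
# Time with optional minute and optional (fractional) second.
re_time     <- "T(?:[01][0-9]|2[0-3])(?::[0-5][0-9](?::[0-5][0-9](?:[.,][0-9]+)?)?)?"

# The time fragment appears exactly once, after the grouped date alternatives.
re_full <- paste0(
  "^", re_year,
  "(?:", re_calendar, "|", re_week, "|", re_ordinal, ")",
  "(?:", re_time, ")?$"
)

grepl(re_full,
      c("2017-03-21T14:05", "2017-W12-2T14", "2017-080", "2017-03T14:05"),
      perl = TRUE)
```

The trade-off is that the capture-group layout changes, so any code that extracts components by group position would need to be adjusted.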
Sorry for not coming back to this earlier, but I am afraid this is too complex. I am pretty sure there is C or C++ code somewhere to test for this. Otherwise, it's probably not very difficult to write our own.
Yeah, it makes sense that this isn't a good fit as-is.
I have an application where I need to be able to detect if a character string is formatted as required by ISO 8601. Given that #629 / #700 to format date-times as ISO 8601 was a good fit here with `format_ISO8601()`, I thought that a detection method would also be useful here.

What would you think about a function (or small family of functions) that was named something like `is_ISO8601()` which could do the following:

- `x` the character string to test,
- `representation` one or more of `c("date", "datetime", "time", "duration", "interval", "repeating interval")`,
- `precision` to select the required precision of the argument (not applicable for `"duration"` or the duration part of an `"interval"`), and
- `usetz` set to `TRUE` to require the time zone, `FALSE` to require no time zone, or `NA` to allow with or without a time zone (would not apply for durations).

`representation`s or all of them.
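To make the proposed interface concrete, here is a rough sketch under heavy assumptions. It handles only the `"date"` and `"datetime"` representations in extended format, ignores `precision` entirely, and uses a looser time zone pattern than a real implementation would need. It is a hypothetical illustration, not existing lubridate code.

```r
# Hypothetical sketch of the proposed interface.
# Covers only "date" and "datetime"; `precision` is not implemented.
is_ISO8601 <- function(x, representation = c("date", "datetime"), usetz = NA) {
  representation <- match.arg(representation, several.ok = TRUE)
  re_date <- "[0-9]{4}-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])"
  re_time <- "T([01][0-9]|2[0-3]):[0-5][0-9](:[0-5][0-9])?"
  re_tz   <- "(Z|[+-][0-9]{2}(:[0-9]{2})?)"  # deliberately loose
  # usetz: TRUE requires a zone, FALSE forbids it, NA allows either.
  tz_part <- if (is.na(usetz)) paste0(re_tz, "?") else if (usetz) re_tz else ""
  patterns <- c(
    date     = paste0("^", re_date, "$"),
    datetime = paste0("^", re_date, re_time, tz_part, "$")
  )
  # An element passes if it matches any of the requested representations.
  out <- rep(FALSE, length(x))
  for (p in patterns[representation]) out <- out | grepl(p, x)
  out[is.na(x)] <- NA
  out
}

is_ISO8601(c("2017-03-21", "2017-03-21T14:05Z", "2017-03-21T14:05", "nope"))
is_ISO8601("2017-03-21T14:05", representation = "datetime", usetz = TRUE)
```

With the defaults, both representations are accepted and the time zone is optional. Passing a single `representation` narrows the check.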