time-rs / time

The most used Rust library for date and time handling.
https://time-rs.github.io
Apache License 2.0

Parse date with ordinal #541

Closed: wezm closed this issue 1 year ago

wezm commented 1 year ago

For my project I need to be able to parse arbitrary dates as found on the internet. One such example that I haven't been able to work out how to parse with time uses this format:

 WARN  rsspls > unable to parse date 'Sunday, April 18th, 2021'
 WARN  rsspls > unable to parse date 'Friday, January 8th, 2021'
 WARN  rsspls > unable to parse date 'Tuesday, September 15th, 2020'
 WARN  rsspls > unable to parse date 'Monday, April 6th, 2020'
 WARN  rsspls > unable to parse date 'Friday, March 13th, 2020'
 WARN  rsspls > unable to parse date 'Sunday, June 2nd, 2019'
 WARN  rsspls > unable to parse date 'Wednesday, May 29th, 2019'
 WARN  rsspls > unable to parse date 'Saturday, May 25th, 2019'
 WARN  rsspls > unable to parse date 'Thursday, May 16th, 2019'
 WARN  rsspls > unable to parse date 'Tuesday, May 7th, 2019'
 WARN  rsspls > unable to parse date 'Friday, April 26th, 2019'
 WARN  rsspls > unable to parse date 'Tuesday, April 23rd, 2019'
 WARN  rsspls > unable to parse date 'Wednesday, April 17th, 2019'

It's the ordinal suffixes (st, nd, rd, th) that are the challenge. It would be great if format descriptions either supported parsing/formatting these or had a mechanism to specify characters to ignore, like: [weekday], [month repr:long] [day][ignore count:2], [year].

I can probably work around this for now by detecting digits followed by an ordinal suffix and deleting the suffix. Happy to hear if there are other ideas.
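The workaround described above can be sketched using only the standard library. This is a minimal illustration, not code from rsspls or time; the function name `strip_ordinals` is hypothetical, and it assumes the suffix directly follows an ASCII digit and ends the word.

```rust
/// Removes an English ordinal suffix ("st", "nd", "rd", "th") when it
/// directly follows an ASCII digit and ends the word, e.g. "18th" -> "18".
/// Hypothetical helper for illustration; not part of the time crate.
fn strip_ordinals(input: &str) -> String {
    const SUFFIXES: [&str; 4] = ["st", "nd", "rd", "th"];
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    let mut prev_is_digit = false;

    while let Some(ch) = rest.chars().next() {
        if prev_is_digit {
            // Look for one of the suffixes at the current position.
            if let Some(after) = SUFFIXES.iter().find_map(|s| rest.strip_prefix(*s)) {
                // Only drop it if it ends the word ("1st", not "1step").
                let ends_word = after.chars().next().map_or(true, |c| !c.is_alphanumeric());
                if ends_word {
                    rest = after;
                    prev_is_digit = false;
                    continue;
                }
            }
        }
        // Otherwise copy the character through unchanged.
        out.push(ch);
        prev_is_digit = ch.is_ascii_digit();
        rest = &rest[ch.len_utf8()..];
    }
    out
}
```

With the suffix removed, a string such as "Sunday, April 18, 2021" should be a candidate for a description along the lines of [weekday], [month repr:long] [day padding:none], [year]. That description is an assumption and was not tested against time here.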

jhpratt commented 1 year ago

Having a way to outright ignore characters could certainly be useful. In the meantime, once I get around to documenting the following, I will put out a release that includes it.

[weekday], [month repr:long] [day][first [st][nd][rd][th]], [year]

This would effectively do the same thing, taking the first of "st", "nd", "rd", and "th". In reality, which one is matched is irrelevant, as they're all literals.
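To make the "take the first matching literal" behaviour concrete, here is a rough standard-library sketch of the idea. It is not time's implementation; the name `consume_first` is hypothetical, and it only models literal alternatives tried in order.

```rust
/// Tries each literal alternative in order and, for the first one that is a
/// prefix of `input`, returns the remaining input. Returns `None` when no
/// alternative matches. Hypothetical model of the idea behind `[first ...]`;
/// not the time crate's actual code.
fn consume_first<'a>(input: &'a str, alternatives: &[&str]) -> Option<&'a str> {
    alternatives
        .iter()
        .find_map(|literal| input.strip_prefix(*literal))
}
```

Because every alternative is just a literal, nothing ties the suffix to the day number: in this model "18st" would be consumed as readily as "18th", which is what "which one is matched is irrelevant" implies.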

wezm commented 1 year ago

> This would effectively do the same thing, taking the first of "st", "nd", "rd", and "th". In reality, which one is matched is irrelevant, as they're all literals.

Cool, yes that would work well.

jhpratt commented 1 year ago

My previous suggestion was released a few days ago, so you should be able to use it now. I've also pushed up an implementation of [ignore count:X], though that's not yet released.

wezm commented 1 year ago

Thanks. I published a release using the git version; it will be good to get back on a proper release.