hroptatyr / dateutils

nifty command line date and time utilities; fast date calculations and conversion in the shell
http://www.fresse.org/dateutils/
Other
616 stars 42 forks

Multiple fall back input formats for dateconv #138

Open Earnestly opened 2 years ago

Earnestly commented 2 years ago

dateconv -S is particularly useful as a filter over a large amount of input. When that input contains dates in a few known formats, it would be helpful for dateconv to try each format in turn until one succeeds.

The alternative is to run dateconv (and perhaps strptime) once for each line of input.

(The ultimate solution would be for dateconv to detect the format itself, like Date.parse in JavaScript or dateutil in Python.)

hroptatyr commented 2 years ago

Hi, thanks for the report. That's what -i|--input-format is for.

Earnestly commented 2 years ago

Oh, silly me: I didn't read carefully enough to notice that -i can be given multiple times. I'll give this a try.

Earnestly commented 2 years ago

It doesn't appear to work as a fallback: it applies every input format to each line instead of stopping after the first success.

For example, given this input:

Sun, 26 Sep 2021 00:00:00 +1000 http://www.brendangregg.com/blog/2021-09-26/the-speed-of-time.html The Speed of Time

Currently, dateconv also applies the %F input format to the URL, which is fair enough since -S matches anywhere in the line.

% dateconv -Sf %FT%TZ -i %FT%T%Z -i '%a, %d %b %Y %T %Z' -i %FT%TZ -i '%d %b %Y %T %Z' -i %F
2021-09-25T14:00:00Z http://www.brendangregg.com/blog/2021-09-26T00:00:00Z/the-speed-of-time.html The Speed of Time

Ideally, I would hope for something like this, where matching stops after the first success.

% dateconv -Sf %FT%TZ -i %FT%T%Z -i '%a, %d %b %Y %T %Z' -i %FT%TZ -i '%d %b %Y %T %Z' -i %F
2021-09-25T14:00:00Z http://www.brendangregg.com/blog/2021-09-26/the-speed-of-time.html The Speed of Time

But this is all heuristic, and it seems the only proper solution would be to support fields, like sort -k (and sort -t). Another option might be to add "anchors" to the "general specs": alongside %n for newline, perhaps %a+ and %a- could represent the regex anchors ^ and $.

To work around this, I've devised a scheme that ensures titles cannot contain tabs and inserts a tab between the date and the rest of the line. Then -i can include this tab in the match via %t, and -f can re-insert the space. This seems to work consistently with my inputs.
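A minimal sketch of the producer side of that workaround, assuming the lines are generated by your own script so you control the layout. The helper name emit_record is hypothetical (not part of dateutils); it replaces any tabs in the title with spaces so the only tab on the line is the one right after the date. The dateconv invocation in the trailing comment is untested and only illustrates the idea of making the tab part of each input format.

```shell
#!/bin/sh
# Hypothetical helper (not part of dateutils) for the tab-delimiter workaround.
# Usage: emit_record DATE URL TITLE
# Prints "DATE<TAB>URL TITLE". Tabs inside the title are turned into spaces,
# so the tab after the date is the only tab on the line.
emit_record() {
    _title=$(printf '%s' "$3" | tr '\t' ' ')
    printf '%s\t%s %s\n' "$1" "$2" "$_title"
}

emit_record 'Sun, 26 Sep 2021 00:00:00 +1000' \
    'http://www.brendangregg.com/blog/2021-09-26/the-speed-of-time.html' \
    'The Speed of Time'

# The records could then be piped through dateconv with the trailing tab made
# part of every input format (%t) and the space restored by -f, for example
# (untested; adjust the formats to your input):
#   ... | dateconv -S -f '%FT%TZ ' -i '%a, %d %b %Y %T %Z%t' -i '%F%t'
# The date inside the URL is followed by "/" rather than a tab, so it no
# longer matches.
```

Because the delimiter is guaranteed to appear exactly once, this gets the "stop after the first match" behaviour without dateconv needing to know about fields.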