mmcdole / gofeed

Parse RSS, Atom and JSON feeds in Go
MIT License
2.59k stars 208 forks source link

Missing DateTime Parser Tests #119

Open mmcdole opened 5 years ago

mmcdole commented 5 years ago

We are missing comprehensive tests for the date/time parser.

Ideally, these would cover a myriad of date formats, including those with / without timezone, etc.

rodmcelrath commented 1 year ago

You know - I just added a new format, and put in a PR. And I went looking for the test. But didn't find one. I paused and thought about it. You could run all the formats through the parser. Then back out as formatted time - and you should get back the original format string.

But what does it prove? It's more of a test of the time package - which is well tested. I went delving into the time package test routines. To find a 'Format String' verifier. But they really didn't have one. And it's a complex format. Let me scratching my head.

mmcdole commented 1 year ago

I've been considering making date parsing optional, and perhaps having two degrees:

  1. Parse the formats specified by the specs (strict)
  2. Attempt to parse the myriad of date formats I have listed today.

I'm a bit concerned about the date parsing penalty people pay on parsing, when they may not even need/want it.

rodmcelrath commented 1 year ago

I for one lean heavily into what you have provided. I am dealing with nearly 700 feeds now.

The format I put in today, was the first failure I have seen on the date parsing. Might be others I have missed, but that is in place is pretty darned good.

Expensive to be sure, but worth it for me. As long as there is a way to engage it.
Another option might be to order the lists. Or further break them up and sample the incoming date string, and select a “group” of formats to test.

Given the other processing I do, those date formats are the least of my worries, if I get a valid date out the other side.

Here is a thought, throw a mutex in there, and perform a move to the front, on matching format string. Even in a multi threaded env, the hot matches will stay on top, and over the course of parsing one or more feeds simultaneously, you will only pay for the deep dive infrequently.