open-contracting / kingfisher-collect

Downloads OCDS data and stores it on disk
https://kingfisher-collect.readthedocs.io
BSD 3-Clause "New" or "Revised" License
13 stars 12 forks source link

periodic_spider: support date range and update dominican_republic_api to use it #1052

Closed yolile closed 8 months ago

yolile commented 9 months ago

closes #1051

jpmckinney commented 9 months ago

I added more tests and found an edge case. Not sure why we removed a second. It's from https://github.com/open-contracting/kingfisher-collect/commit/3494079760e1a58a69b57246bce37bcf3479a0c3

If an end date was read in as having a HH:MM of 00:00, then the generator would skip a day.

jpmckinney commented 9 months ago

Actually - I'm not 100% sure whether the change to date_range_by_interval returns the correct results for all the contexts it's called from (panama_dgcp_bulk, paraguay_dncp_base, periodic_spider),

It depends whether the "end" time is inclusive or exclusive, e.g. when the step is 1, do we want pairs of days like Jan 1 to Jan 2, Jan 2 to Jan 3, etc. (end is exclusive) or Jan 1 to Jan 1, Jan 2 to Jan 2, etc. (end is inclusive).

paraguay_dncp_base is different, in that it formats the time as a full datetime (with seconds). The old logic treated the end time as inclusive (e.g. 23:59:59). With my update, it's now 00:00:00 on the next day.

We probably want an inclusive keyword argument to date_range_by_interval to return to correct values.

yolile commented 8 months ago

I added more tests and found an edge case. Not sure why we removed a second. It's from https://github.com/open-contracting/kingfisher-collect/commit/3494079760e1a58a69b57246bce37bcf3479a0c3

Hmm? That commit only changed the Paraguay spider, not date_range_by_interval, or are you referring to the Paraguay spider?

We probably want an inclusive keyword argument to date_range_by_interval to return to correct values.

Sounds good, I will implement that

jpmckinney commented 8 months ago

Hmm? That commit only changed the Paraguay spider, not date_range_by_interval, or are you referring to the Paraguay spider?

That commit was Paraguay specific, but when we extracted that code into date_range_by_interval, the new date_range_by_interval function copied that logic.

Edit: Basically, I git blame'd the date_range_by_interval function to figure out why we were subtracting 1 second, and traced it back to logic that was maybe specific to Paraguay.

yolile commented 8 months ago

Actually, I don't think we need the inclusive parameter for Paraguay. It is working well as it is. The from and until dates from the command line require the exact time as well anyway

yolile commented 8 months ago

@jpmckinney could you please review it again? thank you!