Closed divergentdave closed 9 years ago
The SBA scraper has the same problem.
Really? I would not expect this to be a common issue.
The easiest way to solve both issues would seem to be to specify an explicit sort order.
I can certainly see how it could happen: writing ORDER BY datetime DESC
or ORDER BY year DESC, month DESC, date DESC
seems like the sensible thing to do. I have a plan for retrying pages when we miss rows, and I'm going to try it out on SBA first.
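To illustrate the ORDER BY point: neither clause above says how to order reports that share a date, so the server is free to return ties in any order from one request to the next. A rough sketch of the usual server-side fix, adding a unique column as a tie-breaker, is below. We don't know what the real backends look like; the `reports` table, the `id` column, and the `paginate` helper are all made up for illustration, and SQLite just stands in for whatever database the site uses.

```python
import sqlite3


def paginate(conn, order_by, page_size):
    """Collect row ids page by page using LIMIT/OFFSET with the given ORDER BY."""
    seen = []
    offset = 0
    while True:
        rows = conn.execute(
            f"SELECT id FROM reports ORDER BY {order_by} LIMIT ? OFFSET ?",
            (page_size, offset),
        ).fetchall()
        if not rows:
            break
        seen.extend(row[0] for row in rows)
        offset += page_size
    return seen


# Hypothetical schema: three reports share a date, so "datetime DESC" alone
# would leave their relative order unspecified.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (id INTEGER PRIMARY KEY, datetime TEXT)")
conn.executemany(
    "INSERT INTO reports (id, datetime) VALUES (?, ?)",
    [(1, "2015-03-01"), (2, "2015-03-01"), (3, "2015-03-01"), (4, "2015-02-15")],
)

# The unique id breaks ties, so the ordering is total and every row lands on
# exactly one page.
ids = paginate(conn, "datetime DESC, id DESC", page_size=2)
print(ids)
```

Since we can't change the sites' queries, this only shows why the problem arises; on our side the retry approach is still needed.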
Closed by #213 and #223
As described in #213, the USPS document library uses an unstable sort algorithm. If several reports with the same date span a pagination boundary, we may see one report on both pages while missing another report entirely. We could probably detect this and re-fetch the offending pages until we make up the difference.
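A minimal sketch of the detect-and-re-fetch idea, not necessarily what #213 or #223 ended up doing. It assumes that pages have a fixed size, so every report seen twice implies exactly one report missed, and that re-fetching the pages where the duplicate appeared will eventually surface the missing one. `fetch_page` is a hypothetical callable that returns the report IDs on a given page; `fake_fetch` only simulates a site whose tie order changes between requests.

```python
def scrape_with_retries(fetch_page, page_count, max_retries=10):
    """Fetch all pages, then re-fetch suspect pages until duplicates are offset.

    fetch_page(n) is a hypothetical callable returning a list of report IDs.
    Returns the set of IDs found and the number of reports still missing.
    """
    seen = {}              # report ID -> first page it appeared on
    missing = 0            # each duplicate implies one report we did not see
    suspect_pages = set()  # pages on either side of a duplicate

    for page in range(1, page_count + 1):
        for report_id in fetch_page(page):
            if report_id in seen:
                missing += 1
                suspect_pages.update({seen[report_id], page})
            else:
                seen[report_id] = page

    for _ in range(max_retries):
        if missing == 0:
            break
        for page in sorted(suspect_pages):
            for report_id in fetch_page(page):
                if report_id not in seen:
                    seen[report_id] = page
                    missing -= 1

    return set(seen), missing


# Simulated site: "b" and "c" share a date. On the first pass "b" shows up on
# both pages and "c" on neither; a later request for page 1 returns "c".
responses = {1: [["a", "b"], ["a", "c"]], 2: [["b", "d"]]}


def fake_fetch(page):
    queue = responses[page]
    return queue.pop(0) if len(queue) > 1 else queue[0]


found, still_missing = scrape_with_retries(fake_fetch, page_count=2)
print(sorted(found), still_missing)
```

The retry cap matters because nothing guarantees the site will ever return the missing report; if `still_missing` is nonzero at the end, the scraper should probably log a warning rather than fail silently.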