We still need more experience with how those feeds handle dates. The last time I checked, the URLs in feeds from consecutive days were largely (but not completely) redundant. One option is to provide an initial update, then go back after a couple of days, recode all of the available files (eliminating duplicate URLs), and replace the records for that day in the "final" file. Any given run of oneaday_formatter.py would then process a single day rather than multiple days. What we want to avoid is a file that is not in chronological order, since this creates a mess with various routines.
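The "go back and replace one day" step could be sketched roughly as follows. This is only an illustration under assumptions, not what oneaday_formatter.py actually does: the record layout (a dict with "date" as a YYYYMMDD string and "url") and the function names dedupe_by_url and replace_day are hypothetical, and the "final" file is assumed to be already loaded into a list sorted by date.

```python
from bisect import bisect_left, bisect_right


def dedupe_by_url(records):
    """Keep the first record seen for each URL, preserving input order."""
    seen = set()
    unique = []
    for rec in records:
        if rec["url"] not in seen:
            seen.add(rec["url"])
            unique.append(rec)
    return unique


def replace_day(final_records, day, recoded):
    """Return a new list with the records for `day` swapped for `recoded`.

    Assumes final_records is sorted by its "date" field (YYYYMMDD strings,
    which sort chronologically as plain text). Records in `recoded` that
    carry a different date are dropped (an assumption of this sketch) so
    the result stays in chronological order.
    """
    dates = [rec["date"] for rec in final_records]
    lo = bisect_left(dates, day)   # index of the first record for `day`
    hi = bisect_right(dates, day)  # index just past the last record for `day`
    fresh = [rec for rec in dedupe_by_url(recoded) if rec["date"] == day]
    return final_records[:lo] + fresh + final_records[hi:]
```

Because the day's block is replaced in place rather than appended, rerunning the update for an earlier day never leaves the file out of chronological order, and rerunning it twice with the same input gives the same result.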