openeventdata / phoenix_pipeline

Turning news into events since 2014.
MIT License

Handling single dates in pipeline #10

Open · philip-schrodt opened this issue 10 years ago

philip-schrodt commented 10 years ago

We still need to get some experience with how those feeds come in re: dates. The last time I checked, the URLs on consecutive days were largely (but not completely) redundant. What we may want to do is provide an initial update, then after a couple of days go back, recode all of the available files (eliminating duplicate URLs), and replace the records for that day in the "final" file. Any given run of oneaday_formatter.py would then process a single day rather than multiple days. What we want to avoid is a file that is out of chronological order: that creates a mess for various routines.
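A minimal Python sketch of the recode-and-replace idea above, under stated assumptions: stories and coded events are modeled as hypothetical `Story` and `Event` records carrying a YYYYMMDD date string (so string order equals chronological order), and duplicates are detected by URL alone. The helpers `unique_stories`, `recode_day`, `replace_day`, and `toy_coder` are illustrative names, not functions from oneaday_formatter.py, and the real event coder is replaced by a stub.

```python
from typing import Callable, Iterable, List, NamedTuple


# Hypothetical record types for illustration only; the real pipeline's
# story and event formats are not specified in this issue.
class Story(NamedTuple):
    date: str  # assumed YYYYMMDD, so string order == chronological order
    url: str   # used as the de-duplication key
    text: str


class Event(NamedTuple):
    date: str
    url: str
    code: str  # placeholder for the coded event fields


def unique_stories(feeds: Iterable[Iterable[Story]]) -> List[Story]:
    """Merge several daily feed files, keeping the first story seen per URL."""
    seen = set()
    unique = []
    for feed in feeds:
        for story in feed:
            if story.url not in seen:
                seen.add(story.url)
                unique.append(story)
    return unique


def recode_day(feeds: Iterable[Iterable[Story]], day: str,
               coder: Callable[[Story], List[Event]]) -> List[Event]:
    """Recode every de-duplicated story dated `day` with the supplied coder."""
    events: List[Event] = []
    for story in unique_stories(feeds):
        if story.date == day:
            events.extend(coder(story))
    return events


def replace_day(final: List[Event], day: str, recoded: List[Event]) -> List[Event]:
    """Swap out all records dated `day`, returning the file in date order."""
    kept = [e for e in final if e.date != day]
    merged = kept + [e for e in recoded if e.date == day]
    # sorted() is stable, so records within a day keep their relative order.
    return sorted(merged, key=lambda e: e.date)


# Hypothetical stand-in for the real event coder: one event per story.
def toy_coder(story: Story) -> List[Event]:
    return [Event(story.date, story.url, "010")]


feeds = [
    [Story("20140601", "http://a", "...")],
    # The next day's feed repeats http://a and adds a late story dated 20140601.
    [Story("20140601", "http://a", "..."),
     Story("20140601", "http://b", "..."),
     Story("20140602", "http://c", "...")],
]
final = [
    Event("20140531", "http://z", "020"),
    Event("20140601", "http://a", "010"),  # initial update for 20140601
    Event("20140602", "http://c", "010"),
]
updated = replace_day(final, "20140601", recode_day(feeds, "20140601", toy_coder))
print([e.url for e in updated])
# ['http://z', 'http://a', 'http://b', 'http://c']
```

Replacing a whole day and re-sorting, rather than appending the late story, is what keeps the "final" file in chronological order; the repeated http://a is coded only once because de-duplication happens before coding.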