ukwa / ukwa-manage

Shepherding our web archives from crawl to access.
Apache License 2.0

Add task to pull URLs and metadata in from GDELT #22

Open anjackson opened 7 years ago

anjackson commented 7 years ago

The GDELT Project (by Kalev Leetaru) offers a rich data feed under very open terms.

There is a feed file, updated every 15 minutes, that links to a TSV in a ZIP file. This can be parsed and interpreted to generate a list of URLs with associated metadata (including the lat/lon of each event).

A task could grab this file, parse it, and push the URLs into an appropriate crawl stream if they appear to be in scope. If done with care, this could also capture some of the additional metadata and pass it along to the indexer.
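A minimal sketch of such a task, using only the Python standard library. The feed URL is GDELT's published `lastupdate.txt` endpoint; the column positions for `SOURCEURL` and `ActionGeo_Lat`/`ActionGeo_Long` are assumptions based on the GDELT 2.0 event codebook and should be verified before use:

```python
import csv
import io
import zipfile
from urllib.request import urlopen

LASTUPDATE_URL = "http://data.gdeltproject.org/gdeltv2/lastupdate.txt"

# Assumed column positions in the GDELT 2.0 event export (61 fields);
# check against the GDELT codebook before relying on these.
ACTIONGEO_LAT = 56
ACTIONGEO_LONG = 57
SOURCEURL = 60


def parse_event_row(fields):
    """Pull the source URL and event lat/lon out of one TSV row.

    Returns a dict, or None if the row has no usable URL.
    """
    if len(fields) <= SOURCEURL or not fields[SOURCEURL].startswith("http"):
        return None
    return {
        "url": fields[SOURCEURL],
        "lat": fields[ACTIONGEO_LAT] or None,
        "lon": fields[ACTIONGEO_LONG] or None,
    }


def fetch_candidates():
    """Read the 15-minute feed file, download the latest export ZIP,
    and yield one {url, lat, lon} dict per event row."""
    with urlopen(LASTUPDATE_URL) as f:
        # Each feed line is "<size> <md5> <url>"; the first line
        # points at the latest event export ZIP.
        export_url = f.readline().decode("utf-8").split()[-1]
    with urlopen(export_url) as f:
        payload = io.BytesIO(f.read())
    with zipfile.ZipFile(payload) as zf:
        # The ZIP holds a single tab-separated .CSV member.
        with zf.open(zf.namelist()[0]) as member:
            text = io.TextIOWrapper(member, encoding="utf-8", errors="replace")
            for fields in csv.reader(text, delimiter="\t"):
                record = parse_event_row(fields)
                if record:
                    yield record
```

The parsing is kept separate from the fetching so the row handling can be tested offline; scope checking and de-duplication against previously seen URLs would sit on top of `fetch_candidates()`.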

anjackson commented 5 years ago

This kind of thing can be done once we have an easily accessible queue/stream to drop candidates into.