mediacloud / story-indexer

The core pipeline used to ingest online news stories in the Media Cloud archive.
https://mediacloud.org
Apache License 2.0

fix/csv queuer fetcher #288

Closed: thepsalmist closed this 6 months ago

thepsalmist commented 6 months ago

The PR builds on #271 to fetch URLs from CSV files on S3 for Database E dates 01/25-02/17.
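
Roughly, the queuer's job looks like the following minimal sketch (not the actual csv-queuer.py): read a CSV object from S3 and hand each URL to a queuing callback. The bucket/key arguments, the "url" column name, and the queue_story() callback are placeholders for this example, not the real story-indexer API.

```python
# Minimal sketch only: fetch a CSV from S3 and queue each non-empty URL.
import csv
import io
from typing import Callable

import boto3


def queue_urls_from_csv(bucket: str, key: str,
                        queue_story: Callable[[str], None],
                        url_column: str = "url") -> int:
    """Fetch s3://bucket/key, parse it as CSV, and queue every non-empty URL."""
    s3 = boto3.client("s3")
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

    queued = 0
    for row in csv.DictReader(io.StringIO(body)):
        url = (row.get(url_column) or "").strip()
        if url:
            queue_story(url)  # hand off to the queue-based fetcher
            queued += 1
    return queued
```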

philbudne commented 6 months ago

Some preliminary comments:

  1. The branch seems to contain other stuff (HIST_YEAR=2023 and adding ELASTICSEARCH_SNAPSHOT_REPO)? It's best practice to base PRs on main.
  2. I think the pipeline type "csv-fetcher" is misleading; I suggest using "csv": it runs a different queuer that feeds into the (regular) queue-based fetcher, and it's likely to be (re)used in the future rather than used only once.
  3. In csv-queuer:
    • I don't think any of the commented-out code in csv-queuer.py needs to be kept around.
    • the comment "# let hist-fetcher quarantine if bad" should go away
    • the large block comment on urls_seen is no longer meaningful (since the CSV file is unlikely to have come from the legacy system), BUT filtering out duplicate URLs is probably still a good idea (it doesn't cost much, and can save time/effort); see the sketch after this list.
    • the comment block starting "# content_metadata.parsed_date is not set, so parser.py will" can go away
    • rss.source_feed_id = None and rss.source_source_id = None can go away (None should be the default value)
    • rss.source_url = url can go away: we don't have the URL of an RSS file the story was found in.
  4. In deploy.sh:
    • for PIPE_TYPE_PFX='hist-' I suggest using 'csv-', and setting PORT_BIAS=600 to allow co-existence with other stacks
    • it would be good to set "ARCH_SUFFIX=csv" so the generated WARC files are distinct from the current-day ones
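
On the urls_seen point, the duplicate filtering doesn't need to be anything fancy; something along these lines would do (a sketch only, not what csv-queuer.py currently has; the normalization rules here are just an example):

```python
# Sketch of per-run duplicate-URL filtering for the CSV queuer; the
# normalization (lowercase scheme/host, drop fragment) is illustrative only.
from urllib.parse import urlsplit, urlunsplit


class URLDeduper:
    """Remember URLs already queued and skip repeats within a single run."""

    def __init__(self) -> None:
        self.urls_seen: set[str] = set()

    def _normalize(self, url: str) -> str:
        parts = urlsplit(url.strip())
        return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                           parts.path, parts.query, ""))

    def should_queue(self, url: str) -> bool:
        key = self._normalize(url)
        if key in self.urls_seen:
            return False
        self.urls_seen.add(key)
        return True
```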