Write a separate web scraper for each agency, each writing its output to its own JSON file. It's not crucial that we scrape every metadata column; the only requirement is that we capture each pair of PANGO lineage and classification (e.g., B.1.1.7 + VOC).
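A minimal, standard-library-only sketch of the table-parsing core each scraper could share. It assumes the agency page has a single table of interest, with a header row and explicit closing </td>/</tr> tags. The column names passed in (e.g., "Lineage", "Classification") and all helper names are hypothetical, and fetching the page is left out.

```python
import json
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>.

    Rows from every table on the page end up in one list, so this assumes
    the page has a single table of interest.
    """

    def __init__(self):
        super().__init__()
        self.rows = []    # list of rows, each a list of cell strings
        self._row = None  # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse internal whitespace and strip the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row, self._cell = None, None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_pairs(html, lineage_col, class_col):
    """Return [{"lineage": ..., "classification": ...}] using the first row as header."""
    parser = TableRowParser()
    parser.feed(html)
    header, *body = parser.rows
    li, ci = header.index(lineage_col), header.index(class_col)
    return [
        {"lineage": row[li], "classification": row[ci]}
        for row in body
        if len(row) > max(li, ci) and row[li]  # skip short or lineage-less rows
    ]


def write_agency_json(html, lineage_col, class_col, out_path):
    """Write the lineage/classification pairs for one agency to its own JSON file."""
    with open(out_path, "w") as fh:
        json.dump(extract_pairs(html, lineage_col, class_col), fh, indent=2)
```

Keeping only the two required fields means each per-agency file has the same shape, which keeps the later combine step agency-agnostic.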
Write a Snakemake rule to combine the results from the individual web scrapers. Aim for a compact CSV format: rows = PANGO lineages, columns = agencies, and cells = classification (VOC, VOI, etc.).
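One way the combine step could look, as a Python script invoked through the rule's `script:` directive (which exposes `snakemake.input` and `snakemake.output` to the script). The file layout (`data/<AGENCY>.json`), the rule and script names, and the record keys are assumptions for illustration.

```python
# Hypothetical Snakefile rule (names and paths are assumptions):
#
# rule combine_agencies:
#     input: expand("data/{agency}.json", agency=["WHO", "CDC", "PHE"])
#     output: "data/lineage_classifications.csv"
#     script: "scripts/combine_agencies.py"
import csv
import json
from pathlib import Path


def combine(json_paths, out_csv):
    """Pivot per-agency JSON files into one wide CSV.

    Assumes each file is named <AGENCY>.json and holds a list of
    {"lineage": ..., "classification": ...} records.
    """
    table = {}  # lineage -> {agency: classification}
    agencies = []
    for path in json_paths:
        agency = Path(path).stem  # "data/WHO.json" -> "WHO"
        agencies.append(agency)
        with open(path) as fh:
            for rec in json.load(fh):
                table.setdefault(rec["lineage"], {})[agency] = rec["classification"]

    with open(out_csv, "w", newline="") as fh:
        # Agencies that don't list a lineage get an empty cell (DictWriter's default restval).
        writer = csv.DictWriter(fh, fieldnames=["lineage", *agencies])
        writer.writeheader()
        for lineage in sorted(table):
            writer.writerow({"lineage": lineage, **table[lineage]})


if "snakemake" in globals():  # only true when run via Snakemake's script: directive
    combine(snakemake.input, snakemake.output[0])  # noqa: F821
```

Deriving the column name from the file name means adding an agency only requires a new scraper and an extra entry in the `expand()` list.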
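This CSV can be exported to web-friendly JSON with pandas' DataFrame.to_json(orient='records'), as long as lineage is a regular column rather than the index (orient='records' drops the index). This produces an array of objects like {"lineage": "B.1.1.7", "WHO": "VOC", "CDC": "VOC", ...}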
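A sketch of the export with pandas, assuming the CSV has a column literally named `lineage` (the function name is hypothetical). The CSV is read without `index_col`, so `lineage` stays an ordinary column and survives the records export.

```python
import pandas as pd


def csv_to_records_json(csv_source):
    """Convert the combined lineage-by-agency CSV into a JSON array of records.

    `csv_source` can be a path or any file-like object pd.read_csv accepts.
    """
    # No index_col: to_json(orient="records") discards the index, so a
    # lineage index would silently vanish from the output.
    df = pd.read_csv(csv_source)
    # Empty cells (agency has no classification for that lineage) are read
    # as NaN and serialised as JSON null.
    return df.to_json(orient="records")
```

If the combined table is ever built with lineage as the index instead, calling `df.reset_index()` before `to_json` restores it as a column.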
Separate Snakemake rule for pulling the map from PANGO lineage to WHO naming convention, e.g., "B.1.617.2": "Delta".
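The source for the WHO labels isn't specified here, so this sketch only covers turning already-scraped (WHO label, lineages) rows into the lookup JSON that the separate rule would produce. The assumption that one row can list several comma-separated lineages, and the function names, are hypothetical.

```python
import json
import re


def build_who_label_map(rows):
    """Map each PANGO lineage to its WHO label.

    `rows` is an iterable of (who_label, lineages) pairs; `lineages` may
    hold several names separated by commas and/or whitespace (assumed layout).
    """
    mapping = {}
    for label, lineages in rows:
        for lineage in re.split(r"[,\s]+", lineages.strip()):
            if lineage:
                mapping[lineage] = label.strip()
    return mapping


def write_who_label_map(rows, out_path):
    """Write the lineage -> WHO label map as JSON, e.g. {"B.1.617.2": "Delta"}."""
    with open(out_path, "w") as fh:
        json.dump(build_who_label_map(rows), fh, indent=2, sort_keys=True)
```

Keeping this map in its own file lets the web front end look up the WHO label for any row of the records JSON without touching the per-agency pipeline.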
For PHE, find the link to the latest HTML report, follow it, then scrape the table on the resulting page.
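A sketch of the two-hop navigation using only the standard library. It assumes the index page lists the newest report first and that report links can be recognised by their link text; the default pattern "HTML" is a guess, and since the index URL isn't given here it stays a parameter. All function names are hypothetical, and the returned report HTML still needs the same table parsing as the other agencies' pages.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect (href, link text) for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, " ".join("".join(self._text).split())))
            self._href = None


def find_latest_report_url(index_html, base_url, pattern=r"HTML"):
    """Return the absolute URL of the first link whose text matches `pattern`.

    Assumes the index page lists the newest report first.
    """
    parser = LinkParser()
    parser.feed(index_html)
    for href, text in parser.links:
        if re.search(pattern, text):
            return urljoin(base_url, href)  # resolve relative links
    raise ValueError("no report link matched " + pattern)


def fetch(url):
    # Assumes the pages are UTF-8 encoded.
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


def fetch_latest_report(index_url, pattern=r"HTML"):
    """Hop 1: fetch the index page; hop 2: fetch the newest report it links to."""
    index_html = fetch(index_url)
    return fetch(find_latest_report_url(index_html, index_url, pattern))
```

Separating link discovery from fetching keeps the "which report is latest" logic testable offline against a saved copy of the index page.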