wgc-hackathon / covid

Analysis of publicly available COVID-19 data to identify the next variant of concern.
GNU General Public License v3.0

UK heatmap of variants #7

Open bethsampher opened 3 years ago

bethsampher commented 3 years ago

Produce a heatmap of the UK to show the prevalence of variants in different areas. You will need to obtain data on the prevalence of different variants for each area of the UK and plot this on a map. You could even automate retrieving new data and incorporating it. If working in Python, you might want to use Folium. There is useful data for this, as well as an existing heatmap to look at, on Microreact.
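
As a rough sketch of the plotting step, assuming per-area variant counts are already in hand: the helper below turns counts into `[lat, lon, weight]` rows, and `build_map` hands them to Folium's `HeatMap` plugin. The area names, coordinates, counts, and the helper names are made-up placeholders for illustration, not real data or an agreed design.

```python
def prevalence_points(counts, centroids, variant):
    """Return [lat, lon, weight] rows, where weight is the share of
    `variant` among all sequenced samples in each area."""
    points = []
    for area, by_variant in counts.items():
        total = sum(by_variant.values())
        # Skip areas with no samples or no known coordinates.
        if total == 0 or area not in centroids:
            continue
        lat, lon = centroids[area]
        points.append([lat, lon, by_variant.get(variant, 0) / total])
    return points


def build_map(points):
    """Draw the weighted points on a Folium map roughly centred on the UK."""
    # Imported here so prevalence_points() works without Folium installed.
    import folium
    from folium.plugins import HeatMap

    uk_map = folium.Map(location=[54.5, -3.0], zoom_start=6)
    HeatMap(points).add_to(uk_map)
    return uk_map


# Hypothetical example data: two areas with made-up counts and coordinates.
counts = {
    "Area A": {"B.1.1.7": 30, "other": 10},
    "Area B": {"B.1.1.7": 5, "other": 15},
}
centroids = {"Area A": (51.5, -0.1), "Area B": (53.5, -2.2)}

points = prevalence_points(counts, centroids, "B.1.1.7")
# build_map(points).save("variants.html")  # uncomment to write the map to disk
```

Weighting by share rather than raw count keeps heavily sampled areas from dominating the map, though a point-based heatmap is only an approximation of per-area prevalence.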

mathewcsims commented 3 years ago

Hi @bethsampher - do you know if there's an API anywhere with data that might work for this? Or where else I might grab the data from?

JamesABaker commented 3 years ago

@mathewcsims The first thing that comes to mind is the gov.uk site https://coronavirus.data.gov.uk/details/download and there is kind of an API (or at least a method to programmatically download) https://coronavirus.data.gov.uk/details/developers-guide#methods.

This site has a downloads page that might be of use for a more international perspective https://www.covid19dataportal.org/sequences?db=embl-covid19

There is also CORD-19, although I have not used this and it looks to be more about advanced literature searching.

https://github.com/vespa-engine/cord-19/blob/master/cord-19-queries.md

Li is a webscraper, but might be fiddly to set up locally.

https://github.com/covidatlas/li/blob/master/docs/getting_started.md

We also have a bunch of potentially useful links here: https://github.com/wgc-hackathon/covid/issues/1 although I don't recall seeing any geographical data there at all.

Let me know if any of those work for what you want to look for.

mathewcsims commented 3 years ago

Thanks @JamesABaker :) What I was hoping to find was a breakdown of variants by locality so that I could try creating a heatmap similar to the Microreact one. That data must exist somewhere, but I can't find it. Not to worry, I'm sure I'll find something interesting to do with the data sources that I have found. It's a shame they don't include the variant data on the gov.uk site, but I imagine the sequencing would delay them being able to publish the numbers.

JamesABaker commented 3 years ago

@bethsampher will show a demo of how to get variant numbers in the workshop. This could be programmed via a very hacky web scrape...

bethsampher commented 3 years ago

Hi @mathewcsims, thanks for your comments! I've also been trying to find some data that splits the variant cases by area in the UK but haven't had any luck. Microreact splits them by 'submission_org_code' but I don't think that's really what we're after. You can find variants split by country on CoV-GLUE, which is what I'm using in the workshop, so you could do a world map. But, as you said, I'm sure you can get some other interesting insights from the UK gov data :)

mathewcsims commented 3 years ago

Thanks @bethsampher and @JamesABaker; I'll have a look at CoV-GLUE in the workshop, and then see what I can come up with from there.

mathewcsims commented 3 years ago

After a lot of digging around, @bethsampher, it looks as though Microreact includes iso_3166_code as a field, which I'm fairly sure corresponds to UK upper tier local authorities. So I think downloading their data file should do the trick. I should be able to dig around and find a reference for what the codes mean (Wikipedia, if all else fails).
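
For anyone following along, a minimal sketch of tallying variants per area from a downloaded CSV might look like the following. Only `iso_3166_code` comes from the Microreact data mentioned above; the `lineage` column name, the function name, and the sample rows (including the `GB-AAA`/`GB-BBB` codes) are placeholders I've assumed for illustration.

```python
import csv
import io
from collections import Counter, defaultdict


def variant_counts_by_area(rows, area_field="iso_3166_code", variant_field="lineage"):
    """Count how many samples of each variant fall in each area code."""
    counts = defaultdict(Counter)
    for row in rows:
        area = (row.get(area_field) or "").strip()
        variant = (row.get(variant_field) or "").strip()
        # Ignore rows that lack either an area code or a variant label.
        if area and variant:
            counts[area][variant] += 1
    return counts


# Made-up sample standing in for the downloaded data file.
sample = io.StringIO(
    "iso_3166_code,lineage\n"
    "GB-AAA,B.1.1.7\n"
    "GB-AAA,B.1.1.7\n"
    "GB-AAA,B.1.177\n"
    "GB-BBB,B.1.1.7\n"
    ",B.1.1.7\n"
)
counts = variant_counts_by_area(csv.DictReader(sample))
```

With a real file, `csv.DictReader(open(path, newline=""))` would replace the in-memory sample, and the field names would need checking against the actual header row.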

JamesABaker commented 3 years ago

Really glad you found the key! Looking forward to seeing what you come up with :smiley:

mathewcsims commented 3 years ago

Thanks @JamesABaker, work in progress at https://github.com/mathewcsims/covid/tree/variant-map-percentages but one of my dataframes is causing me headaches (I need to calculate which week of the pandemic given days are in... which you wouldn't think would be so fiddly!). A challenge for tomorrow though.
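
One way the week calculation could go, as a sketch only: pick a reference date for "week 1" and use integer division on the day offset. The start date below is an assumption chosen for illustration (a Monday in late January 2020), not a value taken from the linked branch, and `pandemic_week` is a hypothetical helper name.

```python
from datetime import date

# Assumed first day of "week 1"; change to whatever reference date fits the data.
PANDEMIC_START = date(2020, 1, 27)


def pandemic_week(day, start=PANDEMIC_START):
    """Return the 1-based week number that `day` falls in, counted from `start`."""
    offset = (day - start).days
    if offset < 0:
        raise ValueError(f"{day} is before the reference start date {start}")
    return offset // 7 + 1


first = pandemic_week(date(2020, 1, 27))    # start date itself, week 1
same = pandemic_week(date(2020, 2, 2))      # six days later, still week 1
second = pandemic_week(date(2020, 2, 3))    # seven days later, week 2
```

The same arithmetic should carry over to a pandas datetime column via `.dt.days` on the difference from a start timestamp.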

bethsampher commented 3 years ago

Glad to see you're making progress @mathewcsims!