COVID-19 Open Data Analysis To Identify The Next Variant Of Concern

Description

Scientists at the Wellcome Genome Campus have so far sequenced over 170,000 coronavirus genomes in order to track the spread of the virus and identify new variants. Only a small number of these variants pose a possible risk, due to mutations making them more virulent or transmissible, or potentially affecting the efficacy of the vaccine. The aim of this project is to use publicly available data to identify these so-called ‘Variants of Concern’. There are various ways you could do this, as well as a number of existing resources to make use of. Feel free to be as creative as you like in your methods and analysis. To that end, this project would be suitable for experienced programmers, but you don’t necessarily have to code to take part. There are plenty of other important ways you could contribute, such as using the existing tools to put together a report. The main requirement is an interest in using the available data to help identify, understand and communicate potential Variants of Concern.

Objective

COVID variants are cropping up all over the world. Some show resistance to vaccines and out-compete the original strains. This project aims to analyse several aspects of the COVID-19 pandemic using publicly available datasets, with a focus on identifying or predicting the next variant of concern. This will help researchers get ahead of the virus.

Discussions and workshop

Throughout the hackathon, you can use Discussions to ask questions and we will be on hand to help (you're welcome to help each other too!). There will also be an opportunity to ask questions at our workshop, where we will be covering the biological concepts underpinning this project and going through how you might produce a report. Register for the workshop here.

Contributing

Data resources

The first step might be to look at the data already openly available to researchers. COG-UK have a great set of resources to explore and use in your hackathon research. It would be great if you could familiarise yourself with some of those pages. They are not always clear, so please raise an issue with any questions and we will be happy to help answer them.

Outcomes

We really do not want to push people in any specific direction if you have ideas on how to help scientists identify or predict variants of concern. A scientific report on any variants in the data that worry you? Sure! A workflow package to say which protein has been changed? Great! Machine learning on a fully simulated virus with all possible changes? Good luck to you!

Of course, these might seem like massive tasks and you will have a lot of questions. That is where we can help out.
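To make the "which protein has been changed" idea a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a recommended pipeline: it assumes the sample has already been aligned to the reference (same length), and the sequences, gene names and coordinates below are made-up toy values rather than real SARS-CoV-2 data. The function names `list_substitutions` and `genes_at` are hypothetical helpers invented for this example.

```python
from typing import Dict, List, Tuple


def list_substitutions(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes both sequences are already aligned and the same length.
    Positions where the sample has 'N' (unknown base) or '-' (gap) are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    changes = []
    pairs = zip(reference.upper(), sample.upper())
    for position, (ref_base, alt_base) in enumerate(pairs, start=1):
        if alt_base in ("N", "-"):
            continue
        if ref_base != alt_base:
            changes.append((position, ref_base, alt_base))
    return changes


def genes_at(position: int, gene_coords: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return every gene whose inclusive 1-based range covers the position."""
    return [
        name
        for name, (start, end) in gene_coords.items()
        if start <= position <= end
    ]


# Toy inputs, made up for illustration only (NOT real SARS-CoV-2 data).
TOY_REFERENCE = "ATGGCTAAAGGTCCTTGA"
TOY_SAMPLE = "ATGGCTAAGGGTNCTTGA"
TOY_GENES = {"geneA": (1, 9), "geneB": (10, 18)}

for pos, ref_base, alt_base in list_substitutions(TOY_REFERENCE, TOY_SAMPLE):
    # Prints: A9G ['geneA']
    print(f"{ref_base}{pos}{alt_base}", genes_at(pos, TOY_GENES))
```

For real data you would swap the toy values for an aligned genome and the gene coordinates from the reference annotation. A fuller workflow would also translate codons to report amino acid changes, which this sketch does not attempt.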

Predefined projects

Although we encourage you to develop your own novel methods that researchers may not have thought of, we have outlined some projects in GitHub issues that we think might be useful for the community. They are flagged to show if they require programming experience or not. Feel free to contribute to those projects if you don't want to do your own.

Biology background

There is a lot of complex biology involved in studying COVID. Below are some resources to help you get to grips with the basics.

The genetics of Covid-19: Dr Katrina Lythgoe on mutations, variants and how to spot them. An interview with Dr Katrina Lythgoe who discusses researching the mutations and variants of coronavirus.
Kurzgesagt on COVID. A general <10 minute intro to the coronavirus from the beginning of the pandemic.
Ted-Ed on COVID. Another <5 minute intro to the coronavirus.
Covid variants: What happens when a virus mutates? - BBC News. A 2 minute intro to variants of concern.
Mayo Clinic Insights: What is a COVID-19 variant strain. This outlines the three main variants of concern currently known.
Sanger Institute - What does it take to sequence tens of thousands of COVID-19 samples? A short film showing the laboratories where the sequencing is carried out on campus.
Lecture 2: "Coronavirus biology". A lecture from an MIT lecture series on the coronavirus.

GitHub best practice

Once you have decided which issue you want to contribute to, follow the steps on this page. Note that step 9 is where you actually make changes to the data and code.

Coding and data science skills

If you want to brush up on your coding skills before getting started, we have started a link dump here of resources to help you code and do data science.
