This repository contains some additional analyses by Jesse Bloom related to metagenomic samples from the Huanan Seafood Market. The paper corresponding to this analysis is Bloom, Virus Evolution, vead089.
It builds on the analysis described in the earlier paper Bloom (2023) and contained in the repo for that paper at https://github.com/jbloom/Huanan_market_samples
Specifically, this repository also analyzes the number of viral reads mapping to several other coronavirus species found in animals in the market as noted by Crits-Christoph et al, bioRxiv (2023).
Interactive plots summarizing the key results about the mapping of reads to other coronaviruses are shown at https://jbloom.github.io/Huanan_market_samples_addtl_analysis
This repository contains a fully reproducible snakemake
analysis in Snakefile, with the configuration in config.yaml.
To run the pipeline first build the conda environment in environment.yml, and then run the pipeline.
In addition to the aforementioned interactive plots, the following output files are also tracked in this repo:
The following input data are used:
Briefly, the pipeline consists of the following steps:
Download all the metagenomic sequencing FASTQ files from the GSA and pre-process the reads
Get the accessions of all coronaviruses with reads mapping to them in the analysis of Crits-Christoph et al, bioRxiv (2023), and trim the 3' polyA tails
Align all the sequencing data to all the coronavirus genomes
Tabulate the read counts for each sample against each coronavirus, using the mapping quality and read length / identity cutoffs specified in config.yaml
Merge the coronavirus read counts with the species read counts previously reported in Bloom (2023) and contained in the repo for that paper at https://github.com/jbloom/Huanan_market_samples
Make interactive summary plots for all samples.
Copy the plots to the docs directory and make a HTML index for displaying on GitHub pages at https://jbloom.github.io/Huanan_market_samples_addtl_analysis