mbhall88 / tbpore

Mycobacterium tuberculosis genomic analysis from Nanopore sequencing data
MIT License

Adding decontamination and a few other miscellaneous changes #26

Closed leoisl closed 2 years ago

leoisl commented 2 years ago

This PR mainly adds decontamination, but it also adds or fixes a few other things, all described here:

  1. Added NTM and human decontamination. The decontamination DB consists of the metadata, which is included in this repo at data/decontamination_db/remove_contam.tsv.gz, and the index, which must be downloaded separately because it is 5.1 GB. If the decontamination DB is not present, an error is shown with the EBI FTP URL to download it from and the location to put it in. Several steps were added to the tbpore pipeline to handle decontamination;
  2. Added a snakemake pipeline that runs tbpore on 91 samples and compares the consensus, BCFs and mykrobe output against the H2H results. This pipeline was initially private, but since I use it frequently and we will keep using it in the future, I think it is worth adding to the repo;
  3. minimap2 downgraded to 2.22 to replicate H2H results;
  4. The exact command line used to run a tool is now also written to the error log file to make debugging easier;
  5. mykrobe is now called on the decontaminated, subsampled reads instead of the raw ONT reads;
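
To illustrate item 4, here is a minimal sketch of logging the exact command line when an external tool fails. This is not the actual tbpore code: the function name `run_and_log` and the logger name are hypothetical, and it assumes the tool is launched via `subprocess` from the standard library.

```python
import logging
import shlex
import subprocess
from typing import Sequence

# Hypothetical logger name, for illustration only.
logger = logging.getLogger("external_tools_sketch")


def run_and_log(command: Sequence[str]) -> subprocess.CompletedProcess:
    """Run an external tool; on failure, log the exact command line and stderr."""
    # shlex.join gives a shell-quoted string that can be copy-pasted to rerun the tool.
    command_line = shlex.join(command)
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        logger.error(
            "Command failed with exit code %d: %s", result.returncode, command_line
        )
        logger.error("stderr: %s", result.stderr.strip())
    return result
```

With something like this, a failing step leaves a line in the error log that can be pasted into a shell to reproduce the failure outside the pipeline.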

Closes #24 #23 #21 #16

leoisl commented 2 years ago

Code coverage decreased by 0.56%, but I can confirm it is nothing to worry about!