This PR mainly adds decontamination, but a few other things were added or fixed along the way, as described here:
Added decontamination of NTM and human reads. The decontamination DB consists of the metadata, which is included in this repo at data/decontamination_db/remove_contam.tsv.gz, and the index, which is downloaded externally because it is 5.1GB. If the decontamination DB is not present, an error is shown with the EBI FTP URL to download it from and the location where it should be placed. Several steps were added to the tbpore pipeline to handle decontamination;
Added a snakemake pipeline that runs tbpore on 91 samples and compares the consensus, BCFs and mykrobe output against the H2H results. This pipeline was initially private, but since I use it frequently and we will keep using it in the future, I think it is worth adding to the repo;
minimap2 downgraded to 2.22 to replicate H2H results;
The exact command line used to run a tool is now also written to the error log file, to make debugging easier;
mykrobe is now called on the decontaminated subsampled reads, instead of the raw ONT reads;
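The missing-database check described above could look roughly like the sketch below. It is only an illustration, not the code in this PR: the function name `ensure_decontamination_db`, the exception class and the `INDEX_URL` placeholder are all hypothetical, and the real EBI FTP URL is deliberately not filled in.

```python
from pathlib import Path

# Hypothetical placeholder: the real EBI FTP URL is the one printed by tbpore.
INDEX_URL = "<EBI FTP URL of the decontamination index>"


class MissingDecontaminationDB(FileNotFoundError):
    """Raised when the metadata or the index of the decontamination DB is absent."""


def ensure_decontamination_db(
    metadata: Path, index: Path, index_url: str = INDEX_URL
) -> None:
    """Fail early, with download instructions, if any DB file is missing."""
    # Collect every expected file that does not exist on disk.
    missing = [path for path in (metadata, index) if not path.is_file()]
    if missing:
        names = ", ".join(str(path) for path in missing)
        raise MissingDecontaminationDB(
            f"Decontamination DB file(s) not found: {names}. "
            f"Download the index from {index_url} and place it at {index}."
        )
```

Checking before any pipeline step runs means the user sees the download instructions immediately instead of after a long-running step fails.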
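As an illustration of writing the exact command line to the error log, here is a minimal sketch. It assumes the tools are launched through `subprocess`; the helper name `run_and_log` is hypothetical and this is not necessarily how tbpore implements it.

```python
import logging
import shlex
import subprocess
from typing import Sequence


def run_and_log(
    cmd: Sequence[str], logger: logging.Logger
) -> subprocess.CompletedProcess:
    """Run an external tool; on failure, log the exact command and its stderr."""
    proc = subprocess.run(list(cmd), capture_output=True, text=True)
    if proc.returncode != 0:
        # shlex.join quotes each argument, so the logged line can be pasted
        # into a shell to reproduce the failure.
        logger.error(
            "Command failed with exit code %d: %s",
            proc.returncode,
            shlex.join(cmd),
        )
        logger.error("stderr:\n%s", proc.stderr)
        raise subprocess.CalledProcessError(
            proc.returncode, list(cmd), proc.stdout, proc.stderr
        )
    return proc
```

Logging the shell-quoted form rather than the raw argument list keeps arguments containing spaces unambiguous when the command is rerun by hand.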
Closes #24 #23 #21 #16