Closed alexhbnr closed 1 year ago
@alexhbnr could you provide more information? As far as I can see, what you want is already implemented in dev: the BCFtools consensus output (presumably with the corrected bases) is passed to binning via the 'contigs_recalled' channel.
Can you provide an example somehow (or a reprex) where the wrong contigs were used downstream?
If it's already in the dev branch and it seems to work, then that's what I want. In the major release v2.3.0 that I used, the contig sequences found in the final MAGs were identical to the contigs found in the uncorrected samples. That's why I raised this issue.
That's concerning. I'm pretty sure that when I looked earlier, the code looked identical. Which MAGs were you looking at (presumably in the results directory)?
Here is the code that I used to start the pipeline:
nextflow run nf-core/mag -r 2.3.0 \
-profile eva,archgen \
--input 04-analysis/Zape2/nfcore_mag_samplesheet.csv \
--outdir 04-analysis/Zape2/assembly \
--skip_clipping \
--skip_prodigal \
--binning_map_mode own \
--min_contig_size 1000 \
--gtdb false \
--binqc_tool checkm \
--save_checkm_data \
--refine_bins_dastool \
--postbinning_input both \
--run_gunc \
--gunc_save_db \
--ancient_dna
If I then do a simple pairwise comparison of the contig sequences found in a MAG in the folder GenomeBinning/DASTool/bins with either the original SPAdes sequence, Assembly/SPAdes/SPAdes-Zape2_MinE2X_scaffolds.fasta.gz, or the consensus sequence returned by the ancient DNA workflow, Ancient_DNA/variant_calling/consensus/Zape2_MinE2X.fa, then diff identifies mismatches when comparing against the consensus sequence but not against the original sequences returned by SPAdes.
So I guess in the version I was running, the contigs used in DASTool aren't the ones returned in the consensus folder, which I assume are the corrected ones.
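For anyone who wants to repeat this check, the comparison described above could be sketched roughly as follows. This is only an illustration, not the exact commands used in this report: the function names read_fasta and compare_bin_to_reference are hypothetical, and the sketch assumes that contig IDs are preserved unchanged between the SPAdes scaffolds, the consensus FASTA and the binned contigs.

```python
import gzip
from pathlib import Path


def read_fasta(path):
    """Return {contig_id: sequence} from a plain or gzipped FASTA file."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    records, name, chunks = {}, None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    records[name] = "".join(chunks)
                # Assumption: the first whitespace-delimited token is the contig ID.
                header = line[1:].split()
                name, chunks = (header[0] if header else ""), []
            else:
                # Upper-case so that differences in letter case are ignored.
                chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def compare_bin_to_reference(bin_records, ref_records):
    """Sort bin contigs into identical / different / missing relative to a reference."""
    result = {"identical": [], "different": [], "missing": []}
    for name, seq in bin_records.items():
        if name not in ref_records:
            result["missing"].append(name)
        elif ref_records[name] == seq:
            result["identical"].append(name)
        else:
            result["different"].append(name)
    return result
```

One would load a single bin FASTA from GenomeBinning/DASTool/bins with read_fasta, then call compare_bin_to_reference once against the records from Assembly/SPAdes/SPAdes-Zape2_MinE2X_scaffolds.fasta.gz and once against those from Ancient_DNA/variant_calling/consensus/Zape2_MinE2X.fa. If the binned contigs all land in "identical" for the SPAdes file but some land in "different" for the consensus file, that matches the behaviour described here.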
@maxibor will check my execution trace to see what's going on here.
Description of the bug
When the ancient DNA sub-workflow is selected using --ancient_dna, the consensus sequence reported by the assembler is corrected. This step should remove artefacts that were wrongly called by the assembler due to the presence of ancient DNA damage. However, for genome binning, nf-core/mag doesn't select these corrected contigs but uses the non-corrected contigs instead. This defeats the purpose of enabling the --ancient_dna sub-workflow.
Command used and terminal output
No response
Relevant files
No response
System information