replikation / poreCov

SARS-CoV-2 workflow for nanopore sequence data
https://case-group.github.io/
GNU General Public License v3.0

Runs fail at barcode22 #223

Closed JaMaack closed 2 years ago

JaMaack commented 2 years ago

Hi, we recently had problems with some of our runs. The issue seems to be with the nanopolish step for barcode22 samples. After this step, the run stops with an error message and we have no idea why.

Command used

nextflow run replikation/poreCov --fastq_pass /home/minion/Sequenzierung/Analysis/poreCov/results/$name/demultiplexed_reads/one_end/ -r 1.3.0 --primerV $primers --update --cores 12  -profile local,docker --output /home/minion/Sequenzierung/Analysis/poreCov/results/$name/alignment_data/one_end/ -w /home/minion/Sequenzierung/Analysis/poreCov/results/$name/alignment_data/work/ --nanopolish /home/minion/rohdaten/$data_dir/no_sample/*/sequencing_summary*.txt --fast5 /home/minion/rohdaten/$data_dir/

Error code

Error executing process > 'artic_ncov_np_wf:artic_nanopolish (6)'

Caused by:
  Process `artic_ncov_np_wf:artic_nanopolish (6)` terminated with an error exit status (20)

Command executed:

  artic minion --minimap2 --normalise 500             --threads 12             --scheme-directory external_primer_schemes             --read-file barcode22_filtered.fastq.gz             --fast5-directory 20220328-01             --sequencing-summary sequencing_summary*.txt             nCoV-2019/VarSkipV2 barcode22

  # generate depth files
  artic_make_depth_mask --depth 20             --store-rg-depths external_primer_schemes/nCoV-2019/VarSkipV2/nCoV-2019.reference.fasta             barcode22.primertrimmed.rg.sorted.bam             barcode22.coverage_mask.txt

  zcat barcode22.pass.vcf.gz > SNP_barcode22.pass.vcf

  sed -i "1s/.*/>barcode22/" *.consensus.fasta

  # get reference FASTA ID to rename BAM
  REF=$(samtools view -H barcode22.primertrimmed.rg.sorted.bam | awk 'BEGIN{FS="\t"};{if($1=="@SQ"){print $2}}' | sed 's/SN://g')
  mv barcode22.primertrimmed.rg.sorted.bam barcode22_mapped_${REF}.primertrimmed.sorted.bam
  samtools index barcode22_mapped_${REF}.primertrimmed.sorted.bam

Command exit status:
  20

Command output:
  (empty)

Command error:
  [readdb] indexing 20220328-01
  [readdb] indexing 20220328-01/no_sample
  [readdb] indexing 20220328-01/no_sample/20220329_0943_MN36360_FAT41443_61a08759
  [readdb] indexing 20220328-01/no_sample/20220329_0943_MN36360_FAT41443_61a08759/fast5
  [readdb] indexing 20220328-01/no_sample/20220329_0943_MN36360_FAT41443_61a08759/other_reports
  [readdb] num reads: 21920, num reads with path to fast5: 21920
  [M::mm_idx_gen::0.001*1.27] collected minimizers
  [M::mm_idx_gen::0.002*3.21] sorted minimizers
  [M::main::0.002*3.19] loaded/built the index for 1 target sequence(s)
  [M::mm_mapopt_update::0.002*3.02] mid_occ = 3
  [M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 1
  [M::mm_idx_stat::0.002*2.88] distinct minimizers: 5587 (99.93% are singletons); average occurrences: 1.004; average spacing: 5.332
  [M::worker_pipeline::0.147*2.41] mapped 21920 sequences
  [M::main] Version: 2.17-r941
  [M::main] CMD: minimap2 -a -x map-ont -t 12 external_primer_schemes/nCoV-2019/VarSkipV2/nCoV-2019.reference.fasta barcode22_filtered.fastq.gz
  [M::main] Real time: 0.149 sec; CPU: 0.356 sec; Peak RSS: 0.029 GB
  [post-run summary] total reads: 54, unparseable: 0, qc fail: 0, could not calibrate: 0, no alignment: 0, bad fast5: 0
  [post-run summary] total reads: 34, unparseable: 0, qc fail: 0, could not calibrate: 0, no alignment: 0, bad fast5: 0
  Running: nanopolish index -s sequencing_summary_FAT41443_4eceb848.txt -d 20220328-01 barcode22_filtered.fastq.gz
  Running: minimap2 -a -x map-ont -t 12 external_primer_schemes/nCoV-2019/VarSkipV2/nCoV-2019.reference.fasta barcode22_filtered.fastq.gz | samtools view -bS -F 4 - | samtools sort -o barcode22.sorted.bam -
  Running: samtools index barcode22.sorted.bam
  Running: align_trim  --normalise 500 external_primer_schemes/nCoV-2019/VarSkipV2/nCoV-2019.scheme.bed --start --remove-incorrect-pairs --report barcode22.alignreport.txt < barcode22.sorted.bam 2> barcode22.alignreport.er | samtools sort -T barcode22 - -o barcode22.trimmed.rg.sorted.bam
  Running: align_trim  --normalise 500 external_primer_schemes/nCoV-2019/VarSkipV2/nCoV-2019.scheme.bed --remove-incorrect-pairs --report barcode22.alignreport.txt < barcode22.sorted.bam 2> barcode22.alignreport.er | samtools sort -T barcode22 - -o barcode22.primertrimmed.rg.sorted.bam
  Running: samtools index barcode22.trimmed.rg.sorted.bam
  Running: samtools index barcode22.primertrimmed.rg.sorted.bam
  Running: nanopolish variants --min-flanking-sequence 10 -x 1000000 --progress -t 12 --reads barcode22_filtered.fastq.gz -o barcode22.nCoV-2019_1.vcf -b barcode22.trimmed.rg.sorted.bam -g external_primer_schemes/nCoV-2019/VarSkipV2/nCoV-2019.reference.fasta -w "MN908947.3:1-29904" --ploidy 1 -m 0.15 --read-group nCoV-2019_1 
  Running: nanopolish variants --min-flanking-sequence 10 -x 1000000 --progress -t 12 --reads barcode22_filtered.fastq.gz -o barcode22.nCoV-2019_2.vcf -b barcode22.trimmed.rg.sorted.bam -g external_primer_schemes/nCoV-2019/VarSkipV2/nCoV-2019.reference.fasta -w "MN908947.3:1-29904" --ploidy 1 -m 0.15 --read-group nCoV-2019_2 
  Running: artic_vcf_merge barcode22 external_primer_schemes/nCoV-2019/VarSkipV2/nCoV-2019.scheme.bed 2> barcode22.primersitereport.txt nCoV-2019_1:barcode22.nCoV-2019_1.vcf nCoV-2019_2:barcode22.nCoV-2019_2.vcf
  Running: bgzip -f barcode22.merged.vcf
  Running: tabix -p vcf barcode22.merged.vcf.gz
  Running: artic-tools check_vcf --summaryOut barcode22.vcfreport.txt barcode22.merged.vcf.gz external_primer_schemes/nCoV-2019/VarSkipV2/nCoV-2019.scheme.bed 2> barcode22.vcfcheck.log
  Command failed:artic-tools check_vcf --summaryOut barcode22.vcfreport.txt barcode22.merged.vcf.gz external_primer_schemes/nCoV-2019/VarSkipV2/nCoV-2019.scheme.bed 2> barcode22.vcfcheck.log

Work dir:
  /home/minion/Sequenzierung/Analysis/poreCov/results/20220328-01/alignment_data/work/15/5de05280883000c5dd16c97d531a15

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

More Context

OPERATING_SYSTEM = Ubuntu 20.04.3 LTS
CORES = 12
THREADS/CORE = 2
RAM = 16 GB

replikation commented 2 years ago

The error comes from this step:

Running: artic-tools check_vcf --summaryOut barcode22.vcfreport.txt barcode22.merged.vcf.gz external_primer_schemes/nCoV-2019/VarSkipV2/nCoV-2019.scheme.bed 2> barcode22.vcfcheck.log
Command failed:artic-tools check_vcf --summaryOut barcode22.vcfreport.txt barcode22.merged.vcf.gz external_primer_schemes/nCoV-2019/VarSkipV2/nCoV-2019.scheme.bed 2> barcode22.vcfcheck.log

Can you check whether all the files from the artic-tools check_vcf step are present in /home/minion/Sequenzierung/Analysis/poreCov/results/20220328-01/alignment_data/work/15/5de05280883000c5dd16c97d531a15?

My first guess would be that something is wrong with this barcode's data (e.g. not enough reads).

Usually, poreCov ignores these errors by default, so the workflow does not fail when low-quality barcodes are fed into artic. I just noticed that this "ignore" setting was deactivated and that I forgot to reactivate it.
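The file check can also be scripted. The following is a minimal sketch, not part of poreCov: the function name `check_vcf_files` is hypothetical, and the file names are simply the ones that appear around the failing `artic-tools check_vcf` call in the log above.

```shell
#!/usr/bin/env bash
# Report which files around the check_vcf step exist in a Nextflow work dir.
# check_vcf_files is a hypothetical helper; file names are taken from the log.
check_vcf_files() {
    local workdir=$1 sample=$2 missing=0 f
    for f in "$sample.merged.vcf.gz" "$sample.merged.vcf.gz.tbi" \
             "$sample.vcfcheck.log" "$sample.vcfreport.txt"; do
        if [ -e "$workdir/$f" ]; then
            echo "present: $f"
        else
            echo "missing: $f"
            missing=1
        fi
    done
    # Non-zero exit status if at least one file is missing.
    return "$missing"
}
```

Calling `check_vcf_files <work dir> barcode22` lists each file as present or missing. Since stderr of the failing command was redirected to `barcode22.vcfcheck.log`, that file is probably the first place to look for the actual error message.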

JaMaack commented 2 years ago

The /home/minion/Sequenzierung/Analysis/poreCov/results/20220328-01/alignment_data/work/15/5de05280883000c5dd16c97d531a15 folder contains these files/folders (via ls command):


20220328-01                  barcode22_filtered.fastq.gz.index         barcode22.merged.vcf.gz      barcode22.nCoV-2019_2.vcf              barcode22.primertrimmed.rg.sorted.bam.bai  barcode22.trimmed.rg.sorted.bam.bai
barcode22.alignreport.er     barcode22_filtered.fastq.gz.index.fai     barcode22.merged.vcf.gz.tbi  barcode22.primersitereport.txt         barcode22.sorted.bam                       barcode22.vcfcheck.log
barcode22.alignreport.txt    barcode22_filtered.fastq.gz.index.gzi     barcode22.minion.log.txt     barcode22.primers.vcf                  barcode22.sorted.bam.bai                   external_primer_schemes
barcode22_filtered.fastq.gz  barcode22_filtered.fastq.gz.index.readdb  barcode22.nCoV-2019_1.vcf    barcode22.primertrimmed.rg.sorted.bam  barcode22.trimmed.rg.sorted.bam            sequencing_summary_FAT41443_4eceb848.txt

JaMaack commented 2 years ago

Looks like barcode22.vcfreport.txt isn't present.

replikation commented 2 years ago

Okay, probably not enough reads or coverage. The next release (currently in the testing phase) will ignore these errors again, so the run won't fail. For now I recommend excluding the barcode. We had a few exit status 20 errors in older poreCov versions, and they were always due to bad samples (reads too short, not enough coverage, too many human reads and not enough viral reads, etc.).
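One manual way to exclude a barcode is to move its directory out of the input directory before starting the run. The sketch below assumes that the directory passed to `--fastq_pass` contains one subdirectory per barcode (e.g. `barcode22/`), which may not match every setup; `exclude_barcode` and the "parked" directory are hypothetical names, not poreCov features.

```shell
#!/usr/bin/env bash
# Move one barcode directory out of the input directory so the run skips it.
# Assumption: the input directory holds one subdirectory per barcode.
# exclude_barcode is a hypothetical helper, not part of poreCov.
exclude_barcode() {
    local input_dir=$1 barcode=$2 parked_dir=$3
    if [ ! -d "$input_dir/$barcode" ]; then
        echo "no such barcode directory: $input_dir/$barcode" >&2
        return 1
    fi
    mkdir -p "$parked_dir"
    # Moving (not deleting) keeps the reads available for a later rerun.
    mv "$input_dir/$barcode" "$parked_dir/"
}
```

For example, `exclude_barcode <fastq_pass dir> barcode22 <some dir outside the input>` parks the barcode, and moving the directory back restores it once a release that skips failing barcodes is in use.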

JaMaack commented 2 years ago

Is there some way to exclude a specific barcode via the command line?

replikation commented 2 years ago

@JaMaack Not really. The new poreCov release will be out in a few hours today, and it will automatically skip this barcode instead of failing. If you want the fix now, you can update poreCov and run the latest master:


# update
nextflow pull replikation/poreCov
# run poreCov via master
nextflow run replikation/poreCov -r master <your commands>

JaMaack commented 2 years ago

Thanks, I'll update poreCov and repeat the runs later.

replikation commented 2 years ago

poreCov 1.4.1 is available now.