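Description of the bug
The pipeline may already be able to handle this, but what happens if both single-end (SE) and paired-end (PE) reads are available for a particular strain? In Camilo's mixed eukaryote dataset (mixed.csv), SRA accession SRR10432277 fits this scenario.
I get an error at the read alignment step. The `BWA_MEM` process passes both the PE and SE reads to `bwa mem` as inputs, and I think this is what causes the failure: `bwa mem` accepts at most two read files (`<in1.fq> [in2.fq]`), so when given three it prints its usage message and exits with status 1, and `samtools view` then fails to read a header from the empty stream.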
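Before alignment, the files for a sample like this have to be split into the PE pair and the leftover SE file. A minimal sketch of that split, assuming SRA-style names where mates carry a `_1`/`_2` suffix (as in `SRR10432277_1_subset.fastq.gz`); `split_reads` is a hypothetical helper, not a pathogensurveillance function, and treating a lone mate as SE is this sketch's own choice:

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of pathogensurveillance: split one sample's
# FASTQ files into a paired-end (PE) pair and any single-end (SE) files.
# Assumes SRA-style names: mates carry a _1 / _2 suffix (optionally followed
# by another _tag, e.g. _1_subset); anything else is treated as single-end.
split_reads() {
  local r1="" r2="" f
  local se=()
  for f in "$@"; do
    case "$(basename "$f")" in
      *_1.fastq.gz | *_1_*.fastq.gz) r1="$f" ;;
      *_2.fastq.gz | *_2_*.fastq.gz) r2="$f" ;;
      *) se+=("$f") ;;
    esac
  done
  # Only treat the sample as PE when both mates were found.
  if [[ -n "$r1" && -n "$r2" ]]; then
    echo "PE: $r1 $r2"
  else
    # A lone mate is aligned as single-end rather than dropped.
    [[ -n "$r1" ]] && se+=("$r1")
    [[ -n "$r2" ]] && se+=("$r2")
  fi
  echo "SE: ${se[*]:-}"
}
```

For SRR10432277 this yields one PE line with the `_1`/`_2` subset files and one SE line with `SRR10432277_subset.fastq.gz`, i.e. two separate `bwa mem` inputs instead of one call with three files.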
Command used and terminal output
# An example of the error I see:
[83/984875] NOTE: Process `PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:ALIGN_READS:BWA_MEM (GCA_031834405_1_PHW726_fox_matthiolae)` terminated with an error exit status (1) -- Execution is retried (1)
ERROR ~ Error executing process > 'PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:ALIGN_READS:BWA_MEM (GCA_031834405_1_PHW726_fox_matthiolae)'
Caused by:
Process `PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:ALIGN_READS:BWA_MEM (GCA_031834405_1_PHW726_fox_matthiolae)` terminated with an error exit status (1)
Command executed:
INDEX=`find -L ./ -name "*.amb" | sed 's/\.amb$//'`
bwa mem \
-M \
-t 16 \
$INDEX \
SRR10432277_1_subset.fastq.gz SRR10432277_2_subset.fastq.gz SRR10432277_subset.fastq.gz \
| samtools view --threads 16 -o GCA_031834405_1_PHW726_fox_matthiolae.bam -
cat <<-END_VERSIONS > versions.yml
"PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:ALIGN_READS:BWA_MEM":
bwa: $(echo $(bwa 2>&1) | sed 's/^.*Version: //; s/Contact:.*$//')
samtools: $(echo $(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*$//')
END_VERSIONS
Command exit status:
1
Command output:
(empty)
Command error:
-y INT seed occurrence for the 3rd round seeding [20]
-c INT skip seeds with more than INT occurrences [500]
-D FLOAT drop chains shorter than FLOAT fraction of the longest overlapping chain [0.50]
-W INT discard a chain if seeded bases shorter than INT [0]
-m INT perform at most INT rounds of mate rescues for each read [50]
-S skip mate rescue
-P skip pairing; mate rescue performed unless -S also in use
Scoring options:
-A INT score for a sequence match, which scales options -TdBOELU unless overridden [1]
-B INT penalty for a mismatch [4]
-O INT[,INT] gap open penalties for deletions and insertions [6,6]
-E INT[,INT] gap extension penalty; a gap of size k cost '{-O} + {-E}*k' [1,1]
-L INT[,INT] penalty for 5'- and 3'-end clipping [5,5]
-U INT penalty for an unpaired read pair [17]
-x STR read type. Setting -x changes multiple parameters unless overridden [null]
pacbio: -k17 -W40 -r10 -A1 -B1 -O1 -E1 -L0 (PacBio reads to ref)
ont2d: -k14 -W20 -r10 -A1 -B1 -O1 -E1 -L0 (Oxford Nanopore 2D-reads to ref)
intractg: -B9 -O16 -L5 (intra-species contigs to ref)
Input/output options:
-p smart pairing (ignoring in2.fq)
-R STR read group header line such as '@RG\tID:foo\tSM:bar' [null]
-H STR/FILE insert STR to header if it starts with @; or insert lines in FILE [null]
-o FILE sam file to output results to [stdout]
-j treat ALT contigs as part of the primary assembly (i.e. ignore <idxbase>.alt file)
-5 for split alignment, take the alignment with the smallest coordinate as primary
-q don't modify mapQ of supplementary alignments
-K INT process INT input bases in each batch regardless of nThreads (for reproducibility) []
-v INT verbosity level: 1=error, 2=warning, 3=message, 4+=debugging [3]
-T INT minimum score to output [30]
-h INT[,INT] if there are <INT hits with score >80% of the max score, output all in XA [5,200]
-a output all alignments for SE or unpaired PE
-C append FASTA/FASTQ comment to SAM output
-V output the reference FASTA header in the XR tag
-Y use soft clipping for supplementary alignments
-M mark shorter split hits as secondary
-I FLOAT[,FLOAT[,INT[,INT]]]
specify the mean, standard deviation (10% of the mean if absent), max
(4 sigma from the mean if absent) and min of the insert size distribution.
FR orientation only. [inferred]
Note: Please read the man page for detailed description of the command line and options.
[main_samview] fail to read the header from "-".
Work dir:
/home/marthasudermann/pathogensurveillance/work/a0/44cd71b6fd13a59faa6d3786d5301e
Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
-- Check '.nextflow.log' file for details
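One possible workaround, sketched under assumptions: run `bwa mem` once on the PE pair and once on the SE file, coordinate-sort each result, and combine them with `samtools merge`. The function `align_pe_and_se` and the `THREADS` variable are hypothetical, not part of pathogensurveillance, and unlike the command executed above (which writes an unsorted BAM with `samtools view`), this sketch produces a sorted BAM because `samtools merge` expects sorted inputs.

```shell
#!/usr/bin/env bash
# Hypothetical workaround sketch, not the pipeline's actual fix.
# bwa mem takes at most two read files (<in1.fq> [in2.fq]), so the PE pair and
# the SE file are aligned in separate runs and the sorted BAMs are merged.
set -o pipefail  # a failing bwa mem now fails the whole pipe, not just samtools

align_pe_and_se() {
  local index="$1" out="$2" r1="$3" r2="$4" se="$5"
  local threads="${THREADS:-16}"  # assumed default, mirroring -t 16 in the log
  local prefix="${out%.bam}"

  # Paired-end run: exactly two read files after the index prefix.
  bwa mem -M -t "$threads" "$index" "$r1" "$r2" \
    | samtools sort -@ "$threads" -o "${prefix}.pe.bam" - || return 1

  # Single-end run: one read file after the index prefix.
  bwa mem -M -t "$threads" "$index" "$se" \
    | samtools sort -@ "$threads" -o "${prefix}.se.bam" - || return 1

  # samtools merge expects coordinate-sorted inputs; -f overwrites an existing output.
  samtools merge -@ "$threads" -f "$out" "${prefix}.pe.bam" "${prefix}.se.bam"
}
```

For this sample it would be called as `align_pe_and_se "$INDEX" GCA_031834405_1_PHW726_fox_matthiolae.bam SRR10432277_1_subset.fastq.gz SRR10432277_2_subset.fastq.gz SRR10432277_subset.fastq.gz`. Giving each run its own read group with `-R` would keep the two libraries distinguishable after merging.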
Relevant files
No response
System information
No response