Closed masudermann closed 6 months ago
Upon closer inspection, it seems the nematode sample may have been heavily contaminated to begin with.
The Peronospora input data was actually RNA-seq data, not genomic DNA, which could help explain the bizarre classification.
We are rerunning this dataset with a few more samples. As long as the bbsketch results are not filtered too stringently, I think these were data-specific rather than pipeline-specific concerns.
Description of the bug
Input test dataset was 'mixed.csv'
The pipeline ran as expected, although it errored out when it could not handle the Fusarium sample, which had both PE and SE raw reads downloaded (reported previously).
At the SPAdes step, it tried to assemble the downloaded reads (accession ERR3842626) and reported that it ran out of memory. We don't expect an assembly for this sample.
Upon closer inspection, I realized that this sample was misclassified as a bacterium during the initial sendsketch classification (likely because some reads were contaminated).
What is concerning is that the two classifications below were the only ones given:
Pseudomonas sp. Irchel 3H3
Pseudomonas sp. Irchel s3h17
How are sendsketch results filtered and then reported?
I worry, especially for eukaryotic samples, that if we filter the sendsketch results down to just the top hits, the relevant hit won't be included. I've noticed that when a sample has some contamination (or is misclassified because of limited database resources), the proper assignment (sometimes only to genus level) doesn't appear until later in the files.
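To make the concern concrete, here is a minimal Python sketch of one possible, more permissive filter. It is not how the pipeline currently filters sendsketch results. It assumes the hits have already been parsed into simple records; the `Hit` class, its `taxon` and `score` fields, the `select_hits` function, and the rule that the genus is the first word of the taxon name are all hypothetical, and the scores in the example are invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    taxon: str    # hypothetical field: taxon name reported for the hit
    score: float  # hypothetical field: similarity score, higher is better


def genus_of(taxon: str) -> str:
    # Assumption: the genus is the first word of the taxon name.
    return taxon.split()[0]


def select_hits(hits, top_n=2, min_score=0.0):
    """Keep the top_n hits, plus the best hit of every other genus
    whose score is at least min_score."""
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    kept = ranked[:top_n]
    seen = {genus_of(h.taxon) for h in kept}
    for hit in ranked[top_n:]:
        genus = genus_of(hit.taxon)
        if hit.score >= min_score and genus not in seen:
            kept.append(hit)
            seen.add(genus)
    return kept


# Invented scores, for illustration only.
example_hits = [
    Hit("Pseudomonas sp. Irchel 3H3", 97.0),
    Hit("Pseudomonas sp. Irchel s3h17", 96.5),
    Hit("Pseudomonas fluorescens", 95.0),
    Hit("Meloidogyne enterolobii", 80.0),
]

selected = select_hits(example_hits, top_n=2, min_score=50.0)
for hit in selected:
    print(hit.taxon, hit.score)
```

With a strict top-2 cutoff only the two Pseudomonas hits would survive; keeping the best hit per additional genus lets a lower-ranked eukaryotic assignment through without reporting every near-duplicate bacterial strain.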
Command used and terminal output
Relevant files
executor > local (39)
[6d/2b8e9d] process > PATHOGENSURVEILLANCE:INPUT_CHECK:SAMPLESHEET_CHECK (mixed.csv) [100%] 1 of 1, cached: 1 ✔
[c2/9657d6] process > PATHOGENSURVEILLANCE:SRATOOLS_FASTERQDUMP (HoneyBee_Adorsata) [100%] 6 of 6, cached: 6 ✔
[- ] process > PATHOGENSURVEILLANCE:DOWNLOAD_ASSEMBLIES -
[- ] process > PATHOGENSURVEILLANCE:SEQKIT_SLIDING -
[bb/7f2ba1] process > PATHOGENSURVEILLANCE:FASTQC (Rsol_Rsolanacearum) [100%] 6 of 6, cached: 6 ✔
[69/fc7c81] process > PATHOGENSURVEILLANCE:COARSE_SAMPLE_TAXONOMY:BBMAP_SENDSKETCH (PpalZOC03_Ppalmivora) [100%] 6 of 6, cached: 6 ✔
[0d/1b93c6] process > PATHOGENSURVEILLANCE:COARSE_SAMPLE_TAXONOMY:INITIAL_CLASSIFICATION (BDM_Pbelbahrii) [100%] 6 of 6, cached: 6 ✔
[4d/9abcaa] process > PATHOGENSURVEILLANCE:DOWNLOAD_REFERENCES:FIND_ASSEMBLIES (Megachilidae) [100%] 20 of 20, cached: 20 ✔
[b9/ce65c6] process > PATHOGENSURVEILLANCE:DOWNLOAD_REFERENCES:PICK_ASSEMBLIES (BDM_Pbelbahrii) [100%] 6 of 6, cached: 6 ✔
[25/baf51b] process > PATHOGENSURVEILLANCE:DOWNLOAD_REFERENCES:DOWNLOAD_ASSEMBLIES (GCA_022627115_1) [100%] 91 of 91, cached: 90 ✔
[c2/f70553] process > PATHOGENSURVEILLANCE:DOWNLOAD_REFERENCES:MAKE_GFF_WITH_FASTA (GCA_022627115_1) [100%] 91 of 91, cached: 90 ✔
[26/b96e84] process > PATHOGENSURVEILLANCE:DOWNLOAD_REFERENCES:SOURMASH_SKETCH_GENOME (GCA_022627115_1) [100%] 91 of 91, cached: 90 ✔
[4a/ca36aa] process > PATHOGENSURVEILLANCE:ASSIGN_REFERENCES:SUBSET_READS (Rsol_Rsolanacearum) [100%] 6 of 6, cached: 6 ✔
[fa/232d14] process > PATHOGENSURVEILLANCE:ASSIGN_REFERENCES:KHMER_TRIMLOWABUND (BDM_Pbelbahrii) [100%] 6 of 6, cached: 6 ✔
[cb/09ece6] process > PATHOGENSURVEILLANCE:ASSIGN_REFERENCES:SOURMASH_SKETCH_READS (PpalZOC03_Ppalmivora) [100%] 6 of 6, cached: 6 ✔
[- ] process > PATHOGENSURVEILLANCE:ASSIGN_REFERENCES:SOURMASH_SKETCH_GENOME -
[46/21755a] process > PATHOGENSURVEILLANCE:ASSIGN_REFERENCES:SOURMASH_COMPARE (all) [100%] 1 of 1 ✔
[ec/3bf96f] process > PATHOGENSURVEILLANCE:ASSIGN_REFERENCES:ASSIGN_GROUP_REFERENCES (all) [100%] 1 of 1 ✔
[3a/1ed958] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:REFERENCE_INDEX:PICARD_CREATESEQUENCEDICTIONARY (GCA_000365545_1) [100%] 6 of 6, cached: 4 ✔
[72/5f5cc9] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:REFERENCE_INDEX:SAMTOOLS_FAIDX (GCF_001855495_2_genomic.fna) [100%] 6 of 6, cached: 4 ✔
[ce/d7321d] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:REFERENCE_INDEX:BWA_INDEX (GCF_014066325_1) [100%] 6 of 6, cached: 3 ✔
[d8/b2abac] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:ALIGN_READS:CALCULATE_DEPTH (GCF_900187635_1_RKN_Menterolobii) [100%] 6 of 6, cached: 1 ✔
[da/deef4b] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:ALIGN_READS:SUBSET_READS (GCF_900187635_1_RKN_Menterolobii) [100%] 6 of 6, cached: 1 ✔
[f1/2c75f8] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:ALIGN_READS:BWA_MEM (GCF_900187635_1_RKN_Menterolobii) [100%] 3 of 3, cached: 1
[25/0cf853] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:ALIGN_READS:PICARD_ADDORREPLACEREADGROUPS (GCA_900096695_1_PHW726_fox_matthiolae) [100%] 3 of 3, cached: 1
[82/e4d428] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:ALIGN_READS:PICARD_SORTSAM_1 (GCA_900096695_1_PHW726_fox_matthiolae) [100%] 3 of 3, cached: 1
[41/1a851c] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:ALIGN_READS:PICARD_MARKDUPLICATES (GCA_900096695_1_PHW726_fox_matthiolae) [100%] 3 of 3, cached: 1
[e1/d747f7] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:ALIGN_READS:PICARD_SORTSAM_2 (GCA_900096695_1_PHW726_fox_matthiolae) [100%] 3 of 3, cached: 1
[10/785b60] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:ALIGN_READS:SAMTOOLS_INDEX (GCA_900096695_1_PHW726_fox_matthiolae) [100%] 3 of 3, cached: 1
[- ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:CALL_VARIANTS:MAKE_REGION_FILE -
[- ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:CALL_VARIANTS:GRAPHTYPER_GENOTYPE -
[- ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:CALL_VARIANTS:GRAPHTYPER_VCFCONCATENATE -
[- ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:CALL_VARIANTS:TABIX_TABIX -
[- ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:CALL_VARIANTS:BGZIP_MAKE_GZIP -
[- ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:CALL_VARIANTS:GATK4_VARIANTFILTRATION -
[- ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:CALL_VARIANTS:VCFLIB_VCFFILTER -
[- ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:VCF_TO_TAB -
[- ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:VCF_TO_SNPALN -
[- ] process > PATHOGENSURVEILLANCE:VARIANT_ANALYSIS:IQTREE2_SNP -
[49/d4b1e1] process > PATHOGENSURVEILLANCE:GENOME_ASSEMBLY:SUBSET_READS (RKN_Menterolobii) [100%] 2 of 2, cached: 2 ✔
[e2/3f9b29] process > PATHOGENSURVEILLANCE:GENOME_ASSEMBLY:FASTP (Rsol_Rsolanacearum) [100%] 2 of 2, cached: 2 ✔
[58/33c597] process > PATHOGENSURVEILLANCE:GENOME_ASSEMBLY:SPADES (RKN_Menterolobii) [100%] 3 of 3, cached: 1, failed: 2, retries:...
[5d/2d4049] process > PATHOGENSURVEILLANCE:GENOME_ASSEMBLY:FILTER_ASSEMBLY (Rsol_Rsolanacearum) [100%] 1 of 1, cached: 1 ✔
[- ] process > PATHOGENSURVEILLANCE:GENOME_ASSEMBLY:QUAST [ 0%] 0 of 1
[d5/62c050] process > PATHOGENSURVEILLANCE:GENOME_ASSEMBLY:BAKTA_BAKTA (Rsol_Rsolanacearum) [100%] 1 of 1, cached: 1 ✔
[- ] process > PATHOGENSURVEILLANCE:CORE_GENOME_PHYLOGENY:PIRATE [ 0%] 0 of 1
[- ] process > PATHOGENSURVEILLANCE:CORE_GENOME_PHYLOGENY:REFORMAT_PIRATE_RESULTS -
[- ] process > PATHOGENSURVEILLANCE:CORE_GENOME_PHYLOGENY:CALCULATE_POCP -
[- ] process > PATHOGENSURVEILLANCE:CORE_GENOME_PHYLOGENY:ALIGN_FEATURE_SEQUENCES -
[- ] process > PATHOGENSURVEILLANCE:CORE_GENOME_PHYLOGENY:RENAME_CORE_GENE_HEADERS -
[- ] process > PATHOGENSURVEILLANCE:CORE_GENOME_PHYLOGENY:SUBSET_CORE_GENES -
[- ] process > PATHOGENSURVEILLANCE:CORE_GENOME_PHYLOGENY:MAFFT_SMALL -
[- ] process > PATHOGENSURVEILLANCE:CORE_GENOME_PHYLOGENY:IQTREE2_CORE -
[- ] process > PATHOGENSURVEILLANCE:CUSTOM_DUMPSOFTWAREVERSIONS -
[- ] process > PATHOGENSURVEILLANCE:MULTIQC -
[e9/2dab17] process > PATHOGENSURVEILLANCE:RECORD_MESSAGES (All) [ 50%] 1 of 2, cached: 1
[- ] process > PATHOGENSURVEILLANCE:PREPARE_REPORT_INPUT -
[- ] process > PATHOGENSURVEILLANCE:MAIN_REPORT -
ERROR ~ Error executing process > 'PATHOGENSURVEILLANCE:GENOME_ASSEMBLY:SPADES (RKN_Menterolobii)'
System information
No response