theiagen / public_health_bacterial_genomics

GNU Affero General Public License v3.0

rare SeqSero2 failure #171

Closed kapsakcj closed 1 year ago

kapsakcj commented 1 year ago

I've only seen this with 2 samples so far, but I wanted to document it in case we see it again.

For these 2 samples, SeqSero2 fails only when it runs in allele mode with the "cleaned" FASTQ files as input.

I ran both samples through SeqSero2 in several configurations. Every configuration except allele mode on the cleaned FASTQs correctly predicted Muenster and Kiambu:

- Cleaned FASTQs - kmer mode ✅
- Raw FASTQs - kmer mode ✅
- Assembly - kmer mode ✅
- Cleaned FASTQs - allele mode ❌
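To rerun this comparison on other samples, here is a minimal sketch that builds one SeqSero2 command per configuration. It assumes the `SeqSero2_package.py` entry point and its `-m` (workflow), `-t` (input type), `-i` (inputs) and `-d` (output directory) options as we read them in the SeqSero2 README; check `SeqSero2_package.py -h` for your version. The file names and the `seqsero2_commands` helper are hypothetical.

```python
# Hypothetical input files; substitute real paths.
RAW = ["sample_1.fastq.gz", "sample_2.fastq.gz"]
CLEAN = ["sample_1.clean.fastq.gz", "sample_2.clean.fastq.gz"]
ASSEMBLY = ["sample.contigs.fasta"]

# (label, workflow, input type, inputs). Assumed flag meanings from the SeqSero2 README:
#   -m k = k-mer mode, -m a = allele (micro-assembly) mode
#   -t 2 = separate paired-end read files, -t 4 = genome assembly
CONFIGS = [
    ("clean-reads-kmer-mode", "k", "2", CLEAN),
    ("raw-reads-kmer-mode", "k", "2", RAW),
    ("assembly-kmer-mode", "k", "4", ASSEMBLY),
    ("clean-reads-allele-mode", "a", "2", CLEAN),
]


def seqsero2_commands(sample: str = "sample") -> dict[str, list[str]]:
    """Return one argument list per configuration, each with its own output directory."""
    commands = {}
    for label, mode, input_type, inputs in CONFIGS:
        commands[label] = [
            "SeqSero2_package.py",
            "-m", mode,
            "-t", input_type,
            "-i", *inputs,
            "-d", f"{sample}-seqsero2-{label}",
        ]
    return commands
```

The helper only builds argument lists; passing each one to `subprocess.run(cmd, check=False)` would execute them, and keeping a separate output directory per configuration makes it easy to diff the resulting `SeqSero_result.txt` files.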

I looked at the SPAdes logs, and something went awry during the "micro-assembly" step of SeqSero2's allele mode, which assembles the reads that align to the serotype-specific genes.

End of the SPAdes log:

  0:00:02.259    16M / 5G    INFO   PathExtender             (path_extender.hpp         : 952)   Processed 0 paths from 118 (0%)
=== Stack Trace ===
[0x40b2aa]
[0x50dd12]
[0x4d17ab]
[0x4d19aa]
[0x4c7467]
[0x4cdcd7]
[0x4d6e4a]
[0x4b8626]
[0x4ea02f]
[0x4eddfa]
[0x4ee893]
[0x4d5ded]
[0x4d6305]
[0x516059]
[0x4b5d9e]
[0x4b6dc7]
[0x58cb21]
[0x409957]
[0x402e05]
[0x76f920]
[0x40848d]
spades: /spades/src/modules/algorithms/path_extend/paired_library.hpp:138: double path_extend::PairedInfoLibraryWithIndex<Index>::CountPairedInfo(path_extend::EdgeId, path_extend::EdgeId, int, bool) const [with Index = const omnigraph::de::PairedIndex<debruijn_graph::DeBruijnGraph, omnigraph::de::PointTraits, omnigraph::de::safe_btree_map>&; path_extend::EdgeId = restricted::pure_pointer<omnigraph::PairedEdge<debruijn_graph::DeBruijnDataMaster> >]: Assertion `index_.size() != 0' failed.

== Error ==  system call for: "['/SPAdes-3.9.0-Linux/bin/spades', '/data/sample-seqsero2-clean-reads-allele-mode/2022_11_17_15_10_30_temp/K127/configs/config.info', '/data/sample-seqsero2-clean-reads-allele-mode/2022_11_17_15_10_30_temp/K127/configs/careful_mode.info']" finished abnormally, err code: -6
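The `err code: -6` is worth decoding. The SPAdes wrapper reports the child process's return code, and with Python's `subprocess` a negative code means the child was killed by that signal number; signal 6 is SIGABRT, which is what a failed C/C++ `assert` raises via `abort()`. So the crash is the `index_.size() != 0` assertion in `paired_library.hpp`, i.e. `index_` was empty when `CountPairedInfo` was called. Below is a minimal triage sketch (the helper names are hypothetical), assuming glibc's assertion message format and the `err code:` line exactly as it appears in the log above.

```python
import re
import signal

# glibc formats failed assertions as: prog: file:line: function: Assertion `expr' failed.
ASSERT_RE = re.compile(r"Assertion `(?P<expr>.+?)' failed\.")
# Wrapper message as it appears in the log above.
ERR_RE = re.compile(r"finished abnormally, err code: (?P<code>-?\d+)")


def describe_returncode(code: int) -> str:
    """On POSIX, a negative subprocess return code means 'killed by signal -code'."""
    if code < 0:
        return f"killed by signal {-code} ({signal.Signals(-code).name})"
    return f"exited with status {code}"


def summarize_spades_log(text: str) -> dict[str, str]:
    """Pull the failed assertion expression and the decoded exit code out of a log."""
    summary = {}
    match = ASSERT_RE.search(text)
    if match:
        summary["assertion"] = match.group("expr")
    match = ERR_RE.search(text)
    if match:
        summary["exit"] = describe_returncode(int(match.group("code")))
    return summary
```

Run against the log above, this should report the `index_.size() != 0` assertion and "killed by signal 6 (SIGABRT)".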

Since the micro-assembly failed, SeqSero2 was unable to predict a serotype:

$ cat SeqSero_result.txt
Sample name:    sample
Output directory:       /data/sample-seqsero2-clean-reads-allele-mode
Input files:    /data/sample_1.clean.fastq.gz      /data/sample_2.clean.fastq.gz
O antigen prediction:   -
H1 antigen prediction(fliC):    -
H2 antigen prediction(fljB):    -
Predicted identification:       Salmonella enterica subspecies enterica (subspecies I)
Predicted antigenic profile:    -:-:-
Predicted serotype:     I -:-:-
Note:   No serotype antigens were detected. This is an atypical result that should be further investigated.
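Because this failure still yields a result file rather than a crash, a workflow can easily pass it along silently. A minimal sketch for catching it downstream, assuming the `Key: value` layout of `SeqSero_result.txt` shown above; `parse_seqsero2_result` and `antigens_missing` are hypothetical helpers, not part of SeqSero2.

```python
def parse_seqsero2_result(text: str) -> dict[str, str]:
    """Parse 'Key:<whitespace>value' lines from SeqSero_result.txt into a dict."""
    result = {}
    for line in text.splitlines():
        # Split on the first colon only, so values like '-:-:-' stay intact.
        key, sep, value = line.partition(":")
        if sep:
            result[key.strip()] = value.strip()
    return result


ANTIGEN_KEYS = (
    "O antigen prediction",
    "H1 antigen prediction(fliC)",
    "H2 antigen prediction(fljB)",
)


def antigens_missing(result: dict[str, str]) -> bool:
    """True when the O, H1 and H2 predictions are all '-' (a missing key also counts)."""
    return all(result.get(key) in (None, "-") for key in ANTIGEN_KEYS)
```

For example, `antigens_missing(parse_seqsero2_result(open("SeqSero_result.txt").read()))` returns `True` for the output above, which a pipeline could use to flag the sample for review instead of reporting `I -:-:-`.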

kapsakcj commented 1 year ago

I suspect this is specific to the staphb/seqsero2:1.2.1 Docker image, which uses SPAdes 3.9.0.

When I ran the same cleaned FASTQ files through a Bioconda install of SeqSero2 v1.2.1 (which uses SPAdes 3.15.5), it ran successfully and predicted the serotype, with no SPAdes failure.
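Since the difference comes down to the bundled SPAdes version, a quick check of a container's SPAdes version could catch this early. One pitfall if scripting such a check: compared as plain strings, "3.9.0" sorts after "3.15.5", so versions should be compared as integer tuples. A minimal sketch, assuming version text such as the output of `spades.py --version`; the `KNOWN_GOOD_SPADES` threshold is hypothetical, since 3.15.5 is only the version that happened to work here, not a documented fix version.

```python
import re

# Hypothetical threshold: the version that worked in this issue, not a documented fix.
KNOWN_GOOD_SPADES = (3, 15, 5)


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted version number, e.g. 'SPAdes v3.9.0' -> (3, 9, 0)."""
    match = re.search(r"v?(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def spades_is_known_good(version_text: str) -> bool:
    """Compare as integer tuples so that 3.9.0 correctly sorts before 3.15.5."""
    return parse_version(version_text) >= KNOWN_GOOD_SPADES
```

With this, `spades_is_known_good("SPAdes v3.9.0")` is `False` and `spades_is_known_good("SPAdes genome assembler v3.15.5")` is `True`.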

I'll open an issue over on the StaPH-B docker-builds repo.

kapsakcj commented 1 year ago

Perhaps we don't need to switch to the raw FASTQs as input; I think this will be resolved if we upgrade the Docker image.

kapsakcj commented 1 year ago

Resolved by upgrading the SPAdes version in the StaPH-B Docker image staphb/seqsero2:1.2.1 (PR linked above).