microbiomedata / nmdc-edge

Web-based interface to the NMDC EDGE platform
https://nmdc-edge.org

Metagenome Assembly spades failed #49

Open mflynn-lanl opened 7 months ago

mflynn-lanl commented 7 months ago

Workflow Name: Metagenome Assembly

Project URL: https://nmdc-edge.org/admin/project?code=brFrthQj2C3PyUvj

Additional Info

 '  0:00:07.478    34M / 579M  INFO    General                 (path_extenders.cpp        :  36)   Processed 0 paths from 2214 (0%)',
 "spades-core: /spades/assembler/src/common/modules/path_extend/paired_library.hpp:124: double path_extend::PairedInfoLibraryWithIndex<Index>::CountPairedInfo(path_extend::EdgeId, path_extend::EdgeId, int, bool) const [with Index = const omnigraph::de::PairedIndex<debruijn_graph::DeBruijnGraph, omnigraph::de::PointTraits, omnigraph::de::safe_btree_map>&; path_extend::EdgeId = omnigraph::impl::EdgeId]: Assertion `index_.size() != 0' failed.",
 '',
 '',
 '== Error ==  system call for: "[\'/SPAdes-3.15.1-Linux/bin/spades-core\', \'/expanse/projects/nmdc/cromwell/cromwell-executions/main_workflow/a96a79a3-5d6a-4410-88d7-ed85f49efb47/call-jgi_metaASM/MetaAssembly.jgi_metaASM/2905ba69-93e2-413b-a993-9f22b4f823d4/call-assy/execution/spades3/K127/configs/config.info\', \'/expanse/projects/nmdc/cromwell/cromwell-executions/main_workflow/a96a79a3-5d6a-4410-88d7-ed85f49efb47/call-jgi_metaASM/MetaAssembly.jgi_metaASM/2905ba69-93e2-413b-a993-9f22b4f823d4/call-assy/execution/spades3/K127/configs/mda_mode.info\', \'/expanse/projects/nmdc/cromwell/cromwell-executions/main_workflow/a96a79a3-5d6a-4410-88d7-ed85f49efb47/call-jgi_metaASM/MetaAssembly.jgi_metaASM/2905ba69-93e2-413b-a993-9f22b4f823d4/call-assy/execution/spades3/K127/configs/meta_mode.info\']" finished abnormally, OS return value: -6',
 '',
 '======= SPAdes pipeline finished abnormally and WITH WARNINGS!',
 '',
 '=== Error correction and assembling warnings:',
 ' * 0:00:06.870    25M / 579M  WARN    General                 (pair_info_count.cpp       : 377)   Estimated mean insert size 165 is very small compared to read length 251',

chienchi commented 7 months ago

The input file is single-end (R1) only, but SPAdes expected paired-end data. See the similar issue. This is a data input error, but I am not sure how best to give the user a clear error message.

{
    "main_workflow.MetaAssembly_input_file": ["/expanse/projects/nmdc/edge_app/nmdc-edge/io/projects/brFrthQj2C3PyUvj/input/010342_A24-8549_S40_L001_R1_001.fastq.gz"],
    "main_workflow.MetaAssembly_rename_contig_prefix": "8549",
    "main_workflow.MetaAssembly_outdir": "/expanse/projects/nmdc/edge_app/nmdc-edge/io/projects/brFrthQj2C3PyUvj/output/MetagenomeAssembly",
    "main_workflow.MetaAssembly_input_fq1":[],
    "main_workflow.MetaAssembly_input_fq2":[],
    "main_workflow.MetaAssembly_input_interleaved":true

}

mflynn-lanl commented 7 months ago

Can we add a check to see if the file is really interleaved and if not fail the workflow right away? It would be really great if we could do it in-browser...

chienchi commented 7 months ago

If the fastq doesn't follow the naming convention (e.g. some files from SRA), it is not easy to check. The assembler assembled the reads and mapped them back to the assembled contigs, then found only a few paired reads and stopped. So, ideally, we could assemble a portion of the input and map reads back to check how many paired reads fall within the insert size range, but this strategy may not be efficient enough.
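For files whose read headers do follow a convention, a cheaper check is to look at the first few record pairs and see whether consecutive reads share a name and carry mate markers 1 and 2. The sketch below assumes the two common header styles (`name/1` and `name/2`, or Illumina `name 1:N:0:...` and `name 2:N:0:...`); the function names are hypothetical and not part of the NMDC EDGE code. As noted above, it cannot give a reliable answer for files that ignore these conventions.

```python
import gzip
import itertools
import re

# Assumed header conventions (not taken from the workflow itself):
#   old style:    @name/1        and @name/2
#   Illumina 1.8: @name 1:N:0:40 and @name 2:N:0:40
_OLD_STYLE = re.compile(r"^(\S+)/([12])$")
_NEW_STYLE = re.compile(r"^(\S+)\s+([12]):")


def split_header(header):
    """Return (read_name, mate); mate is '1', '2', or None when no marker is found."""
    header = header.strip().lstrip("@")
    match = _NEW_STYLE.match(header)
    if match:
        return match.group(1), match.group(2)
    token = header.split()[0] if header else ""
    match = _OLD_STYLE.match(token)
    if match:
        return match.group(1), match.group(2)
    return token, None


def looks_interleaved(lines, max_pairs=1000):
    """Heuristic: do the first max_pairs record pairs look like mate 1 followed by mate 2?"""
    # Every fourth line of a FASTQ file is a header; read at most 2 * max_pairs of them.
    headers = list(itertools.islice(lines, 0, max_pairs * 8, 4))
    if len(headers) < 2 or len(headers) % 2:
        return False
    for first, second in zip(headers[0::2], headers[1::2]):
        name1, mate1 = split_header(first)
        name2, mate2 = split_header(second)
        if name1 != name2:
            return False
        # Accept explicit 1/2 markers, or identical names with no markers at all.
        if (mate1, mate2) not in {("1", "2"), (None, None)}:
            return False
    return True


def file_looks_interleaved(path, max_pairs=1000):
    """Open a plain or gzipped FASTQ file and apply the header heuristic."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return looks_interleaved(handle, max_pairs)
```

Only the first `max_pairs` pairs are read, so the check stays fast even on large inputs, at the cost of missing problems that appear later in the file.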

mflynn-lanl commented 7 months ago

We could check the filename in-browser and warn the user. That would at least weed out cases where the filename follows the naming convention. And maybe a mouseover with a message reminding the user to only use an interleaved file?
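A rough sketch of the filename check, written in Python only to show the logic (an in-browser version would need to be ported to the front-end code). The regular expression is an assumption about common naming patterns such as `_R1_001.fastq.gz`, `_R2.fq`, or `_1.fastq`; the function names are hypothetical. Because a trailing `_1` or `_2` can also be a sample number, the result is only suitable for a warning, not a hard failure.

```python
import os
import re

# Assumed naming patterns: <sep>R1 / <sep>R2 / <sep>1 / <sep>2, optionally followed
# by an Illumina chunk number (_001), right before the .fastq/.fq(.gz) extension.
_MATE_SUFFIX = re.compile(
    r"(?:^|[_.\-])R?([12])(?:_\d{3})?\.(?:fastq|fq)(?:\.gz)?$",
    re.IGNORECASE,
)


def mate_from_filename(path):
    """Return '1' or '2' if the file name looks like a single-mate file, else None."""
    match = _MATE_SUFFIX.search(os.path.basename(path))
    return match.group(1) if match else None


def interleaved_filename_warning(path):
    """Return a warning message if an 'interleaved' input looks like R1 or R2 only."""
    mate = mate_from_filename(path)
    if mate is None:
        return None
    return (
        f"'{os.path.basename(path)}' looks like an R{mate}-only file. "
        "Please provide an interleaved file, or submit R1 and R2 as paired inputs."
    )
```

For the file in this issue, `mate_from_filename("010342_A24-8549_S40_L001_R1_001.fastq.gz")` returns `"1"`, so the user would see the warning before submitting.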

yxu-lanl commented 7 months ago

I think we need a module (python?) to validate inputs and report useful error messages. This module would be called inside the WDL wrappers.
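Something along these lines, as a minimal sketch. The key names come from the inputs JSON posted above; the rules themselves are assumptions about how the workflow interprets them (for example, that `input_fq1` and `input_fq2` are used when `input_interleaved` is false). `validate_assembly_inputs` and the `file_checks` hook are hypothetical names.

```python
PREFIX = "main_workflow.MetaAssembly_"


def validate_assembly_inputs(inputs, file_checks=()):
    """Return a list of human-readable error messages (empty if the inputs look fine).

    file_checks is an iterable of callables that take a file path and return
    an error message or None; they are applied to each interleaved input file.
    """
    errors = []
    interleaved = inputs.get(PREFIX + "input_interleaved", False)
    files = inputs.get(PREFIX + "input_file", [])
    fq1 = inputs.get(PREFIX + "input_fq1", [])
    fq2 = inputs.get(PREFIX + "input_fq2", [])

    if interleaved:
        # Assumption: interleaved runs read only input_file.
        if not files:
            errors.append("input_interleaved is true but input_file is empty.")
        if fq1 or fq2:
            errors.append("input_interleaved is true but input_fq1/input_fq2 are also set.")
        for path in files:
            for check in file_checks:
                message = check(path)
                if message:
                    errors.append(message)
    else:
        # Assumption: paired runs need matching R1 and R2 lists.
        if not fq1 or not fq2:
            errors.append("input_interleaved is false but input_fq1 or input_fq2 is empty.")
        elif len(fq1) != len(fq2):
            errors.append("input_fq1 and input_fq2 contain different numbers of files.")
    return errors
```

A wrapper could load the inputs JSON, call this function, print each message, and exit non-zero before the assembly task starts, so the user sees a data input error instead of the SPAdes assertion.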

chienchi commented 7 months ago

That is a good idea, but a runtime check may pose challenges, since it can cause the job to fail or stop. In such cases, users would need to submit another job or rerun it.