Open mflynn-lanl opened 7 months ago
The input file is single-end (R1) only, but SPAdes expected paired-end data. See the similar issue. This is a data input error, but I am not sure how best to give a clear error message.
{
"main_workflow.MetaAssembly_input_file": ["/expanse/projects/nmdc/edge_app/nmdc-edge/io/projects/brFrthQj2C3PyUvj/input/010342_A24-8549_S40_L001_R1_001.fastq.gz"],
"main_workflow.MetaAssembly_rename_contig_prefix": "8549",
"main_workflow.MetaAssembly_outdir": "/expanse/projects/nmdc/edge_app/nmdc-edge/io/projects/brFrthQj2C3PyUvj/output/MetagenomeAssembly",
"main_workflow.MetaAssembly_input_fq1":[],
"main_workflow.MetaAssembly_input_fq2":[],
"main_workflow.MetaAssembly_input_interleaved":true
}
Can we add a check to see if the file is really interleaved and if not fail the workflow right away? It would be really great if we could do it in-browser...
If the FASTQ file doesn't follow the naming convention (e.g., some files from SRA), it is not easy to check. The assembler assembled the reads and mapped them back to the assembled contigs, then found only a few paired reads and stopped. So, ideally, we could assemble a portion of the input and map the reads back to check how many paired reads fall within the insert size range, but this strategy may not be efficient enough.
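A cheaper first pass than a trial assembly might be to compare the read identifiers of consecutive records, since in an interleaved file each pair of adjacent records normally shares one base read ID. The sketch below is only an illustration under stated assumptions: the function names (`open_fastq`, `read_ids`, `looks_interleaved`) are hypothetical, records are assumed to span exactly four lines, and mates are assumed to be distinguishable by a trailing `/1` or `/2` or by a comment after the first whitespace. Files whose headers carry no pairing information (the SRA case mentioned above) would not be checked reliably this way.

```python
import gzip
import itertools
import re

# Older Illumina headers end in "/1" or "/2"; newer ones put the mate number
# in a comment after the first whitespace, which split() below discards.
_MATE_SUFFIX = re.compile(r"/[12]$")


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_ids(lines):
    """Yield the base read ID of each record, assuming 4 lines per record."""
    for index, line in enumerate(lines):
        if index % 4 != 0:
            continue  # sequence, separator, or quality line
        if not line.strip():
            return  # tolerate a trailing blank line
        if not line.startswith("@"):
            raise ValueError(f"line {index + 1}: expected a FASTQ header")
        fields = line[1:].split()
        name = fields[0] if fields else ""
        yield _MATE_SUFFIX.sub("", name)


def looks_interleaved(lines, max_pairs=1000):
    """Return True if the first records pair up as (R1, R2) with equal IDs."""
    ids = list(itertools.islice(read_ids(lines), 2 * max_pairs))
    if len(ids) < 2 or len(ids) % 2 != 0:
        return False
    return all(first == second for first, second in zip(ids[0::2], ids[1::2]))
```

A caller could run it as `with open_fastq(path) as handle: ok = looks_interleaved(handle)`. Only the first `max_pairs` pairs are read, so the cost stays small even for large inputs, at the price of missing problems further into the file.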
We could check the filename in-browser and warn the user. That would at least weed out cases where the filename follows the naming convention. And maybe a mouseover with a message reminding the user to only use an interleaved file?
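As a rough illustration of the filename check, the pattern below recognizes Illumina-style mate tags such as `_R1_001.fastq.gz`. It is written in Python only to show the idea; an in-browser version would need the same regular expression ported to the front-end code. The names `mate_from_filename` and `interleaved_warning` and the wording of the warning are hypothetical, and the pattern is an assumption about the naming convention, not a complete list of what users may upload.

```python
import re

# Matches names like "sample_S40_L001_R1_001.fastq.gz" or "sample_R2.fq".
# Assumed convention: "_R1" or "_R2", an optional "_NNN" chunk number, then
# a .fastq or .fq extension with optional .gz.
_MATE_TAG = re.compile(
    r"_R([12])(?:_\d{3})?\.(?:fastq|fq)(?:\.gz)?$", re.IGNORECASE
)


def mate_from_filename(path):
    """Return 1 or 2 if the filename carries an R1/R2 tag, else None."""
    match = _MATE_TAG.search(path)
    return int(match.group(1)) if match else None


def interleaved_warning(path):
    """Return a warning string if an 'interleaved' input looks like one mate."""
    mate = mate_from_filename(path)
    if mate is None:
        return None  # the name says nothing either way
    return (
        f"'{path}' looks like an R{mate}-only file; "
        "the interleaved option expects R1 and R2 reads in one file."
    )
```

A filename without a mate tag returns `None`, so this only weeds out files that follow the convention and should produce a warning, not a hard failure.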
I think we need a module (Python?) to validate inputs and report useful error messages. This module would be called inside the WDL wrappers.
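One possible shape for such a module is sketched below. It takes the parsed inputs JSON and returns a list of human-readable error messages. The key names come from the inputs shown in this issue, but the rules are assumptions about how the options are meant to combine: interleaved mode uses `input_file` only, and paired mode uses `input_fq1` and `input_fq2` of equal length. The `validate_inputs` function and its `file_checks` parameter are hypothetical.

```python
PREFIX = "main_workflow.MetaAssembly_"


def validate_inputs(inputs, file_checks=()):
    """Return a list of error messages; an empty list means no problem found.

    `file_checks` is an optional sequence of callables, each taking a file
    path and returning an error string or None, applied to every
    interleaved input file.
    """
    errors = []
    interleaved = inputs.get(PREFIX + "input_interleaved")
    files = inputs.get(PREFIX + "input_file") or []
    fq1 = inputs.get(PREFIX + "input_fq1") or []
    fq2 = inputs.get(PREFIX + "input_fq2") or []

    if interleaved:
        if not files:
            errors.append("input_interleaved is true but input_file is empty.")
        if fq1 or fq2:
            errors.append("input_fq1/input_fq2 must be empty in interleaved mode.")
        for path in files:
            for check in file_checks:
                message = check(path)
                if message:
                    errors.append(f"{path}: {message}")
    elif not fq1 or len(fq1) != len(fq2):
        errors.append(
            "Paired mode needs the same non-zero number of files in "
            "input_fq1 and input_fq2."
        )
    return errors
```

A wrapper could load the inputs with `json.load`, print each message, and exit with a non-zero status before the assembler starts. The inputs in this issue pass the structural rules, so catching that case would depend on a content or filename check supplied through `file_checks`.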
That is a good idea, but a runtime check may pose challenges, potentially causing the job to fail or stop. In such cases, users would need to submit another job or rerun it.
Workflow Name Metagenome Assembly
Project URL https://nmdc-edge.org/admin/project?code=brFrthQj2C3PyUvj