Hi!
I'm just starting to code in Nextflow and couldn't find an answer or example for my specific problem, so apologies if I missed it. I want to write a pipeline for metagenome binning (and some other features). I have long reads and short reads. The pipeline should assemble the long reads and use the short reads to polish/improve the long-read assembly. Note that for each long-read assembly I have several pairs of short reads (different sequencing platforms or library preps), so I do several mapping rounds. I want to do this in one workflow because I need to do some merging at the end.
Roughly it is something like:
1. assembly of long reads --> one file with the assembly (assemblylong)
2. polished assembly using short reads --> several files, one per short-read method (nanoassemblydef)
3. map each short-read pair independently to the polished assembly --> several files, one per short-read method (mappedassembly)
The 3rd step is giving me problems, because each set of short reads should be mapped only to the long-read assembly that was polished with those same reads. Is there a way to make the pipeline create a channel/variable structured like a hash, where the polished long-read assembly shares an ID with the short reads used for that polish?
The problematic step produces files like, for example:
CORRECT: nanopore.gz_test_merged_map.sam_assemblyracon2correction.fasta_catalogue.mmi_test_map.sam
(1) nanopore.gz_(2) test_merged_map.sam (3) _assemblyracon2correction.fasta_catalogue.mmi (4)_test_map.sam
1. long reads file
INCORRECT: nanopore.gz_test_merged_map.sam_assemblyracon2correction.fasta_catalogue.mmi_data_map.sam
(1) nanopore.gz_(2) test_merged_map.sam (3) _assemblyracon2correction.fasta_catalogue.mmi (4)_data_map.sam
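A minimal sketch of one way the ID-sharing structure asked about above could look, assuming DSL2 and short-read files named like test.R1.fq / test.R2.fq. The process names (ASSEMBLE_LONG, POLISH, MAP_TO_POLISHED), the glob pattern, and the output file names are hypothetical and not taken from examplePipeline.nf; only the channel names (assemblylong, nanoassemblydef, mappedassembly) come from the description above. Each process only echoes, so empty input files are enough. The idea is that every channel element is a tuple whose first item is the short-read ID, and `join` pairs elements with the same ID.

```nextflow
nextflow.enable.dsl = 2

// Hypothetical processes: each one only echoes a command into its output file.
process ASSEMBLE_LONG {
    input:
    path(long_reads)

    output:
    path("assemblylong.fasta")

    script:
    """
    echo "assemble ${long_reads}" > assemblylong.fasta
    """
}

process POLISH {
    input:
    tuple val(id), path(reads), path(assembly)

    output:
    // The ID travels with the polished assembly.
    tuple val(id), path("${id}_polished.fasta")

    script:
    """
    echo "polish ${assembly} with ${reads}" > ${id}_polished.fasta
    """
}

process MAP_TO_POLISHED {
    input:
    tuple val(id), path(polished), path(reads)

    output:
    tuple val(id), path("${id}_map.sam")

    script:
    """
    echo "map ${reads} to ${polished}" > ${id}_map.sam
    """
}

workflow {
    long_reads  = Channel.fromPath('nanopore.gz')
    // Emits one tuple per pair: [id, [R1, R2]], e.g. id 'test' and id 'data'.
    short_reads = Channel.fromFilePairs('*.R{1,2}.fq')

    assemblylong = ASSEMBLE_LONG(long_reads)

    // combine: every short-read pair gets the single long-read assembly.
    // Result: [id, [R1, R2], assembly]
    nanoassemblydef = POLISH(short_reads.combine(assemblylong))

    // join: match on the first tuple element (the ID), so each polished
    // assembly is paired only with the reads that polished it.
    // Result: [id, polished, [R1, R2]]
    mappedassembly = MAP_TO_POLISHED(nanoassemblydef.join(short_reads))
    mappedassembly.view()
}
```

With this shape, the 'data' reads should never meet the assembly polished with the 'test' reads, because `join` only emits pairs whose IDs match. If several long-read files were used, the key would presumably need to include the long-read name as well.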
examplePipeline.nf
The pipeline just echoes commands at the moment, so it is easy to reproduce. The input files could even be empty text files like:
The input files should be: a long-read file = nanopore.gz; short reads = test.R1.fq, test.R2.fq, data.R1.fq, data.R2.fq
examplePipeline_modules.nf
The processes needed to execute the pipeline code (pulled in with include)
Thanks!! Laura