Simplify the pipeline - Githubissues

laughedelic commented 8 years ago

Everybody knows that launching loquats is slow and that managing the pipeline manually is not convenient at all, so I suggest simplifying the pipeline.

merging blast results is a bottleneck of the pipeline, because it cannot be parallelised any further: one task/worker per sample
the counting loquat is very lightweight, so there is no point in launching it as a separate step
the assignment loquat already needs a lot of memory

So I suggest merging these three steps into one loquat:

merge blast results
do assignment
do counting

Again, this works as one task per sample.
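To illustrate the merged step, here is a minimal Python sketch of one task per sample that runs merge, assignment and counting back to back. Everything in it is hypothetical: the function names, the `(read_id, taxon, score)` hit tuples, and the "best-scoring hit wins" rule are stand-ins for illustration, not MG7's actual code or assignment logic.

```python
from collections import Counter


def merge_chunks(chunk_results):
    """Concatenate the per-chunk BLAST hit lists of one sample."""
    merged = []
    for hits in chunk_results:
        merged.extend(hits)
    return merged


def assign(hits):
    """Toy assignment (an assumption): keep the taxon of each read's best-scoring hit."""
    best = {}
    for read_id, taxon, score in hits:
        if read_id not in best or score > best[read_id][1]:
            best[read_id] = (taxon, score)
    return {read_id: taxon for read_id, (taxon, _) in best.items()}


def count(assignments):
    """Count how many reads were assigned to each taxon."""
    return Counter(assignments.values())


def process_sample(chunk_results):
    """One task per sample: merge, assign and count in a single step."""
    hits = merge_chunks(chunk_results)
    assignments = assign(hits)
    return assignments, count(assignments)
```

Since counting reuses the assignments already held in memory, it adds almost nothing to the cost of the combined task.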

The same reasons apply to merging the flash and split steps. But this requires keeping split available as a separate step for pipelines that don't need paired-end read merging.

marina-manrique commented 8 years ago

Fine by me to do merge, count and assign in the same task.

> The same reasons apply to merging the flash and split steps. But this requires keeping split available as a separate step for pipelines that don't need paired-end read merging.

And I would also prefer to keep these as independent steps, so we can work with single reads/scaffolds...

laughedelic commented 8 years ago

> And I would also prefer to keep these as independent steps, so we can work with single reads/scaffolds...

What I mean is having different pipelines for these cases:

For paired-end reads:

flash + split-on-chunks
blast
merge-chunks + assign + count

And for non paired-end reads/scaffolds:

split-on-chunks
blast
merge-chunks + assign + count

(the difference is only in the first loquat/step).
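As a rough sketch of the two variants, the step lists can be built from a shared tail, with only the first step depending on the input type. The function and constant names here are hypothetical and only mirror the step names listed above; they are not part of MG7.

```python
from typing import List

# Steps shared by both pipelines (names taken from the lists above).
COMMON_TAIL = ["blast", "merge-chunks + assign + count"]


def pipeline_steps(paired_end: bool) -> List[str]:
    """Return the three steps; only the first one depends on the input type."""
    first = "flash + split-on-chunks" if paired_end else "split-on-chunks"
    return [first] + COMMON_TAIL
```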

@marina-manrique @eparejatobes I'd like to get feedback from you, because if you support this suggestion, I want to include it in the next release:

it's easy to implement
it makes usage of MG7 much simpler (3 steps instead of 5/6)
it speeds up the process, because
- the split step is trivial and fast, so there is no point in waiting for a separate loquat launch to do it
- the same goes for merge
- the same goes for counting, which is super fast (like 10 seconds in the meta-gacu with scaffolds) and shares the same environment with the assignment step

marina-manrique commented 8 years ago

Perfect for me

eparejatobes commented 8 years ago

OK, :+1:

ohnosequences / mg7

Simplify the pipeline #32