each step is independently scalable: you can configure the type and number of workers you want to use
split, merge and counting loquat "sizes" should be proportional to the number of input samples
BLAST and assignment loquat "sizes" should be proportional to the sizes of the samples (i.e. number of chunks after split)
this lets you analyse data as fast as you want (or are ready to pay for): the most resource-demanding part is heavily parallelized
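To make the sizing rule above concrete, here is a rough sketch in Python (not MG7 code). The function name, the step keys and the two per-worker constants are made up for illustration; the only thing taken from the points above is which steps scale with the number of samples and which with the number of chunks.

```python
import math

# Hypothetical tuning knobs: how much work one worker is expected to handle.
SAMPLES_PER_WORKER = 4
CHUNKS_PER_WORKER = 10


def suggest_workers(chunks_per_sample):
    """Suggest a worker count per step.

    chunks_per_sample: list with the number of chunks each sample
    produces after the split step (one entry per input sample).
    """
    num_samples = len(chunks_per_sample)
    total_chunks = sum(chunks_per_sample)

    # Steps whose load grows with the number of input samples.
    by_samples = max(1, math.ceil(num_samples / SAMPLES_PER_WORKER))
    # Steps whose load grows with the number of chunks after split.
    by_chunks = max(1, math.ceil(total_chunks / CHUNKS_PER_WORKER))

    return {
        "split": by_samples,
        "blast": by_chunks,
        "assignment": by_chunks,
        "merge": by_samples,
        "counting": by_samples,
    }


# Three samples that split into 12, 30 and 8 chunks respectively.
plan = suggest_workers([12, 30, 8])
print(plan)
```

With these made-up constants, the BLAST and assignment steps get several workers while the sample-bound steps stay at one, which is the point of scaling each step separately.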
@ohnosequences/docs What do you think?
Other things that could be added to this diagram:
example instance types (some steps require average hardware, others need more powerful machines)
user interaction (launching each step, uploading input data, getting results). I didn't add it because it clutters the diagram quite a lot and it's more about data flow
Flash step for Illumina reads. It just doesn't fit 😅 (this diagram service limits the size of the grid for a free account). Also, it's not clear how to show that it's optional. I would just explain it in text, because it's not a part of the core MG7 pipeline, more like a data preparation step.
Using https://aws.amazon.com/architecture/icons/.