grailbio / reflow

A language and runtime for distributed, incremental data processing in the cloud
Apache License 2.0

Why is Reflow so slow compared to AWS Batch? #106

Closed olgabot closed 5 years ago

olgabot commented 5 years ago

Hello,

We (@shayanhoss and I) were rerunning 50,000 single cells through an RNA-seq pipeline and found that Reflow took several days to process even a few hundred cells, while in that same time the same pipeline on AWS Batch processed almost all of the samples. Why is this? We had ~20 instances running simultaneously, but they were not being used to full capacity. How can we see how many jobs are running per instance, and what can we do to make better use of the resources?

Warmest,
Olga

mariusae commented 5 years ago

Hard to say without more context. In principle it should not run any slower.