hammerlab / biokepi

Bioinformatics Ketrew Pipelines
Apache License 2.0

Better estimate each job's resource requirements and encode them into Biokepi as defaults #476

Open armish opened 7 years ago

armish commented 7 years ago

All right — continuing my scattered idea storms: one thing I realized while running all those pipelines over and over again (to work around random middle-node failures) is that we are not doing a good job of utilizing the clusters to their full potential. Two empirical observations:

I am not sure what the best solution is here, but I have been toying with the idea of adopting a merge-sort-based randomization approach that evenly spreads potentially similar tasks over time (or across the queue), so that we reduce their chances of interfering with each other.

Relevant (old) read on such an algorithm that we might benefit from: https://labs.spotify.com/2014/02/28/how-to-shuffle-songs/
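To make the idea concrete, here is a minimal sketch (in Python, purely illustrative — Biokepi itself is OCaml) of the Spotify-style spread: tasks in the same category get evenly spaced positions with a little random jitter, and the per-category streams are then merged by position. The `key` function and the jitter amount are assumptions, not anything Biokepi currently has.

```python
import random
from collections import defaultdict

def spread_shuffle(tasks, key):
    """Spread tasks of the same category evenly across the queue,
    in the spirit of Spotify's artist shuffle, so that similar jobs
    are less likely to land back-to-back and contend for the same
    resources."""
    groups = defaultdict(list)
    for t in tasks:
        groups[key(t)].append(t)

    positioned = []
    for group in groups.values():
        n = len(group)
        random.shuffle(group)           # randomize order within a category
        offset = random.random() / n    # random phase so categories interleave
        for i, t in enumerate(group):
            # even spacing of 1/n, plus a small jitter (<10% of the gap)
            pos = offset + i / n + random.uniform(-0.1, 0.1) / n
            positioned.append((pos, t))

    # merge all category streams by their assigned positions
    positioned.sort(key=lambda p: p[0])
    return [t for _, t in positioned]
```

For example, `spread_shuffle(jobs, key=lambda j: j.tool_name)` would keep a burst of ten alignment jobs from all hitting the scheduler in one contiguous block.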


PS: For the resource estimation part, after trying and failing to deploy a Grafana+InfluxDB monitoring stack into GKE's container engine (they do make it hard to do such a thing), I have been collecting statistics from GKE's own StackDriver and will try to embed some estimates to see whether they make a difference.
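A rough sketch of what "encoding estimates as defaults" could look like: a per-tool table of observed CPU/memory usage, padded with headroom before being turned into a resource request. The table contents, tool names, and `headroom` factor below are all made-up placeholders, not measured StackDriver numbers.

```python
import math

# Hypothetical per-tool defaults: (cpus, memory_gb) as would be
# derived from monitoring data. Numbers here are illustrative only.
RESOURCE_DEFAULTS = {
    "bwa-mem": (8, 16),
    "mutect": (2, 8),
    "star": (8, 40),
}

def requested_resources(tool, headroom=1.25, fallback=(1, 4)):
    """Return (cpus, memory_gb) to request for a job: take the
    observed estimate for the tool (or a conservative fallback)
    and pad memory by `headroom` to absorb run-to-run variance."""
    cpus, mem_gb = RESOURCE_DEFAULTS.get(tool, fallback)
    return (cpus, math.ceil(mem_gb * headroom))
```

The 25% headroom is a guess at a reasonable variance buffer; the real value would come from the spread seen in the collected statistics.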

Re: the rest... just some daydreaming for now ;)

armish commented 7 years ago

Relevant discussions on this: https://github.com/hammerlab/biokepi/issues/193 and https://github.com/hammerlab/biokepi/issues/166