malariagen / datalab

Repo for files and issues related to cloud deployment of JupyterHub.
MIT License

Estimating costings for work done #60

Closed hardingnj closed 5 years ago

hardingnj commented 5 years ago

Hi @slejdops

I recently made this PR: https://github.com/malariagen/vector-tools/pull/18

Taking 40 workers, it takes about 45 minutes to run the 64k or so whole genome pairwise comparisons for 360 samples.
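For reference, the "64k or so" figure is consistent with comparing every unordered pair of the 360 samples exactly once. That is an assumption about how the comparisons are counted, not something taken from the PR. A quick check:

```python
import math

# Assumption: each unordered pair of samples is compared once,
# with no self-comparisons.
n_samples = 360
n_pairs = math.comb(n_samples, 2)  # n * (n - 1) / 2
print(n_pairs)
```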

Alistair and I were wondering how much this costs in terms of compute, and whether it is feasible to do for all sample sets.

Thanks!

PS possibly linked to #28

hardingnj commented 5 years ago

I guess this will drop substantially now we have pre-emptibles working? It would be useful to put a number on this.

slejdops commented 5 years ago

All dask workers are scheduled on preemptibles.

The preemptible pools consist of n1-standard-4 VMs. The preemptible price of an n1-standard-4 in us-central1 is $0.0400 per hour.

An n1-standard-4 has 4 vCPUs and 15 GB of memory. Our dask worker has 1.75 vCPUs and 6 GB of memory, so two workers fit on each VM and 40 workers would require 20 n1-standard-4 VMs.

Running that for 45 minutes costs $0.60 (20 VMs x $0.04 per hour x 0.75 hours), provided that the dask cluster scales down as soon as the calculations are finished. You could use cluster.adapt() to force the termination of your workers.

Preemptible nodes usually take about 15 minutes to scale down and about 10 minutes to scale up when you instantiate a dask cluster. That brings the billed time to roughly 70 minutes, so we can estimate the total cost of running your calculation at about USD 1.
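The arithmetic in this comment can be reproduced with a short sketch, which may help when estimating other sample sets. The function name estimate_cost and its parameters are hypothetical. The prices and sizes are the ones quoted in this thread and may be out of date. The sketch also assumes that workers are packed onto VMs purely by vCPU and memory, ignoring any capacity reserved for system pods.

```python
import math

# Figures quoted in this thread (may no longer match current pricing).
VM_VCPUS = 4               # n1-standard-4
VM_MEM_GB = 15
VM_PRICE_PER_HOUR = 0.04   # preemptible, us-central1, USD
WORKER_VCPUS = 1.75        # one dask worker
WORKER_MEM_GB = 6


def estimate_cost(n_workers, run_minutes, scale_up_minutes=0, scale_down_minutes=0):
    """Return (number of VMs, estimated cost in USD) for one run.

    Hypothetical helper: assumes workers are packed onto VMs by vCPU and
    memory only, and that every VM is billed for the whole period.
    """
    # How many workers fit on one VM, limited by whichever resource runs out first.
    workers_per_vm = min(int(VM_VCPUS // WORKER_VCPUS), int(VM_MEM_GB // WORKER_MEM_GB))
    n_vms = math.ceil(n_workers / workers_per_vm)
    billed_hours = (run_minutes + scale_up_minutes + scale_down_minutes) / 60
    return n_vms, n_vms * VM_PRICE_PER_HOUR * billed_hours


# Compute time only: 40 workers for 45 minutes.
n_vms, compute_cost = estimate_cost(40, 45)

# Including roughly 10 minutes to scale up and 15 minutes to scale down.
_, total_cost = estimate_cost(40, 45, scale_up_minutes=10, scale_down_minutes=15)

print(n_vms, round(compute_cost, 2), round(total_cost, 2))
```

With the numbers above this gives 20 VMs, about $0.60 for the compute itself and a little under $1 once scale-up and scale-down time is included.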

hardingnj commented 5 years ago

Thanks