Open eric-czech opened 3 years ago
On instance pricing:
If various dask memory issues could be solved and we could use preemptible standard instances, the total cost would be around $53k.
@eric-czech I assume this stat:
Running phenotypes on chr21 (141,910 variants) for 11 hr 5 mins produced results for 265 phenotypes when using a cluster of 60 n1-highmem-16 instances
and this comment https://github.com/pystatgen/sgkit/issues/390#issuecomment-748205731 are for the same run, correct? And if it is, how is the 11 hr 5 mins here, connected with about 2hrs it took to run the regressions for chr21?
and this comment pystatgen/sgkit#390 (comment) are for the same run, correct?
No, that caption is definitely misleading -- I was either wrong when I wrote it or trying to make it clear that the individual phenotypes can be seen as single spikes. Here is a full version of that readout that also includes the run of the 265 phenotypes:
This is an estimate of the VM rental time necessary to do the GWAS regressions (similar to https://github.com/related-sciences/ukb-gwas-pipeline-nealelab/issues/8).
Here are current figures:
A ballpark cost to keep a cluster of this size running that long is 60 nodes (231 days 24 hrs) * $0.946424/hr = $314,818.
Clearly we have got to find some room to improve this.