hail-is / hail

Cloud-native genomic dataframes and batch computing
https://hail.is
MIT License

[batch] In Azure, we expect a quarter-core 50MiB job to land on an F*s machine but it lands on a D*ds machine. #12958

Open danking opened 1 year ago

danking commented 1 year ago

What happened?

In particular, the test_pool_highcpu_instance_cheapest test is failing. Azure did publish new prices today. Ds got cheaper:

| az/vm/Standard_D8ds_v4/spot/eastus/1680307200000    |  0.00000000004422498914930556 |       52602 |               52602 |
| az/vm/Standard_D8ds_v4/spot/eastus/1682899200000    |  0.00000000003538601345486111 |       52640 |               52640 |

Fs got more expensive:

| az/vm/Standard_F8s_v2/spot/eastus/1680307200000    | 0.000000000016796061197916665 |       52629 |               52629 |
| az/vm/Standard_F8s_v2/spot/eastus/1682899200000    |  0.00000000001763156467013889 |       52667 |               52667 |

It's not clear why batch prefers a D to an F, since Fs are still cheaper per core than Ds. Perhaps something about the disk allocation is responsible?
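As a quick sanity check on the rows above, here is a minimal sketch that parses the resource names and compares the rates. It assumes the second column is the rate stored for each resource (units as stored, not stated in the issue) and that the trailing path component is an epoch timestamp in milliseconds; those two values decode to 2023-04-01 and 2023-05-01 UTC. The `RATES` dict and `parse` helper are hypothetical illustrations, not part of Hail Batch.

```python
from datetime import datetime, timezone

# Spot rates for 8-core Azure VMs in eastus, copied from the table rows above.
# ASSUMPTION: the last path component is an effective-date timestamp in epoch ms.
RATES = {
    "az/vm/Standard_D8ds_v4/spot/eastus/1680307200000": 0.00000000004422498914930556,
    "az/vm/Standard_D8ds_v4/spot/eastus/1682899200000": 0.00000000003538601345486111,
    "az/vm/Standard_F8s_v2/spot/eastus/1680307200000": 0.000000000016796061197916665,
    "az/vm/Standard_F8s_v2/spot/eastus/1682899200000": 0.00000000001763156467013889,
}


def parse(name: str) -> tuple[str, str]:
    """Hypothetical helper: split a resource name into (machine type, UTC date)."""
    _, _, machine, _, _, ts_ms = name.split("/")
    date = datetime.fromtimestamp(int(ts_ms) / 1000, tz=timezone.utc).date()
    return machine, date.isoformat()


# Group rates by machine type, then by effective date.
by_machine: dict[str, dict[str, float]] = {}
for name, rate in RATES.items():
    machine, date = parse(name)
    by_machine.setdefault(machine, {})[date] = rate

# Percent change between the April and May price points.
for machine, series in sorted(by_machine.items()):
    old, new = series["2023-04-01"], series["2023-05-01"]
    print(f"{machine}: {100 * (new - old) / old:+.1f}%")

# Both machines have 8 cores, so the ratio of VM rates equals the per-core ratio.
ratio = by_machine["Standard_D8ds_v4"]["2023-05-01"] / by_machine["Standard_F8s_v2"]["2023-05-01"]
print(f"D/F rate ratio after the update: {ratio:.2f}")
```

Running it reports roughly a 20% drop for Standard_D8ds_v4 and a 5% rise for Standard_F8s_v2, with the D rate still about twice the F rate, so the VM rates by themselves do not explain why a D machine is chosen.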

Version

1940547d35ddddb084ad52684e36153c1e03a331

Relevant log output

No response

danking commented 1 year ago

The xfail is wrong because these tests pass in default, where we use 16-core machines. Something about disk on 8-core machines makes standard machines cheaper than highcpu ones. We skipped these tests in https://github.com/hail-is/hail/pull/12979.