Closed by abhullar-tt 1 month ago
@TT-billteng do you know if there are any BHs marked for Cloud/Metal CI that will match the existing CI machine specs? If not, do you know how many we may need if we eventually want BH to be tested as part of post-commit?
So the AMD 7950X3D we just ordered should be much faster than any individual cloud VM instance we currently have. If this is still too slow, we need to investigate host perf on BH. Cloud is in the process of densifying the machines with more cards (upgrading each machine from 4 cards to 8). I'm in the process of qualifying perf on 8 vCPU VMs (we've been running on 14 vCPUs for CI). If BH actually needs far more host-side resources for whatever reason, this will upend cloud's roadmap and we need to let them know ASAP.
As for putting Blackhole CI into regular post-commit, it'll depend on how many tests we activate vs. how many BH runners we have. As a point of reference, we currently have 30-35 CI runners of each card type (E150/N150/N300).
We have faster BH machines, but it seems post-commit hasn't been running any faster because of https://github.com/tenstorrent/tt-metal/issues/11717
@abhullar-tt can we close this?
yes
Two parts:
FYI @davorchap @pgkeller