tenstorrent / tt-mlir

Tenstorrent MLIR compiler
https://tenstorrent.github.io/tt-mlir/
Apache License 2.0

Balance our infra to give Forge Frontend more machines #520

Closed: staylorTT closed this issue 1 week ago

staylorTT commented 2 weeks ago

Right now Forge FE only has 1 machine allocated to it. This is causing a bottleneck in their development.

@nvukobratTT How many additional machines would ease the constraint?

@tapspatel How many machines can we afford to allocate to the FE side of things?

I know we looked into pooling, which seemed non-trivial. However, I wonder whether there is a way to make this easier in the future, or at least to understand how our infra is being utilized.

nvukobratTT commented 2 weeks ago

@nsmithtt @tapspatel During this sprint, we plan to focus mostly on E2E tests for higher robustness. Therefore, we'll require more than one silicon CI machine.

What are the plans for MLIR testing? Do you have any estimates of the required workflow?

From the FFE perspective, it would be good to have:

Given the points above, it would be good to balance things out a bit. Do you think that is possible?

vmilosevic commented 1 week ago

After discussing this, we decided to go with a 51/49 split, in favor of forge-fe for cases where we have an odd number of machines:

tt-mlir

builder-1 builder-2 n150-1 n150-2 n150-3 n300-1 n300-2

tt-forge-fe

builder-3 builder-4 builder-5 n150-4 n150-5 n150-6 n300-3 n300-4 n300-5
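One reading of the split rule, inferred from the two lists above (not stated explicitly in the thread): each machine type is divided in half, and when a type has an odd count, forge-fe receives the extra machine. The minimal Python sketch below reproduces the listed allocation under that assumption; the `POOL` counts and the `split_pool` helper are hypothetical and exist only for illustration.

```python
# Hypothetical per-type pool sizes, inferred from the allocation lists above.
POOL = {"builder": 5, "n150": 6, "n300": 5}


def split_pool(pool):
    """Split each machine type in half; for odd counts, forge-fe gets the extra.

    Machines are numbered 1..N per type, and tt-mlir takes the lowest
    numbers, which matches the allocation listed in the thread.
    """
    mlir, ffe = [], []
    for kind, count in pool.items():
        mlir_count = count // 2  # floor division: the odd machine goes to forge-fe
        for i in range(1, count + 1):
            name = f"{kind}-{i}"
            (mlir if i <= mlir_count else ffe).append(name)
    return mlir, ffe


mlir, ffe = split_pool(POOL)
print("tt-mlir:    ", " ".join(mlir))
print("tt-forge-fe:", " ".join(ffe))
```

With these counts, tt-mlir gets 7 machines and tt-forge-fe gets 9, since two of the three machine types (builders and n300s) have an odd count.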

nvukobratTT commented 1 week ago

Thanks! Will those appear here once added?

vmilosevic commented 1 week ago

Yes. Some of these are new VMs, so I first need to set up Docker, drivers, huge pages, etc. Once that is done, they will show up as active runners.

nvukobratTT commented 1 week ago


Sounds good! Thanks for pushing this further @vmilosevic 💪