mo-tenstorrent opened this issue 3 months ago
Closing https://github.com/tenstorrent-metal/tt-metal/issues/7021 in favour of this one
@jliangTT Did we decide on how we are going to break down this CI so that each job now gets an owner?
CNN, LLM, Other might be too general. I can also split at the root level, so that every line in the following becomes its own job, i.e.:
This will make the CI take longer, and it can get crowded real quick.
Is it too general? I thought that's what we decided on the landing page. Have people complained that these buckets are too large, or is that your own concern?
My only worry is that if we have failures at that level, owners can't immediately tell whether they are in charge of investigating the failure.
Unless @jliangTT or @TT-billteng have different understandings of the process from me, I believe that's the point of "pipeline ownership". Their team owns the pipeline, full stop, and is responsible for finding out what's wrong. They're always welcome to ask others to fix their pipelines, including the infra team.
They could be the cause of the failure, they could not be. They could have the skills / knowledge to find out root cause, they could not. They could have the skills to fix the root cause, they could not. Regardless, their names are the ones liable for ensuring that it's green again.
I also believe @uaydonat has this understanding of pipeline ownership.
I am trying to use this chart to reason about ownership - https://docs.google.com/spreadsheets/d/1px7wdl29yeCEQQ1rQGFCQR69lk6BdRQdeEMDG0t-w3M/edit#gid=1506109495
it is updated with the latest view. Let me know if there is anything confusing
are we tracking LLMs on GS?
what is currently running pertaining to falcon/mistral/llama on GS? is this kind of a no-op?
As of main yesterday, no on-device perf models run on WH. Only GS.
In tests/scripts/run_performance.sh, we see that the following models are run for device perf:
```shell
run_device_perf_models() {
    ...
    # explicitly skips wh b0 so probably broken on it
    env pytest "tests/ttnn/integration_tests/resnet/test_performance.py" -m $test_marker
    env pytest models/demos/resnet/tests -m $test_marker
    env pytest models/demos/metal_BERT_large_11/tests -m $test_marker
    env pytest models/demos/ttnn_falcon7b/tests -m $test_marker
    # not sure what exactly is diff b/w this and metal_BERT_large_11, maybe some ops / sizes
    env pytest models/demos/bert/tests -m $test_marker
    # this doesn't even run on GS, so it's a no-op. We should reach out to test authors
    env pytest models/demos/mistral7b/tests -m $test_marker
    ...
}
```
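The job-per-model-type split being discussed could be sketched roughly as follows. This is a hypothetical sketch, not the actual script: the bucket assignments and the marker value are my assumptions, and the pytest commands are echoed as a dry run rather than executed.

```shell
# Hypothetical sketch: split run_device_perf_models into one function per
# model-type bucket so each CI job maps to a single owner. Bucket membership
# and the marker name below are illustrative assumptions.
test_marker="models_device_performance_bare_metal"  # assumed marker name

run_llm_device_perf_models() {
    # Transformer/LLM bucket (assumed grouping).
    echo env pytest models/demos/metal_BERT_large_11/tests -m "$test_marker"
    echo env pytest models/demos/ttnn_falcon7b/tests -m "$test_marker"
    echo env pytest models/demos/bert/tests -m "$test_marker"
    echo env pytest models/demos/mistral7b/tests -m "$test_marker"
}

run_cnn_device_perf_models() {
    # CNN bucket (assumed grouping).
    echo env pytest tests/ttnn/integration_tests/resnet/test_performance.py -m "$test_marker"
    echo env pytest models/demos/resnet/tests -m "$test_marker"
}

# Each CI job would invoke exactly one bucket, e.g. `./run.sh llm`.
case "${1:-all}" in
    llm) run_llm_device_perf_models ;;
    cnn) run_cnn_device_perf_models ;;
    all) run_llm_device_perf_models; run_cnn_device_perf_models ;;
esac
```

With this shape, a red `llm` job points straight at the LLM owner without anyone having to read the full log first.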
So:
@mo-tenstorrent also made some small quality-of-life changes to upload the CSV perf results even if there's a perf check failure. I see the following models run on latest main:
you can download the artifact from the workflow run directly.
Falcon7b ttnn is owned by @cfjchu
Hey guys, we need on-device perf models running on WH. We only optimize LLMs on WH. We would add falcon7b, mistral, and mamba to the WH device-perf pipeline. It would be great if it were named something like llm_javelin_wormhole_b0.
Following the conversation on Slack regarding device perf CI responsibility: to better distribute CI monitoring among the various model owners, the CI has to be split into multiple jobs.
Initial suggestion is to do a job per model type.
This is the CI: https://github.com/tenstorrent-metal/tt-metal/actions/workflows/perf-device-models.yaml
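One way the job-per-model-type suggestion could look in the workflow file is a matrix with one entry per bucket. This is a hedged sketch only, not the actual contents of perf-device-models.yaml: the bucket names, runner labels, and the `--pipeline-type` flag on run_performance.sh are all assumptions for illustration.

```yaml
# Hypothetical sketch of a per-bucket split; not the real workflow.
name: device-perf-models

on:
  workflow_dispatch:
  schedule:
    - cron: "0 4 * * *"

jobs:
  device-perf:
    strategy:
      fail-fast: false  # let other buckets finish even if one fails
      matrix:
        include:
          # Bucket names and arch pairings are illustrative assumptions.
          - model-type: llm_javelin
            arch: wormhole_b0
          - model-type: cnn_javelin
            arch: grayskull
          - model-type: other
            arch: grayskull
    name: "device perf ${{ matrix.model-type }} ${{ matrix.arch }}"
    runs-on: ["self-hosted", "${{ matrix.arch }}"]
    steps:
      - uses: actions/checkout@v4
      # Assumed flag: run_performance.sh would need to accept a bucket selector.
      - name: Run device perf tests for this bucket
        run: ./tests/scripts/run_performance.sh --pipeline-type "${{ matrix.model-type }}"
```

Each matrix entry shows up as its own job in the Actions UI, so a failure is immediately attributable to one bucket owner, which is the "pipeline ownership" model described above.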