Open pgkeller opened 7 months ago
For performance changes:
At some point I overheard that BH has interrupt support, at least on BRISC, if I remember correctly. We should confirm that it works.
"64B-alignment for reads" Just wanted to make sure not a typo, since in GS/WH it's 32B-alignment.
Also, I recall some issues caused by the relatively small NOC max packet size of 8KB. Wondering if BH will have the same limitation.
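For reference, a minimal sketch of what a max packet size implies on the software side: a larger transfer has to be issued as several packets. The 8KB value is only the figure recalled above; whether BH shares it is still the open question. `split_transfer` and `NOC_MAX_PACKET_BYTES` are hypothetical names for illustration, not Metal APIs.

```python
# Assumption: 8KB is the figure recalled in this thread; unconfirmed for BH.
NOC_MAX_PACKET_BYTES = 8 * 1024


def split_transfer(total_bytes, max_packet=NOC_MAX_PACKET_BYTES):
    """Yield (offset, length) chunks so that no chunk exceeds max_packet."""
    if total_bytes < 0 or max_packet <= 0:
        raise ValueError("total_bytes must be >= 0 and max_packet must be > 0")
    offset = 0
    while offset < total_bytes:
        length = min(max_packet, total_bytes - offset)
        yield offset, length
        offset += length


# A 20000-byte transfer becomes two full packets plus a remainder.
chunks = list(split_transfer(20000))
```

If BH turns out to have a different limit, only the constant would change in a sketch like this.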
"64B-alignment for reads" Just wanted to make sure not a typo, since in GS/WH it's 32B-alignment.
not a typo, needs to change
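To make the difference concrete, here is a small sketch of an alignment check using the numbers from this exchange (32B on GS/WH, 64B on BH for reads). The table and helper names are hypothetical and are not taken from the Metal code base.

```python
# Read alignment in bytes, as stated in this thread.
READ_ALIGNMENT = {"grayskull": 32, "wormhole": 32, "blackhole": 64}


def is_aligned(addr, alignment):
    """True if addr is a multiple of alignment (alignment must be a power of two)."""
    return addr & (alignment - 1) == 0


def align_up(addr, alignment):
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)


# An address that is fine for GS/WH reads may need to move on BH.
addr = 96
ok_on_wh = is_aligned(addr, READ_ALIGNMENT["wormhole"])   # 96 is a multiple of 32
ok_on_bh = is_aligned(addr, READ_ALIGNMENT["blackhole"])  # 96 is not a multiple of 64
bh_addr = align_up(addr, READ_ALIGNMENT["blackhole"])
```

Any buffer layout that assumes 32B read alignment would need a pass like this when targeting BH.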
Some risks I see from infra / CI side:
perhaps we need to fork TTMetal on gitlab side and make necessary modifications
Versim cannot run on cloud network / metal CI unfortunately... we need some serious rethinking around CI / dev for this
my 2c: use versim as a stop-gap to pull in some of the software development ahead of time, so we can build and bring up our sw stack and some unit tests until hardware comes back. Once the hardware comes back, testing / development will be brought up on hardware and versim should be less relevant. As such, CI on versim may be throwaway work.
per @rtawfik01 earlier on the current coverage on versim for Buda -
- Unary datacopy -> start with single tile, single core, bfloat16, then test more combinations, i.e. more tiles, different dataformats, etc.
- Unary SFPU ops
- Eltwise binary ops
- Reduce ops
- Matrix multiply/Convs
- Here we can start testing more combination ops/graphs -> layernorms, softmaxes, feedforwards, etc.
With Buda, we tested all kernels, but we started with the list above. And always try every op with the simplest scenario (bfloat16, single core, single tile), then add combinations as the tests pass.
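One way to read this progression is as an ordered test plan: each op starts at the simplest configuration and only then fans out. The sketch below is illustrative only. The op names follow the list above, while the extra data format, tile counts, core counts, and the name `bringup_plan` are placeholders, not the actual Buda or Metal test parameters.

```python
from itertools import product

# Ops in the order given above; the first entry of each list below is the simplest case.
OPS = ["unary_datacopy", "unary_sfpu", "eltwise_binary", "reduce", "matmul"]
DATA_FORMATS = ["bfloat16", "float32"]  # placeholder: only bfloat16 is named in the thread
NUM_CORES = [1, 2]                      # placeholder values
NUM_TILES = [1, 4]                      # placeholder values


def bringup_plan():
    """Yield test configs op by op, simplest configuration first for each op."""
    for op in OPS:
        # product() varies the last list fastest, so the first tuple for every op
        # is (bfloat16, 1 core, 1 tile), the simplest scenario.
        for fmt, cores, tiles in product(DATA_FORMATS, NUM_CORES, NUM_TILES):
            yield {"op": op, "fmt": fmt, "cores": cores, "tiles": tiles}


plan = list(bringup_plan())
```

In practice a runner would stop expanding an op's combinations as soon as its simplest case fails, which matches the "add combinations as the tests pass" rule.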
Also, Reem is getting the llk submodule ready, and we should let her know when we finish building metal with the current blackhole arch compiling through the metal stack.
TODOs as of 31/07/2024
FYI @davorchap
This is a list of todos for BH. Required to run anything:
Phase1 for BH bring up - target 5/2
@abhullar-tt
Reem
Almeet/David
OLD NOTES BELOW:
Infra Flow - Versim scramble
Development flow (MVP: metal running slow dispatch with a few ops) -
note: if we can staff (2) - (5) pretty soon, you should try to test on versim; if not, we should do it on the cards.
note: this flow will give us versim as a backup platform in case things don't work on the cards - but development and testing on both sides (github/gitlab) is very cumbersome.
TODO:
Phase2 for BH 30-day milestone & Open Source BH SW - target 5/17
Metal Goal - Single Tensix OP
[ ] Versim on CI
[ ] MatMul workload to stress-test single Tensix core
Phase3 for BH 60-day milestone - target 6/17
Metal goal - Multi-Tensix OP
[ ] MatMul workload to stress-test Multi Tensix cores
To be prioritized -->
[ ] ? NOC/tensix shared access (need to enumerate)
[ ] Eth IRAM? TBD if BH has IRAM on ethernet. If true - need changes for Eth support
Performance changes/new features (not required to run):
[ ] NOC has a RISC-NOC command fifo which allows more non-blocking transactions in flight (legacy interface still works)
SFPU/I Optimizations / New Features:
Debug/analysis: