TT-billteng closed 4 months ago
Hey Bill, thanks for your issue. I think I know what is causing this. It looks like your board is set up to be part of a bigger mesh, and it cannot find the remainder of the chips, so it fails. You can tell because the coords are expected to be (0,0,0,0) and (0,1,0,0) for a standalone nb300, but yours are (0,0,0,0) and (1,0,0,0). If possible, I would recommend flashing your board again with tt-flash and trying your experiment again.
Thanks @sbansalTT, what do the 4 coordinates represent, and how do you think it could've reached this state? I'll try flashing again.
The coordinates represent the position of the board in the mesh: (x, y, rack, shelf). You can use the tt-topology tool to program these coordinates depending on which multichip setup you would like to use. Most likely, the board you were using was previously part of a multichip system and was not flashed back to its original state before being repurposed as your dev board.
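To make the comparison above concrete, here is a minimal Python sketch that checks a board's coordinates against the standalone values quoted in this thread. The names `MeshCoord`, `STANDALONE_COORDS`, and `looks_standalone` are hypothetical helpers for illustration, not part of tt-topology or tt-smi, and how you read the coordinates off the board is left out.

```python
from typing import Iterable, NamedTuple, Tuple


class MeshCoord(NamedTuple):
    """Position of a chip in the mesh, in the order given in this thread."""
    x: int
    y: int
    rack: int
    shelf: int


# Values quoted in this thread for a standalone board (assumption: two chips).
STANDALONE_COORDS = [MeshCoord(0, 0, 0, 0), MeshCoord(0, 1, 0, 0)]


def looks_standalone(coords: Iterable[Tuple[int, int, int, int]]) -> bool:
    """Return True if the chip coordinates match the standalone layout."""
    found = sorted(MeshCoord(*c) for c in coords)
    return found == sorted(STANDALONE_COORDS)


if __name__ == "__main__":
    # The coordinates reported in the issue do not match the standalone layout.
    print(looks_standalone([(0, 0, 0, 0), (1, 0, 0, 0)]))
```

A mismatch here would only suggest the board is still programmed for a multichip topology; it does not by itself explain a failed reset.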
OK, I upgraded the FW. The coordinates aren't fixed, but I can reset reliably now 🤔. I guess the coordinates aren't important to the reset process anymore?
I'm working on an N300 VM instance and running some repeatability tests. My script starts by resetting the board, but the reset occasionally fails, which stops my tests.
Just running
tt-smi
in a loop doesn't work reliably. This fails roughly half the time on my VM.
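As a possible stopgap while the root cause is investigated, a test script could retry the reset a bounded number of times instead of stopping on the first failure. The sketch below is a generic retry wrapper, not anything provided by tt-smi. `run_with_retries` and its parameters are hypothetical, and it assumes the reset command exits with a non-zero status when it fails.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retries(
    cmd: Sequence[str],
    attempts: int = 5,
    delay_s: float = 2.0,
    runner: Callable = subprocess.run,
) -> int:
    """Run cmd until it exits with status 0; return the attempt that succeeded.

    Assumption: a failed reset is reported through a non-zero exit code.
    Raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        result = runner(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            # Give the board a moment to settle before trying again.
            time.sleep(delay_s)
    raise RuntimeError(f"command failed after {attempts} attempts: {list(cmd)}")
```

To use it, pass the reset command as an argument list, for example the same tt-smi invocation the script already runs. The `runner` parameter exists only so the logic can be exercised without hardware.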