tt-rkim closed this issue 3 months ago
So if I'm understanding correctly, you are looking for two ways to perform a reset: with a JSON config file, and without a config file, by directly providing the PCIe indices of the boards. Please correct me if I misunderstood the question. As of release 2.0.0 you can do both! Instructions: https://github.com/tenstorrent/tt-smi?tab=readme-ov-file#resets Let me know if this answers your question.
Hey, in this issue I'm specifically asking for a 3rd option: without a JSON config file and without providing the PCIe indices of the boards, by defaulting to all PCIe devices found on the host.
Hmm, I see. I am not fully convinced of this 3rd use case.
What I can do is have the default -r option look for a JSON reset config file in the parent folder, without needing one to be provided, and use that to perform the reset.
I want users to be as explicit as possible with the reset, since we are using it to support varied use cases and there shouldn't be any ambiguity about what is expected from smi.
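To make the proposed fallback concrete, here is a minimal sketch of how a bare `-r` could resolve its target. It is not tt-smi's actual implementation: the function name `resolve_reset_target`, the default file name `reset_config.json`, and the search directory are all assumptions made for illustration.

```python
from pathlib import Path
from typing import List, Optional, Tuple, Union

# Assumed default file name; the thread does not say what tt-smi would look for.
DEFAULT_CONFIG_NAME = "reset_config.json"

ResetTarget = Tuple[str, Union[Path, List[int]]]


def resolve_reset_target(arg: Optional[str], search_dir: Path) -> ResetTarget:
    """Decide what a reset should act on (hypothetical helper).

    - arg is None: fall back to a config file in search_dir, or fail loudly.
    - arg ends in .json: treat it as an explicit config file path.
    - otherwise: treat it as a comma-separated list of PCIe indices.
    """
    if arg is None:
        candidate = search_dir / DEFAULT_CONFIG_NAME
        if candidate.is_file():
            return ("config", candidate)
        # No silent "reset everything": the caller must be explicit.
        raise FileNotFoundError(f"no {DEFAULT_CONFIG_NAME} found in {search_dir}")
    if arg.endswith(".json"):
        return ("config", Path(arg))
    indices = [int(tok) for tok in arg.split(",") if tok.strip()]
    if not indices:
        raise ValueError("expected at least one PCIe index")
    return ("indices", indices)
```

The point of the design is that the fallback never guesses: when neither an argument nor a config file is present, it raises instead of picking devices on its own.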
That's fair - then I'll go ahead and close this issue soon. It was more so to make our automation a little more convenient, but I think as part of provisioning we can:
Let me know if that's a flow that doesn't really make sense.
@TT-billteng @vtangTT
Here is the old issue about having an all reset
I feel like resetting all the available TT boards in a system should be a valid option. We've had to maintain separate scripts depending on machine configuration just to ensure all boards are in a good state.
You have to specify a JSON config or MMIO IDs, even when resetting something with an attached Galaxy.
For example, for two Nebulas by PCIe,
as opposed to something like
From the impression I got from internal convos with SysEng people, this should be doable and not that complicated for Nebula-only setups on a single host. I hope I'm not wrong.
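Until something like an all-reset exists, automation can approximate it by enumerating the devices itself and passing the indices explicitly. The sketch below only builds the command and does not run it. It assumes that the kernel driver exposes one numerically named node per device under `/dev/tenstorrent`, that those numbers match the PCIe indices tt-smi expects, and that `-r` accepts a comma-separated list. The helper names are hypothetical, and the exact argument form should be checked against the installed tt-smi version.

```python
from pathlib import Path
from typing import List

# Assumed location of the per-device nodes created by the kernel driver.
DEFAULT_DEV_DIR = Path("/dev/tenstorrent")


def find_device_indices(dev_dir: Path = DEFAULT_DEV_DIR) -> List[int]:
    """Return the sorted numeric entry names found in dev_dir (hypothetical helper)."""
    if not dev_dir.is_dir():
        return []
    return sorted(int(p.name) for p in dev_dir.iterdir() if p.name.isdigit())


def build_reset_command(indices: List[int]) -> List[str]:
    """Build an argv list for an explicit reset of the given indices.

    The comma-separated form after -r is an assumption; verify it locally.
    """
    if not indices:
        raise ValueError("no devices found; refusing to build a reset command")
    return ["tt-smi", "-r", ",".join(str(i) for i in indices)]


if __name__ == "__main__":
    # Print the command instead of executing it, so the sketch is safe to run.
    found = find_device_indices()
    print(build_reset_command(found) if found else "no devices found")
```

Keeping enumeration in the provisioning script, rather than in smi, preserves the explicitness asked for above while removing the need for per-machine scripts.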