tenstorrent / tt-smi

Tenstorrent console-based hardware information program
Apache License 2.0

tt-smi mobo reset should not boot credos unless explicitly required #11

Closed hmohiuddinTT closed 2 months ago

hmohiuddinTT commented 4 months ago

If the user passes the default reset_config, which lists all credo ports, tt-smi should detect whether the credos are already booted and skip booting them again unless the user explicitly asks for them to be rebooted.

Here's a sample reset_config.json:

{
    "time": "2024-03-14T19:18:05.306323",
    "host_name": "ubuntu",
    "gs_tensix_reset": {
        "pci_index": []
    },
    "wh_link_reset": {
        "pci_index": [
            0
        ]
    },
    "re_init_devices": true,
    "wh_mobo_reset": [
        {
            "nb_host_pci_idx": [
                0
            ],
            "mobo": "MOBO_IP",
            "credo": [
                "6:0",
                "6:1",
                "7:0",
                "7:1"
            ],
            "disabled_ports": []
        }
    ]
}

I'd like to keep using this same reset_config.json rather than create a second one with the credo ports removed just for resetting the modules.

bingliTT commented 2 months ago

In tt-tools-common, the credo_boot function in galaxy_reset skips booting the credos if you simply omit the "credo" option from your JSON:

    def credo_boot(self, mobo_dict):
        # Function for booting credos concurrently
        mobo = mobo_dict["mobo"]
        if "credo" in mobo_dict.keys():
            credo_ports = mobo_dict["credo"]
        else:
            print(
                CMD_LINE_COLOR.BLUE,
                f"{mobo} - No credos to be booted, moving on ...",
                CMD_LINE_COLOR.ENDC,
            )
            return
        ...

That is, your reset JSON should look like the following to achieve the desired effect:

{
    "time": "2024-03-14T19:18:05.306323",
    "host_name": "ubuntu",
    "gs_tensix_reset": {
        "pci_index": []
    },
    "wh_link_reset": {
        "pci_index": [
            0
        ]
    },
    "re_init_devices": true,
    "wh_mobo_reset": [
        {
            "nb_host_pci_idx": [
                0
            ],
            "mobo": "MOBO_IP"
        }
    ]
}
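
If you would rather keep a single reset_config.json on disk, one option is to drop the "credo" key programmatically before handing the config to the reset. The following is only a minimal sketch: strip_credo is a hypothetical helper, not part of tt-smi or tt-tools-common, and it assumes the config has the shape shown above. The sample also drops "disabled_ports" to match the JSON above; whether that key needs to be removed is not confirmed here.

```python
import copy
import json


def strip_credo(config, keys=("credo",)):
    """Hypothetical helper: return a copy of a reset config with the
    given keys removed from every entry under "wh_mobo_reset"."""
    stripped = copy.deepcopy(config)  # leave the caller's config untouched
    for mobo_entry in stripped.get("wh_mobo_reset", []):
        for key in keys:
            mobo_entry.pop(key, None)  # no error if the key is absent
    return stripped


# Trimmed-down version of the sample config from this issue.
config = {
    "wh_mobo_reset": [
        {
            "nb_host_pci_idx": [0],
            "mobo": "MOBO_IP",
            "credo": ["6:0", "6:1", "7:0", "7:1"],
            "disabled_ports": [],
        }
    ]
}

modules_only = strip_credo(config, keys=("credo", "disabled_ports"))
print(json.dumps(modules_only, indent=4))
```

Because the helper works on a deep copy, the original config (with all credo ports listed) stays available for the cases where you do want the credos rebooted.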

hmohiuddinTT commented 2 months ago

Thanks for the solution, @bingliTT. I verified it works for me!