Double thanks, yangl1996! I was unaware of mstflint's abilities. I recall trying various older versions of flint, but none would allow the PSID mismatch. I have added Recovery Mode as an option for programming the Innova-2 ConnectX-5 firmware.
I got the board into Recovery Mode by accidentally shorting CLK to Vcc. I believe shorting DO to GND is safer.
I am surprised by this, because at one point I erased the 25Q128 FLASH on the Innova-2 and only the PCIe bridge showed up in lspci.
@yangl1996, were you able to get anything from mst status or flint --device /dev/mst/mt525_pciconf0 query while the board was showing up as MT28800 Family [ConnectX-5 Flash Recovery]? I was hoping I could recreate your sequence of events.
Thanks, Matthew, for updating the docs! It's very interesting that one can force the board into recovery mode without dedicated tools.
My card arrived with only the Flash Recovery device shown in lspci. It's curious that there is also a configuration in which only a PCIe switch shows up; maybe in that case even the Flash Recovery firmware is erased?
I used mlxfwmanager instead of mst status. Both mlxfwmanager and mst are only present in Nvidia's version of MFT, not in FreeBSD's mstflint port, and the mst status command is not available even in Nvidia's FreeBSD version of MFT. Instead, mlxfwmanager has a similar command that queries all Mellanox devices on the host. It was able to discover the device and print its PCI address. (That's how I learned the PCI address format that Mellanox tools expect. Once one knows the address format, it is no longer necessary to install MFT from Nvidia: one can use pciconf -lv or lspci to find the address and use mstflint from the FreeBSD port to flash the card.) It also correctly detected the chip version (ConnectX-5).
mstflint -d <pci address of the flash recovery device> query successfully queries the device. However, most of the fields are N/A. I cannot recall exactly which fields, but at least Image type, FW Version, FW Release Date, MAC, and GUID. The PSID is also corrupted (maybe N/A, but definitely not MT_0000000158).
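To make the symptom above easy to check, here is a small Python sketch that parses query output line by line, lists the fields reported as N/A, and compares the PSID with MT_0000000158. It assumes the output consists of `Key:   value` lines split at the first colon; the sample text and the exact field names ("Base MAC", "Base GUID") are illustrative assumptions, and parse_query, missing_fields, and psid_matches are hypothetical helpers.

```python
# Illustrative `mstflint -d <addr> query` output for a card stuck in Flash
# Recovery mode; field names are assumptions, real output may differ.
SAMPLE_QUERY = """\
Image type:            N/A
FW Version:            N/A
FW Release Date:       N/A
Base MAC:              N/A
Base GUID:             N/A
PSID:                  N/A
"""

EXPECTED_PSID = "MT_0000000158"  # PSID this thread expects on the Innova-2


def parse_query(output: str) -> dict[str, str]:
    """Split each 'Key:   value' line at the first colon only."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def missing_fields(fields: dict[str, str]) -> list[str]:
    """Names of fields reported as N/A, sorted for stable output."""
    return sorted(name for name, value in fields.items() if value == "N/A")


def psid_matches(fields: dict[str, str], expected: str = EXPECTED_PSID) -> bool:
    """True only if the PSID field is present and equals the expected value."""
    return fields.get("PSID") == expected


fields = parse_query(SAMPLE_QUERY)
print(missing_fields(fields))
print(psid_matches(fields))  # False
```

Splitting at the first colon keeps values that themselves contain colons (such as a MAC address) intact.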
I made multiple failed attempts to flash the card until I found the article mentioned in my original post. The trick seems to be getting past the multiple safety checks that mstflint performs when flashing firmware, by disabling them. One needs an exact combination of options, and the error messages are not very helpful. (I tried to decipher the errors by going through the codebase, but it was too much work.) At some point I also booted Ubuntu and tried the Linux versions of the MFT tools, but I ran into the device ID issue described in my original post. The issue is present in both flint (from Nvidia) and mstflint (installed from an Ubuntu package). The exact same command worked after I switched back to FreeBSD.
I tried erasing the 25Q128 FLASH IC (all 0xFF) as well as writing all 0x00 to it, and in both cases the board boots into Flash Recovery Mode. I have no idea how I previously got it to boot with only the one PCIe bridge device visible.
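For anyone repeating the erase experiment above, a minimal Python sketch that builds the two blank images (all 0xFF and all 0x00) and classifies a raw dump of the chip. It assumes the 25Q128 is read or written as a raw 16 MiB image, for example with an external SPI programmer; the comment does not say how the erase was done, so that workflow and the helper names blank_image and classify are hypothetical.

```python
# 25Q128 = 128 Mbit SPI NOR flash = 16 MiB.
FLASH_SIZE = 128 * 1024 * 1024 // 8


def blank_image(fill: int, size: int = FLASH_SIZE) -> bytes:
    """A raw image where every byte is `fill` (0xFF = erased, 0x00 = zeroed)."""
    return bytes([fill]) * size


def classify(image: bytes) -> str:
    """Label a raw flash dump as erased, zeroed, or holding data."""
    if not image:
        return "empty"
    if image.count(0xFF) == len(image):
        return "erased (all 0xFF)"
    if image.count(0x00) == len(image):
        return "zeroed (all 0x00)"
    return "contains data"


print(classify(blank_image(0xFF)))  # erased (all 0xFF)
print(classify(blank_image(0x00)))  # zeroed (all 0x00)
```

Writing either image out with `open(path, "wb").write(...)` gives a file a programmer could flash; checking a read-back dump with classify confirms the chip really holds the intended pattern.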
I discovered that it is possible to corrupt the firmware using mstflint and then program the ConnectX-5 without any problems.
Thanks for such detailed documentation! I just started playing around with the card. Mine arrived with no firmware, and lspci shows only one device, MT28800 Family [ConnectX-5 Flash Recovery]. The instructions in the notes do not work for me. Here's what I ended up doing:

1. Use FreeBSD rather than Linux. (On Linux, flint complains that it cannot find the device ID. The source code of mstflint shows that the tool does not try to obtain the device ID on FreeBSD, so the error does not occur there.)
2. Install the mstflint port. (Nvidia does provide MFT tools on its website, but I found the open source version in FreeBSD ports sufficient. If one does want to install Nvidia's version, one needs to run ln -s /usr/local/bin/bash /bin/bash, since Nvidia's version assumes that bash is present at /bin/bash, which is not the case on FreeBSD.)
3. Run the following, where pci0:3:0:0 is the PCI address of the Flash Recovery device:

       mstflint -nofs --use_image_ps --ignore_dev_data -d pci0:3:0:0 -i /root/Innova_2_Flex_Open_18_12/FW/Morse_FW/fw-ConnectX5-rel-16_24_4020-MNV303212A-ADL_Ax.bin burn

The command is mostly a copy-paste from https://github.com/Tualua/labnotes/wiki/Mellanox-ConnectX-4-Lx-Firmware-recovery, but I thought it might be useful to put it here to benefit folks like me who have never tried to recover a Mellanox NIC.
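Wrapping the recovery burn command from this comment in a small script makes it easier to reuse with a different PCI address or firmware path. This is a minimal Python sketch: recovery_burn_args and burn are hypothetical helper names, the flags are copied verbatim from the command above without any claim about their exact semantics, and it defaults to a dry run that only prints the command, so nothing is flashed unless dry_run=False is passed.

```python
import shlex
import subprocess


def recovery_burn_args(pci_address: str, image_path: str) -> list[str]:
    """Build the mstflint recovery burn command used in this thread.

    Flags are copied verbatim from the comment; check `mstflint --help`
    for their exact meaning before running it for real.
    """
    return [
        "mstflint",
        "-nofs",
        "--use_image_ps",
        "--ignore_dev_data",
        "-d", pci_address,
        "-i", image_path,
        "burn",
    ]


def burn(pci_address: str, image_path: str, dry_run: bool = True):
    """Print the command (default) or run it, raising CalledProcessError on failure."""
    args = recovery_burn_args(pci_address, image_path)
    if dry_run:
        print(shlex.join(args))
        return None
    return subprocess.run(args, check=True)


burn(
    "pci0:3:0:0",
    "/root/Innova_2_Flex_Open_18_12/FW/Morse_FW/"
    "fw-ConnectX5-rel-16_24_4020-MNV303212A-ADL_Ax.bin",
)
```

Passing an argument list to subprocess.run (instead of a shell string) avoids quoting problems if a firmware path ever contains spaces.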
LMK if you would prefer that I submit a PR.