jnielsendotnet / isboot

FreeBSD kernel module to support booting from iSCSI via IBFT

Some motherboard firmware supplies more than one iBFT table #10

Closed hsnyder closed 1 year ago

hsnyder commented 2 years ago

I have a Supermicro motherboard that supports iSCSI boot via its onboard 1G NICs, and I am trying to get iSCSI boot working via a PCIe NIC (specifically a 10G Mellanox card). I am using iPXE to boot and create the iBFT. Unfortunately, isboot instead finds a garbage iBFT that comes from the onboard NICs' iSCSI boot capability.

However, I figured out that the iBFT created by iPXE is actually present; it's just not the first one that AcpiGetTable finds. I was able to work around the issue with the following dirty hack:

--- a/src/ibft.c
+++ b/src/ibft.c
@@ -545,12 +545,30 @@ ibft_acpi_lookup(void)
        /*ACPI_IBFT_HEADER *ibft_hdr, *end;*/
        ACPI_STATUS status;

-       status = AcpiGetTable(ACPI_SIG_IBFT, 1, (ACPI_TABLE_HEADER **)&ibft);
-       if (ACPI_FAILURE(status)) {
-               status = AcpiGetTable(IBFT_SIGNATURE, 1, (ACPI_TABLE_HEADER **)&ibft);
-               if (ACPI_FAILURE(status))
-                       return (NULL);
+
+       for (int i = 4; i >= 0; i--) {
+               status = AcpiGetTable(ACPI_SIG_IBFT, i, (ACPI_TABLE_HEADER **)&ibft);
+               if (!ACPI_FAILURE(status)) {
+                       printf("iBFT instance %i (ACPI_SIG_IBFT) found\n", i);
+                       goto found;
+               } else {
+                       printf("iBFT instance %i (ACPI_SIG_IBFT) not found\n", i);
+               }
        }
+
+       for (int i = 4; i >= 0; i--) {
+               status = AcpiGetTable(IBFT_SIGNATURE, i, (ACPI_TABLE_HEADER **)&ibft);
+               if (!ACPI_FAILURE(status)) {
+                       printf("iBFT instance %i (IBFT_SIGNATURE) found\n", i);
+                       goto found;
+               } else {
+                       printf("iBFT instance %i (IBFT_SIGNATURE) not found\n", i);
+               }
+       }
+
+       return NULL;
+
+       found:
        return (uint8_t *)ibft;
 }

Obviously that isn't a great solution: it just finds the last iBFT table present, provided there are 4 or fewer.

On my system, the output is:

Load isboot
iSCSI boot driver version 0.2.15-alpha
iBFT instance 4 (ACPI_SIG_IBFT) not found
iBFT instance 3 (ACPI_SIG_IBFT) not found
iBFT instance 2 (ACPI_SIG_IBFT) not found
iBFT instance 1 (ACPI_SIG_IBFT) not found
iBFT instance 0 (ACPI_SIG_IBFT) not found
iBFT instance 4 (IBFT_SIGNATURE) not found
iBFT instance 3 (IBFT_SIGNATURE) not found
iBFT instance 2 (IBFT_SIGNATURE) found

I'm quite new to FreeBSD kernel hacking. Is there a clean way to add a parameter, settable via loader.conf, that would let the user specify the Instance argument that gets passed to AcpiGetTable? It would be nice if we could expose some way for the user to work around janky motherboard firmware like this...

jnielsendotnet commented 2 years ago

I added ACPI support in the first place for the benefit of my own (differently) janky motherboard. I'm not 100% confident that the implementation is fully correct. So I'm open to suggestions if there might be another way to correctly identify which ACPI table to use.

The next thing to look at (and you probably have) would be the BIOS settings to see if there's a way to (more) fully disable iSCSI and/or network booting from the onboard NIC.

I'm curious if iPXE will work for you with the legacy (non-ACPI) "lowmem" search/lookup. Would you mind trying that? Basically comment out lines 563-570 and 586 of the (unmodified) ibft.c and see if it works. If it does then the fix would be to either try that first (permanently switch the order of legacy and ACPI checks), or have a loader tunable to switch the order when requested.

If it doesn't then a different loader tunable to specify which ACPI table to try (or try first) is the cleanest "last resort" option.

One other idea would be to improve the sanity checking on the found table and if it fails keep looking. How/when does the boot fail when it tries to use the first (bad) ACPI table? If it fails the existing ibft_parse_structure() check then we could just make that more robust and go try to get another table if the first one fails. You could verify that by building with IBFT_VERBOSE defined and seeing if it prints "iBFT error" (line 592). If your boot passes that but fails later then I'll need more details to implement an improved sanity check.
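For illustration, the "sanity check and keep looking" idea could be sketched roughly as below. This is a user-space mock, not isboot code: `struct fake_table`, `lookup_fn`, `validate_fn`, `find_usable_instance`, `MAX_INSTANCES` and the mock firmware are all hypothetical names. The lookup callback stands in for AcpiGetTable and the validate callback for whatever improved check ends up rejecting the bad table. It also assumes instances are numbered from 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an iBFT table; only what the sketch needs. */
struct fake_table {
	int usable;	/* 1 if a (hypothetical) sanity check would accept it */
};

/* Hypothetical lookup: table for a 1-based instance, or NULL if none. */
typedef const struct fake_table *(*lookup_fn)(int instance);
/* Hypothetical sanity check: nonzero if the table looks usable. */
typedef int (*validate_fn)(const struct fake_table *t);

#define MAX_INSTANCES 16	/* arbitrary upper bound for the sketch */

/*
 * Walk the instances in order and return the first one that passes
 * validation, or 0 if the lookup runs out of tables first.
 */
static int
find_usable_instance(lookup_fn lookup, validate_fn validate)
{
	assert(lookup != NULL && validate != NULL);
	for (int i = 1; i <= MAX_INSTANCES; i++) {
		const struct fake_table *t = lookup(i);

		if (t == NULL)
			break;		/* no more instances to try */
		if (validate(t))
			return (i);	/* first table that passes the check */
	}
	return (0);
}

/* Mock firmware: instance 1 is the unusable table, instance 2 is good. */
static const struct fake_table mock_tables[] = { { 0 }, { 1 } };

static const struct fake_table *
mock_lookup(int instance)
{
	size_t n = sizeof(mock_tables) / sizeof(mock_tables[0]);

	if (instance < 1 || (size_t)instance > n)
		return (NULL);
	return (&mock_tables[instance - 1]);
}

static int
mock_validate(const struct fake_table *t)
{
	return (t->usable);
}
```

With the mock above, `find_usable_instance(mock_lookup, mock_validate)` skips the first table and returns 2. How well this works in practice depends entirely on the validation step being able to tell a usable table from an unusable one.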

hsnyder commented 2 years ago

Thanks for getting back to me. I did try playing with BIOS settings, but I wasn't successful.

I tried the "lowmem" method as per your suggestion and it didn't work ("iBFT not found").

Regarding where precisely the failure occurs with the bad iBFT table, it fails with ENXIO in isboot_init(). So the iBFT table isn't actually corrupt or anything, it's just not usable/correct.

If you're amenable to the loader tunable approach, I'm happy to do the work (though if you have any references on how to read loader tunables from inside the kernel, that would be super helpful). If you prefer the automatic retry approach, I'm happy to implement that too. That said, given that we've made it all the way through isboot_init by the time the failure shows up, it may be easier for you, depending on how well you know the exact set of steps that would need to be undone.
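On reading loader tunables: inside the kernel, an integer tunable is commonly fetched with the `TUNABLE_INT_FETCH(name, &var)` macro from `<sys/kernel.h>`, which leaves the variable untouched when the tunable is not set. The sketch below is not kernel code; it is a user-space approximation of the "use the tunable if it is valid, else fall back to the default" logic. `instance_from_tunable` and `DEFAULT_INSTANCE` are hypothetical names, and the default of 1 is an assumption based on the unmodified AcpiGetTable call in the diff above.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

#define DEFAULT_INSTANCE 1	/* assumed default: first table instance */

/*
 * Turn the string value of a tunable into a table instance number.
 * Unset, empty, non-numeric, or out-of-range values fall back to
 * DEFAULT_INSTANCE.
 */
static int
instance_from_tunable(const char *value)
{
	char *end;
	long n;

	if (value == NULL || *value == '\0')
		return (DEFAULT_INSTANCE);	/* tunable not set */
	errno = 0;
	n = strtol(value, &end, 10);
	if (errno != 0 || *end != '\0' || n < 1 || n > INT_MAX)
		return (DEFAULT_INSTANCE);	/* reject garbage or bad range */
	return ((int)n);
}
```

For example, `instance_from_tunable("2")` yields 2, while `instance_from_tunable(NULL)` and `instance_from_tunable("abc")` both fall back to 1.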

jnielsendotnet commented 1 year ago

Assuming you're still using the same setup a year+ later, will you try 0.2.15-beta2 (or just the latest) and set 'hw.ibft.acpi_table="2"' in your /boot/loader.conf? This commit adds that tunable but I don't have a great way to test it: https://github.com/jnielsendotnet/isboot/commit/a7bf4f96c1e2417e148ddc4837ea188ead342152

jnielsendotnet commented 1 year ago

Should be fixed in 0.2.15.