open-power / hostboot

System initialization firmware for Power systems
Apache License 2.0
75 stars 97 forks source link

Missing secureboot header on PAYLOAD error message is incomprehensible #126

Open ghost opened 6 years ago

ghost commented 6 years ago
 43.02844|ISTEP 20. 2 - host_load_hdat
 43.04898|System shutting down with error status 0x900000A7
 43.04726|================================================
 43.04727|Error reported by initservice (0x0500) PLID 0x900000A7
 43.04728|  Initialization Service launched a function and the task returned an error.
 43.04728|  ModuleId   0x01 BASE_INITSVC_MOD_ID
 43.04729|  ReasonCode 0x0506 WAIT_FN_FAILED
 43.04729|  UserData1  task id or task return code : 0x0000000000000269
 43.04730|  UserData2  returned status from task : 0x0000000000000001
 43.04731|------------------------------------------------
 43.04731|  Callout type             : Procedure Callout
 43.04732|  Procedure                : EPUB_PRC_HB_CODE
 43.04733|  Priority                 : SRCI_PRIORITY_HIGH
 43.04733|------------------------------------------------
 43.04734|  host_load_hdat
 43.04734|------------------------------------------------
 43.04735|  Hostboot Build ID: 
 43.04735|================================================

This seems to be the result of a PAYLOAD partition that doesn't have a secureboot header on it (but is XZ compressed and otherwise valid).

ghost commented 6 years ago

This will work: ./op-test --config-file ltc-boston114.conf --flash-skiboot ~/skiboot/skiboot.lid.xz.stb

This will cause the above error: ./op-test --config-file ltc-boston114.conf --flash-skiboot ~/skiboot/skiboot.lid.xz

IlyaSmirnov91 commented 6 years ago

Have been trying to recreate this with op-build level cf83faa745f652b7f9cb7e9e5e606ba2bcf7f2c7. Replacing PAYLOAD with the skiboot.lid.xz generated by the above build produces the following errors:

46.01959|System shutting down with error status 0x90000011
46.04822|================================================
46.05134|Error reported by pnor (0x0600) PLID 0x90000011
46.04536|  Error parsing secure header
46.04537|  ModuleId   0x0a SECUREBOOT::MOD_SECURE_CONT_HDR_PARSE
46.04538|  ReasonCode 0x0631 PNOR::RC_BAD_SECURE_MAGIC_NUM
46.04538|  UserData1  Actual magic number : 0x00000000fd377a58
46.04539|  UserData2  Expected magic number : 0x0000000017082011
46.04540|------------------------------------------------
46.04853|  Callout type             : Procedure Callout
46.04853|  Procedure                : EPUB_PRC_HB_CODE
46.04854|  Priority                 : SRCI_PRIORITY_HIGH
46.04855|------------------------------------------------
46.04855|  Hostboot Build ID: hostboot-28927a7/hbicore.bin
46.04856|================================================
46.10828|System shutting down with error status 0x90000011
46.11163|================================================
46.11164|Error reported by pnor (0x0600) PLID 0x90000011
46.11493|  Secure Boot: Failed to securely load or unload signed boot firmware.
46.11494|  ModuleId   0xd6 MOD_PNORRP_LOADUNLOADSECURESECTION
46.11495|  ReasonCode 0x0604 RC_EXTERNAL_ERROR
46.11496|  UserData1  returncode from msg_sendrecv() or msg->data[1] : 0xfffffffffffffff3
46.11497|  UserData2  SPNOR message type [LOAD | UNLOAD] : 0x0000000200000006
46.11497|------------------------------------------------
46.11498|  Callout type             : Procedure Callout
46.11499|  Procedure                : EPUB_PRC_HB_CODE
46.11499|  Priority                 : SRCI_PRIORITY_HIGH
46.11500|------------------------------------------------
46.11500|  host_load_payload
46.11501|------------------------------------------------
46.11501|  Hostboot Build ID: hostboot-28927a7/hbicore.bin
46.11502|================================================

Which is what we expect. If I replace PAYLOAD with skiboot.lid.xz.stb, the system boots fine. Does op-test do anything special?

ghost commented 6 years ago

Were you building with secureboot enabled in the hostboot config? I've seen the above error before on Witherspoon (which has secureboot enabled in hostboot), but Boston currently does not, and I think that may be where the difference is?

IlyaSmirnov91 commented 6 years ago

I was trying to recreate on Witherspoon originally. I switched to Boston and saw the same error as you posted. Below are some findings.

IlyaSmirnov91 commented 6 years ago

--The PAYLOAD image in this case (skiboot.lid.xz) doesn't have any kind of secure header since even the fake header didn't get added to it during op-build. --The code path with CONFIG_SECUREBOOT unset (which is the case in p9dsu) never checks the header in istep 20.1 (where we load the PAYLOAD). --Also in 20.1 we check the HEADER_MAGIC (the magic number at the beginning of the PAYLOAD header, right after the secure header), and in this scenario we can't find it because the place we look for it is 0x1000 deep into the image (the code still pretends that the secure header, of size 0x1000, is still there) --vaddr of the section is set to start 0x1000 into the image (basically in the middle of the image). See the "image" below... (Normally)

|   Previous section    |
-------------------------
|   Header of PAYLOAD   |
------------------------- <- Normally vaddr points here (0x1000 offset from the start of the header)
|xFD '7' 'z' 'X' 'Z' x00| <- XZ magic number
|.......................|
....

(What happens in this case)

|   Previous section    |
-------------------------
|xFD '7' 'z' 'X' 'Z' x00|
|.......................|
....
|.......................| <- In this case vaddr points here (0x1000 offset from the XZ Magic number)

Since the vaddr is wrong, it is no surprise that something bad happens a bit later in IPL.

I'm digging for an acceptable way to check the headers even in non-secure mode.

ghost commented 6 years ago

In skiboot, to work around all of this fun (which is lovely and non trivial), we do something like this:

bool stb_is_container(const void *buf, size_t size)
{
        ROM_container_raw *c;

        c = (ROM_container_raw*) buf;
        if (!buf || size < SECURE_BOOT_HEADERS_SIZE)
                return false;
        if (be32_to_cpu(c->magic_number) != ROM_MAGIC_NUMBER )
                return false;
        return true;
}

......
        part_signed = stb_is_container(bufp, SECURE_BOOT_HEADERS_SIZE);

        prlog(PR_DEBUG, "FLASH: %s partition %s signed\n", name,
              part_signed ? "is" : "isn't");
        /*
         * part_start/size are raw pointers into the partition.
         *  ie. they will account for ECC if included.
         */

        if (part_signed) {
                bufp += SECURE_BOOT_HEADERS_SIZE;
                bufsz -= SECURE_BOOT_HEADERS_SIZE;
                content_size = stb_sw_payload_size(buf, SECURE_BOOT_HEADERS_SIZE);
                *len = content_size + SECURE_BOOT_HEADERS_SIZE;

                if (content_size > bufsz) {
                        prerror("FLASH: content size > buffer size\n");
                        rc = OPAL_PARAMETER;
                        goto out_free_ffs;
                }

                ffs_part_start += SECURE_BOOT_HEADERS_SIZE;
                if (ecc)
                        ffs_part_start += ecc_size(SECURE_BOOT_HEADERS_SIZE);

                rc = blocklevel_read(flash->bl, ffs_part_start, bufp,
                                          content_size);
       ............
     }  else /* stb_signed */ {
                /*
                 * Back to the old way of doing things, no STB header.
                 */
                 ..............
      }
         * Verify and measure the retrieved PNOR partition as part of the
         * secure boot and trusted boot requirements
         */
        secureboot_verify(id, buf, *len);
        trustedboot_measure(id, buf, *len);

So, we basically probe for the header using the magic number for the secure boot header, and adjust things accordingly (we also then use the payload size in the header to work out how much data to read so we don't read the whole partition). Otherwise we fall back to reading things the old way (which involves a bunch of stuff to work out what size we need to read off flash, seeing if we can save time by not reading the whole partition)

The enforcement of secure boot happens in the final step, where secureboot_verify knows if we're booting in secure mode or not and makes the decision there.

IlyaSmirnov91 commented 6 years ago

After some discussion, we agreed on a fix that will check the partition header even in non-secure boot if the partition is marked with sha512 flag in the xml layout. This will allow to catch the partitions without the fake header and will explicitly fail IPL at the time of checking as opposed to in some weird spot later on. The error thrown now has an RC indicating that the header check has failed. The changes are in review. Note that unfortunately you will still have to have the piece of code above to check the presence of the headers on your side, since they are not forced on all partitions and are driven by the XML flag :(

ghost commented 6 years ago

IlyaSmirnov91 notifications@github.com writes:

After some discussion, we agreed on a fix that will check the partition header even in non-secure boot if the partition is marked with sha512 flag in the xml layout. This will allow to catch the partitions without the fake header and will explicitly fail IPL at the time of checking as opposed to in some weird spot later on. The error thrown now has an RC indicating that the header check has failed. The changes are in review. Note that unfortunately you will still have to have the piece of code above to check the presence of the headers on your side, since they are not forced on all partitions and are driven by the XML flag :(

For skiboot, we don't look at the XML flag, we do the verify based on policy in code for the type of resource we're loading, the secure boot mode we're in and the presence of the header or not.

-- Stewart Smith OPAL Architect, IBM.

IlyaSmirnov91 commented 6 years ago

Changes were merged to hb master today.

IlyaSmirnov91 commented 6 years ago

Reopening. The issue still persists on Bostons (also causes bootloader failures).