open-power / petitboot

GNU General Public License v2.0
213 stars 56 forks source link

PetitBoot is requesting for System Boot Options 4 times each for parameter 0x05(Boot parameters) & 0x61(Host network Settings) #35

Open tomjoseph83 opened 6 years ago

tomjoseph83 commented 6 years ago

On OpenBMC, in a good case, i have observed that Petitboot request for the System Boot Options 4 times both for parameter 0x05 and 0x61 What is the rationale behind that, is there an inefficiency?

IPMI GET_SYS_BOOT_OPTIONS IPMI Response: 0x000000: 00 01 05 80 0c 00 00 00 ........
IPMI Incoming: Seq 0x10, NetFn 0x00, CMD: 0x09 0x000000: 61 00 00 a..
IPMI GET_SYS_BOOT_OPTIONS getHostNetworkData failed for get_sys_boot_options. ERROR:[0xFF] handling NetFn:[0x0], Cmd:[0x9] IPMI Response: 0x000000: ff 01 61 00 00 00 00 00 00 00 00 00 00 00 00 00 ..a............. 0x000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 0x000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 0x000030: 00 00 00 ...
IPMI Incoming: Seq 0x11, NetFn 0x00, CMD: 0x09 0x000000: 05 00 00 ...
IPMI GET_SYS_BOOT_OPTIONS IPMI Response: 0x000000: 00 01 05 80 0c 00 00 00 ........
IPMI Incoming: Seq 0x12, NetFn 0x00, CMD: 0x09 0x000000: 61 00 00 a..
IPMI GET_SYS_BOOT_OPTIONS getHostNetworkData failed for get_sys_boot_options. ERROR:[0xFF] handling NetFn:[0x0], Cmd:[0x9] IPMI Response: 0x000000: ff 01 61 00 00 00 00 00 00 00 00 00 00 00 00 00 ..a............. 0x000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 0x000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 0x000030: 00 00 00 ...
IPMI Incoming: Seq 0x13, NetFn 0x00, CMD: 0x09 0x000000: 05 00 00 ...
IPMI GET_SYS_BOOT_OPTIONS IPMI Response: 0x000000: 00 01 05 80 0c 00 00 00 ........
IPMI Incoming: Seq 0x14, NetFn 0x00, CMD: 0x09 0x000000: 61 00 00 a..
IPMI GET_SYS_BOOT_OPTIONS getHostNetworkData failed for get_sys_boot_options. ERROR:[0xFF] handling NetFn:[0x0], Cmd:[0x9] IPMI Response: 0x000000: ff 01 61 00 00 00 00 00 00 00 00 00 00 00 00 00 ..a............. 0x000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 0x000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 0x000030: 00 00 00 ...
IPMI Incoming: Seq 0x15, NetFn 0x00, CMD: 0x09 0x000000: 05 00 00 ...
IPMI GET_SYS_BOOT_OPTIONS IPMI Response: 0x000000: 00 01 05 80 0c 00 00 00 ........
IPMI Incoming: Seq 0x16, NetFn 0x00, CMD: 0x09 0x000000: 61 00 00 a..
IPMI GET_SYS_BOOT_OPTIONS getHostNetworkData failed for get_sys_boot_options. ERROR:[0xFF] handling NetFn:[0x0], Cmd:[0x9] IPMI Response: 0x000000: ff 01 61 00 00 00 00 00 00 00 00 00 00 00 00 00 ..a............. 0x000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 0x000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 0x000030: 00 00 00

sammj commented 6 years ago

I added some debug to see what Petitboot is doing, and it is as we expect:

/ # grep ipmi_send /var/log/petitboot/pb-discover.log
ipmi_send: netfn 0x 0 cmd 0x 9
ipmi_send: netfn 0x 0 cmd 0x 9
ipmi_send: netfn 0x c cmd 0x 2
ipmi_send: netfn 0x 6 cmd 0x 1
ipmi_send: netfn 0x3a cmd 0x1a

No extra duplicated requests (the second is the network override).

The Skiboot log looks a bit noisier:

[   25.079888614,6] BT: seq 0x0c netfn 0x06 cmd 0x08: Message sent to host
[   25.092227050,6] BT: seq 0x0c netfn 0x06 cmd 0x08: IPMI MSG done
[   25.225933892,6] BT: seq 0x0d netfn 0x0a cmd 0x48: Message sent to host
[   25.243814487,6] BT: seq 0x0d netfn 0x0a cmd 0x48: IPMI MSG done
[   26.094794894,6] BT: seq 0x0e netfn 0x0a cmd 0x48: Message sent to host
[   26.318813377,6] BT: seq 0x0e netfn 0x0a cmd 0x48: IPMI MSG done
[   26.386205018,6] BT: seq 0x0f netfn 0x00 cmd 0x09: Message sent to host
[   26.423255693,6] BT: seq 0x0f netfn 0x00 cmd 0x09: IPMI MSG done
[   26.423290626,6] BT: seq 0x10 netfn 0x00 cmd 0x09: Message sent to host
[   27.119185361,6] BT: seq 0x10 netfn 0x00 cmd 0x09: IPMI MSG done
[   27.119218011,6] BT: seq 0x11 netfn 0x00 cmd 0x09: Message sent to host
[   32.169165302,6] BT: seq 0x11 netfn 0x00 cmd 0x09: IPMI MSG done
[   32.169168218,7] IPMI: Got error response 0xce
[   32.169198938,6] BT: seq 0x12 netfn 0x00 cmd 0x09: Message sent to host
[   37.248537566,6] BT: seq 0x12 netfn 0x00 cmd 0x09: IPMI MSG done
[   37.248540517,7] IPMI: Got error response 0xce
[   37.248573631,6] BT: seq 0x13 netfn 0x00 cmd 0x09: Message sent to host
[   42.259876712,6] BT: seq 0x13 netfn 0x00 cmd 0x09: IPMI MSG done
[   42.259880049,7] IPMI: Got error response 0xce
[   42.259907622,6] BT: seq 0x14 netfn 0x00 cmd 0x09: Message sent to host
[   47.305425268,6] BT: seq 0x14 netfn 0x00 cmd 0x09: IPMI MSG done
[   47.305428144,7] IPMI: Got error response 0xce
[   47.305464114,6] BT: seq 0x15 netfn 0x00 cmd 0x09: Message sent to host
[   52.353516578,6] BT: seq 0x15 netfn 0x00 cmd 0x09: IPMI MSG done
[   52.353519417,7] IPMI: Got error response 0xce
[   52.353558898,6] BT: seq 0x16 netfn 0x00 cmd 0x09: Message sent to host
[   57.446713814,6] BT: seq 0x16 netfn 0x00 cmd 0x09: IPMI MSG done
[   57.446716539,7] IPMI: Got error response 0xce
[   57.446814677,6] BT: seq 0x17 netfn 0x0c cmd 0x02: Message sent to host
[   62.503806124,6] BT: seq 0x17 netfn 0x0c cmd 0x02: IPMI MSG done
[   62.503808881,7] IPMI: Got error response 0xce
[   62.503840383,6] BT: seq 0x18 netfn 0x06 cmd 0x01: Message sent to host
[   68.034229385,6] BT: seq 0x18 netfn 0x06 cmd 0x01: IPMI MSG done
[   68.034232142,7] IPMI: Got error response 0xce
[   68.034265546,6] BT: seq 0x19 netfn 0x3a cmd 0x1a: Message sent to host
[   68.148828409,6] BT: seq 0x19 netfn 0x3a cmd 0x1a: IPMI MSG done
[   68.148830918,7] IPMI: Got error response 0xc1

I'll have to see exactly how that lines up, but we can at least tell that Petitboot itself is not the source of the repeated requests.

sammj commented 6 years ago

I've just noticed that the error we're getting back may be the wrong one - as far as I know for the host network override request OpenBMC claims it is meant to return an "unspecified error" which is 0xff, but what we're seeing here is 0xce - "response error". This may have an effect on what the linux kernel is doing to handle it, @tomjoseph83 do you know why we're getting 0xce instead of 0xff?

tomjoseph83 commented 6 years ago

0xce is responded by btbridge, since the Get system boot options command has crossed the threshold of timeout of 5 seconds. The ipmid application did not complete within the 5 seconds.

We can see here that 5 seconds has expired

[   27.119218011,6] BT: seq 0x11 netfn 0x00 cmd 0x09: Message sent to host
[   32.169165302,6] BT: seq 0x11 netfn 0x00 cmd 0x09: IPMI MSG done
[   32.169168218,7] IPMI: Got error response 0xce

This patch (https://gerrit.openbmc-project.xyz/#/c/8050/ ) has fixed the performance lag for the 0x61 parameter. It is available with openbmc version BUILD_ID="ibm-v2.0-0-r13" and is still not merged into the master.

But I think the Petitboot behaviour of requesting multiple times is not related to 0xce IPMIreturn code In the log snippet pasted above, Get System Boot options had succeeded couple of times , before hitting 0xCE.

[   26.386205018,6] BT: seq 0x0f netfn 0x00 cmd 0x09: Message sent to host
[   26.423255693,6] BT: seq 0x0f netfn 0x00 cmd 0x09: IPMI MSG done
[   26.423290626,6] BT: seq 0x10 netfn 0x00 cmd 0x09: Message sent to host
[   27.119185361,6] BT: seq 0x10 netfn 0x00 cmd 0x09: IPMI MSG done

We did some benchmarking with the openbmc build (BUILD_ID="ibm-v2.0-0-r13") and observed that even though we have responded to the get system boot options command within a sec in all cases, still Petitboot is requesting multiple times.

IPMI GET_SYS_BOOT_OPTIONS - Boot Parameters
IPMI Response: 404ms
IPMI GET_SYS_BOOT_OPTIONS - Host Network Settings
IPMI Response: 16ms
IPMI GET_SYS_BOOT_OPTIONS - Boot Parameters
IPMI Response: 230ms
IPMI GET_SYS_BOOT_OPTIONS - Host Network Settings
IPMI Response: 19ms
IPMI GET_SYS_BOOT_OPTIONS - Boot Parameters
IPMI Response: 64ms
IPMI GET_SYS_BOOT_OPTIONS- Host Network Settings
IPMI Response: 25ms 
IPMI GET_SYS_BOOT_OPTIONS- Boot Parameters
IPMI Response: 351ms
IPMI GET_SYS_BOOT_OPTIONS- Host Network Settings
IPMI Response: 16ms      
sammj commented 6 years ago

Ah good point, I tested with ibm-v2.0-0-r13 and we see the expected response:

[   25.454857377,6] BT: seq 0x0e netfn 0x0a cmd 0x48: Message sent to host
[   26.079602940,6] BT: seq 0x0e netfn 0x0a cmd 0x48: IPMI MSG done
[   26.168195593,8] opal_ipmi_send(cmd: 0x09 netfn: 0x00 len: 0x03)
[   26.168207010,6] BT: seq 0x0f netfn 0x00 cmd 0x09: Message sent to host
[   26.213090216,6] BT: seq 0x0f netfn 0x00 cmd 0x09: IPMI MSG done
[   26.213099634,8] opal_ipmi_recv(cmd: 0x09 netfn: 0x01 resp_size: 0x07)
[   26.213127039,8] opal_ipmi_send(cmd: 0x09 netfn: 0x00 len: 0x03)
[   26.213134192,6] BT: seq 0x10 netfn 0x00 cmd 0x09: Message sent to host
[   26.229080487,6] BT: seq 0x10 netfn 0x00 cmd 0x09: IPMI MSG done
[   26.229083225,7] IPMI: Got error response 0xff
[   26.229087644,8] opal_ipmi_recv(cmd: 0x09 netfn: 0x01 resp_size: 0x32)
[   26.229113816,8] opal_ipmi_send(cmd: 0x09 netfn: 0x00 len: 0x03)
[   26.229120023,6] BT: seq 0x11 netfn 0x00 cmd 0x09: Message sent to host
[   26.264880532,6] BT: seq 0x11 netfn 0x00 cmd 0x09: IPMI MSG done
[   26.264885463,8] opal_ipmi_recv(cmd: 0x09 netfn: 0x01 resp_size: 0x07)
[   26.264906449,8] opal_ipmi_send(cmd: 0x09 netfn: 0x00 len: 0x03)
[   26.264912452,6] BT: seq 0x12 netfn 0x00 cmd 0x09: Message sent to host
[   26.280470362,6] BT: seq 0x12 netfn 0x00 cmd 0x09: IPMI MSG done
[   26.280473269,7] IPMI: Got error response 0xff
[   26.280477726,8] opal_ipmi_recv(cmd: 0x09 netfn: 0x01 resp_size: 0x32)
[   26.280509268,8] opal_ipmi_send(cmd: 0x09 netfn: 0x00 len: 0x03)
[   26.280520001,6] BT: seq 0x13 netfn 0x00 cmd 0x09: Message sent to host
[   26.314921079,6] BT: seq 0x13 netfn 0x00 cmd 0x09: IPMI MSG done
[   26.314927778,8] opal_ipmi_recv(cmd: 0x09 netfn: 0x01 resp_size: 0x07)
[   26.314947663,8] opal_ipmi_send(cmd: 0x09 netfn: 0x00 len: 0x03)
[   26.314954317,6] BT: seq 0x14 netfn 0x00 cmd 0x09: Message sent to host
[   26.365830714,6] BT: seq 0x14 netfn 0x00 cmd 0x09: IPMI MSG done
[   26.365833472,7] IPMI: Got error response 0xff
[   26.365837680,8] opal_ipmi_recv(cmd: 0x09 netfn: 0x01 resp_size: 0x32)
[   26.365868675,8] opal_ipmi_send(cmd: 0x09 netfn: 0x00 len: 0x03)
[   26.365875445,6] BT: seq 0x15 netfn 0x00 cmd 0x09: Message sent to host
[   26.412988149,6] BT: seq 0x15 netfn 0x00 cmd 0x09: IPMI MSG done
[   26.412994285,8] opal_ipmi_recv(cmd: 0x09 netfn: 0x01 resp_size: 0x07)
[   26.413014097,8] opal_ipmi_send(cmd: 0x09 netfn: 0x00 len: 0x03)
[   26.413020677,6] BT: seq 0x16 netfn 0x00 cmd 0x09: Message sent to host
[   26.428306489,6] BT: seq 0x16 netfn 0x00 cmd 0x09: IPMI MSG done
[   26.428309130,7] IPMI: Got error response 0xff
[   26.428314694,8] opal_ipmi_recv(cmd: 0x09 netfn: 0x01 resp_size: 0x32)
[   26.428410195,8] opal_ipmi_send(cmd: 0x02 netfn: 0x0c len: 0x04)
[   26.428416569,6] BT: seq 0x17 netfn 0x0c cmd 0x02: Message sent to host
[   26.452242357,6] BT: seq 0x17 netfn 0x0c cmd 0x02: IPMI MSG done
[   26.452246868,8] opal_ipmi_recv(cmd: 0x02 netfn: 0x0d resp_size: 0x07)
[   26.452262654,8] opal_ipmi_send(cmd: 0x01 netfn: 0x06 len: 0x00)
[   26.452267665,6] BT: seq 0x18 netfn 0x06 cmd 0x01: Message sent to host
[   26.464506383,6] BT: seq 0x18 netfn 0x06 cmd 0x01: IPMI MSG done
[   26.464510793,8] opal_ipmi_recv(cmd: 0x01 netfn: 0x07 resp_size: 0x0f)
[   26.464532905,8] opal_ipmi_send(cmd: 0x1a netfn: 0x3a len: 0x00)
[   26.464537754,6] BT: seq 0x19 netfn 0x3a cmd 0x1a: Message sent to host
[   26.471906334,6] BT: seq 0x19 netfn 0x3a cmd 0x1a: IPMI MSG done
[   26.471908905,7] IPMI: Got error response 0xc1
[   26.471912556,8] opal_ipmi_recv(cmd: 0x1a netfn: 0x3b resp_size: 0x00)

From this trace it looks like every second response gets the 0xff error, so it actually looks like both the boot override and network override requests are being repeated. There doens't appear to be a obvious retry path in Skiboot, and Petitboot is definitely not the one repeating this requests (as posted above) - perhaps something in the kernel?

sammj commented 6 years ago

I think we now have this figured out. Petitboot as in pb-discover definitely only makes the call once - however there's a little helper utility named pb-config that collects the running configuration and part of doing so is requesting the boot and network overrides. So we have:

All together that's four different invocations, which lines up with the above trace. This is somewhat "working as intended" but we could look into a way of caching the first result, since it will only change if Petitboot itself clears the overrides.