lnls-dig / openMMC

Open source firmware for MMC controllers
GNU General Public License v3.0
39 stars 32 forks source link

Crash when uploading the firmware in slot 9 #194

Closed gustavosr8 closed 8 months ago

gustavosr8 commented 10 months ago

When the firmware is uploaded to an AFC in slot 9 with ipmitool, this error occurs:


Validating firmware image integrity...OK
Performing preparation stage...
Services may be affected during upgrade. Do you wish to continue? (y/n): y
OK

Performing upgrade stage:

-------------------------------------------------------------------------------
|ID  | Name        |                     Versions                        | %  |
|    |             |      Active     |      Backup     |      File       |    |
|----|-------------|-----------------|-----------------|-----------------|----|
|   1|             |   1.00 00000000 | ---.-- -------- |   0.00 00000000 |  0%|Error uploading firmware block
compcode=0x81: Unknown (0x81)

 Error in Upload FIRMWARE command [rc=-1]

 TotalSent:0x17 
-------------------------------------------------------------------------------
(*) Component requires Payload Cold Reset
Firmware upgrade procedure failed 

In the MMC serial interface, we can see that a HardFault Occurs, and with an register dump we have the following informations:

R0: 0x00000002
R1: 0x2007E5A0
R2: 0x10000100
R3: 0x1000046C
R12: 0x00000019
LR: 0x00006F57
PC: 0x00006F56
PSR: 0x61000000

This indicates that the error is related to FreeRTOS queue module queue.c.

gustavosr8 commented 9 months ago

By catching a backtrace on GDB just when the HardFault occurs, we can see the following execution path:

#0  HardFault_Handler ()
    at /home/ABTLUS/gustavo.reis/repos/openMMC/port/ucontroller/nxp/lpc17xx/faults.c:131 
#1  <signal handler called>
#2  xQueueReceive (xQueue=0x8f59504b, pvBuffer=pvBuffer@entry=0x2007e178 <ucHeap+8568>, 
    xTicksToWait=<optimized out>, xTicksToWait@entry=4294967295)
    at /home/ABTLUS/gustavo.reis/repos/openMMC/FreeRTOS/queue.c:1268
#3  0x00003b82 in IPMITask (pvParameters=<optimized out>)
    at /home/ABTLUS/gustavo.reis/repos/openMMC/modules/ipmi.c:67
#4  0x00007014 in ?? ()
gustavosr8 commented 9 months ago

In DumpFiles.zip you can find an core-dump of the execution context when the HardFault occurs and an .elf file of the program binary.

gustavosr8 commented 9 months ago
Dump of assembler code for function IPMITask:
   0x00003d64 <+0>:     stmdb   sp!, {r4, r5, r6, r7, r8, lr}
   0x00003d68 <+4>:     sub     sp, #88 @ 0x58
   0x00003d6a <+6>:     mov.w   r8, #255        @ 0xff
   0x00003d6e <+10>:    ldr     r4, [pc, #144]  @ (0x3e00 <IPMITask+156>)
   0x00003d70 <+12>:    ldr     r7, [pc, #144]  @ (0x3e04 <IPMITask+160>)
   0x00003d72 <+14>:    ldr     r6, [pc, #148]  @ (0x3e08 <IPMITask+164>)
   0x00003d74 <+16>:    ldr     r5, [pc, #148]  @ (0x3e0c <IPMITask+168>)
   0x00003d76 <+18>:    mov.w   r2, #4294967295 @ 0xffffffff
   0x00003d7a <+22>:    mov     r1, sp
   0x00003d7c <+24>:    ldr     r0, [r4, #0]
   0x00003d7e <+26>:    bl      0x7844 <xQueueReceive>
=> 0x00003d82 <+30>:    cbnz    r0, 0x3d9c <IPMITask+56>
   0x00003d84 <+32>:    movs    r1, #69 @ 0x45
   0x00003d86 <+34>:    mov     r0, r5
   0x00003d88 <+36>:    bl      0x71f0 <vAssertCalled>
   0x00003d8c <+40>:    mov.w   r2, #4294967295 @ 0xffffffff
   0x00003d90 <+44>:    mov     r1, sp
   0x00003d92 <+46>:    ldr     r0, [r4, #0]
   0x00003d94 <+48>:    bl      0x7844 <xQueueReceive>
   0x00003d98 <+52>:    cmp     r0, #0
   0x00003d9a <+54>:    beq.n   0x3d84 <IPMITask+32>
   0x00003d9c <+56>:    ldr     r3, [r7, #0]
   0x00003d9e <+58>:    ldr     r1, [r6, #0]
   0x00003da0 <+60>:    ldrb.w  r0, [sp, #1]
   0x00003da4 <+64>:    cmp     r3, r1
   0x00003da6 <+66>:    ldrb.w  r12, [sp, #7]
   0x00003daa <+70>:    bcc.n   0x3db4 <IPMITask+80>
   0x00003dac <+72>:    b.n     0x3de4 <IPMITask+128>
   0x00003dae <+74>:    adds    r3, #8
   0x00003db0 <+76>:    cmp     r3, r1
   0x00003db2 <+78>:    bcs.n   0x3de4 <IPMITask+128>
   0x00003db4 <+80>:    ldrb    r2, [r3, #0]
   0x00003db6 <+82>:    cmp     r2, r0
   0x00003db8 <+84>:    bne.n   0x3dae <IPMITask+74>
   0x00003dba <+86>:    ldrb    r2, [r3, #1]
   0x00003dbc <+88>:    cmp     r2, r12
   0x00003dbe <+90>:    bne.n   0x3dae <IPMITask+74>
   0x00003dc0 <+92>:    ldr     r3, [r3, #4]
   0x00003dc2 <+94>:    cbz     r3, 0x3de4 <IPMITask+128>
   0x00003dc4 <+96>:    mov     r0, sp
   0x00003dc6 <+98>:    add     r1, sp, #44     @ 0x2c
   0x00003dc8 <+100>:   strh.w  r8, [sp, #52]   @ 0x34
   0x00003dcc <+104>:   blx     r3
   0x00003dce <+106>:   mov     r0, sp
   0x00003dd0 <+108>:   add     r1, sp, #44     @ 0x2c
   0x00003dd2 <+110>:   bl      0x383c <ipmb_send_response>
   0x00003dd6 <+114>:   cmp     r0, #1
   0x00003dd8 <+116>:   beq.n   0x3d76 <IPMITask+18>
   0x00003dda <+118>:   movs    r1, #97 @ 0x61
   0x00003ddc <+120>:   mov     r0, r5
   0x00003dde <+122>:   bl      0x71f0 <vAssertCalled>
   0x00003de2 <+126>:   b.n     0x3d76 <IPMITask+18>
   0x00003de4 <+128>:   movs    r3, #193        @ 0xc1
   0x00003de6 <+130>:   mov     r0, sp
   0x00003de8 <+132>:   add     r1, sp, #44     @ 0x2c
   0x00003dea <+134>:   strh.w  r3, [sp, #52]   @ 0x34
   0x00003dee <+138>:   bl      0x383c <ipmb_send_response>
   0x00003df2 <+142>:   cmp     r0, #1
   0x00003df4 <+144>:   beq.n   0x3d76 <IPMITask+18>
   0x00003df6 <+146>:   movs    r1, #108        @ 0x6c
   0x00003df8 <+148>:   mov     r0, r5
   0x00003dfa <+150>:   bl      0x71f0 <vAssertCalled>
   0x00003dfe <+154>:   b.n     0x3d76 <IPMITask+18>
   0x00003e00 <+156>:   lsls    r4, r4, #11
   0x00003e02 <+158>:   asrs    r0, r0, #32
   0x00003e04 <+160>:   lsls    r4, r4, #2
   0x00003e06 <+162>:   asrs    r0, r0, #32
   0x00003e08 <+164>:   lsls    r0, r4, #2
   0x00003e0a <+166>:   asrs    r0, r0, #32
   0x00003e0c <+168>:   uxtb    r0, r7
   0x00003e0e <+170>:   movs    r0, r0
r0             0x1e764fe           31941886
r1             0x2007e888          537389192
r2             0xffffffff          -1
r3             0xe1dfa4fe          -505436930
r4             0x10002369          268444521
r5             0xb2f8              45816
r6             0x100000a0          268435616
r7             0x100000a4          268435620
r8             0xff                255
r9             0xa5a5a5a5          -1515870811
r10            0xa5a5a5a5          -1515870811
r11            0xa5a5a5a5          -1515870811
r12            0x19                25
sp             0x2007e888          0x2007e888 <ucHeap+10376>
lr             0x3d83              15747
pc             0x3d82              0x3d82 <IPMITask+30>
xpsr           0xa1000000          -1593835520
msp            0x10003fe0          0x10003fe0
psp            0x2007e888          0x2007e888 <ucHeap+10376>
primask        0x0                 0
basepri        0x0                 0
faultmask      0x0                 0
control        0x2                 2
(gdb) x 0x10002369
0x10002369:     0x01e764fe
(gdb) x 0x3e00
0x3e00 <IPMITask+156>:  0x100002e4
(gdb) x 0x100002e4
0x100002e4 <ipmi_rxqueue>:      0x2007e2f8
gustavosr8 commented 8 months ago

So, after some investigations, we found out that the problem was occurring because a pointer was pointing to an invalid memory space in IPMITask. Since this pointer was previously valid, it indicate some kind of memory corruption. In sequence, we use ITM functions to analyze the context and the registers values in multiple points of the IPMITask, what was important to discover that the corruption was happening after the requisition handler call.

https://github.com/lnls-dig/openMMC/blob/3b7ae353018cbc15eb9325241074fbee9fe98cb3/modules/ipmi.c#L87-L90

In this way, the investigation focus change to the handler: ipmi_picmg_upload_firmware_block, in this case (what makes sense, since the problem occurs in the upload firmware block command).

https://github.com/lnls-dig/openMMC/blob/3b7ae353018cbc15eb9325241074fbee9fe98cb3/modules/hpm.c#L307-L345

With some tests, we found out that the fault happens right at the begging of the upload. Since the problem was identified as a memory corruption, we suspect this memcpy call.

https://github.com/lnls-dig/openMMC/blob/3b7ae353018cbc15eb9325241074fbee9fe98cb3/modules/hpm.c#L321-L323

Investigating this deeply, we observe that the variable block_sz (that comes from req->data_len-2) was reaching the value 23 (which is absolutely normal), but block_data had only 20 bytes allocated (HPM_BLOCK_SIZE), so the memcpy call was corrupting the stack. After returning from req_handler, the restored value of R4 got corrupted resulting in an invalid memory access in the xQueueReceive function.

Increasing block_data size fixes the problem.

So, an final fix for this problem could be increase the block_data buffer size, and wrap the memcpy function in an if-else statement, that check if the data could fit in the buffer, and return an error if not possible.