oxidecomputer / humility

Debugger for Hubris
Mozilla Public License 2.0
529 stars 51 forks source link

Decoding of IBC blackbox timestamps continues to be weird #473

Open mkeeter opened 7 months ago

mkeeter commented 7 months ago

Looking at the log from https://github.com/oxidecomputer/mfg-quality/issues/140#issuecomment-2046150254, we're still seeing timestamp weirdness.

Here's a condensed version of that log:

FAULT EVENT
  EVENT_INDEX:        0
  TIMESTAMP           0x000002d9 = 1 min, 12.9 sec
  EVENT_ID            0x0000
  STATUS_WORD         0x0010
    0x0010: Output overcurrent fault
  STATUS_IOUT         0x80
    0x80: Iout Overcurrent Fault
FAULT EVENT
  EVENT_INDEX:        1
  TIMESTAMP           0x000002e3 = 1 min, 13.9 sec
  EVENT_ID            0x0001
  STATUS_WORD         0x0001
    0x0001: System event
  STATUS_MFR          0x01
    0x01: INPUT_LOW_EVENT
FAULT EVENT
  EVENT_INDEX:        2
  TIMESTAMP           0x27ffb2e3 = 776 day, 16 hr, 48 min, 9.9 sec
  EVENT_ID            0x0002
  STATUS_WORD         0x0001
    0x0001: System event
  STATUS_MFR          0x02
    0x02: CANCEL_EVENT
FAULT EVENT
  EVENT_INDEX:        3
  TIMESTAMP           0x4fff62e3 = 1553 day, 9 hr, 35 min, 5.9 sec
  EVENT_ID            0x0003
  STATUS_WORD         0x0001
    0x0001: System event
  STATUS_MFR          0x02
    0x02: CANCEL_EVENT

LIFECYCLE EVENT
  EVENT_INDEX:        24
  TIMESTAMP           0xffa9b177 = 4964 day, 11 hr, 31 min, 49.5 sec
  EVENT_ID            0x0194
  STATUS_WORD         0x0001
    0x0001: System event
  STATUS_MFR          0x05
    0x05: ERASE_OVFL_EVENT
LIFECYCLE EVENT
  EVENT_INDEX:        25
  TIMESTAMP           0xffa9b177 = 4964 day, 11 hr, 31 min, 49.5 sec
  EVENT_ID            0x0195
  STATUS_WORD         0x2000
    0x2000: input voltage, input current, or input power fault or warning
  STATUS_INPUT        0x08
    0x08: Insufficient Vin
LIFECYCLE EVENT
  EVENT_INDEX:        26
  TIMESTAMP           0xffa9b177 = 4964 day, 11 hr, 31 min, 49.5 sec
  EVENT_ID            0x0196
  STATUS_WORD         0x0001
    0x0001: System event
  STATUS_MFR          0x01
    0x01: INPUT_LOW_EVENT
LIFECYCLE EVENT
  EVENT_INDEX:        27
  TIMESTAMP           0xffa9b17a = +0.0 sec
  EVENT_ID            0x0197
  STATUS_WORD         0x0001
    0x0001: System event
  STATUS_MFR          0x00
    0x00: BOOT_EVENT
LIFECYCLE EVENT
  EVENT_INDEX:        28
  TIMESTAMP           0xffa9b17e = +0.4 sec
  EVENT_ID            0x0198
  STATUS_WORD         0x0040
    0x0040: Unit is not providing power
  STATUS_MFR          0x05
    0x05: Unit is on due to primary remote control
LIFECYCLE EVENT
  EVENT_INDEX:        29
  TIMESTAMP           0xffa9b17e = +0.4 sec
  EVENT_ID            0x0199
  STATUS_WORD         0x0800
    0x0800: Power-Good signal
  STATUS_MFR          0x01
    0x01: Power is good
LIFECYCLE EVENT
  EVENT_INDEX:        30
  TIMESTAMP           0xffadaffe = +7 hr, 16 min, 16.4 sec
  EVENT_ID            0x019a
  STATUS_WORD         0x2000
    0x2000: input voltage, input current, or input power fault or warning
  STATUS_INPUT        0x08
    0x08: Insufficient Vin
LIFECYCLE EVENT
  EVENT_INDEX:        31
  TIMESTAMP           0xffadb000 = +7 hr, 16 min, 16.6 sec
  EVENT_ID            0x019b
  STATUS_WORD         0x0040
    0x0040: Unit is not providing power
  STATUS_MFR          0x04
    0x04: Unit is on due to internal configuration
LIFECYCLE EVENT
  EVENT_INDEX:        32
  TIMESTAMP           0xffe8eea2 = +0.0 sec
  EVENT_ID            0x019c
  STATUS_WORD         0x0001
    0x0001: System event
  STATUS_MFR          0x06
    0x06: TIME_ERASE_EVENT
LIFECYCLE EVENT
  EVENT_INDEX:        33
  TIMESTAMP           0xffe52922 = 4968 day, 23 hr, 47 min, 15.4 sec
  EVENT_ID            0x019d
  STATUS_WORD         0x0001
    0x0001: System event
  STATUS_MFR          0x05
    0x05: ERASE_OVFL_EVENT
LIFECYCLE EVENT
  EVENT_INDEX:        34
  TIMESTAMP           0xffe52927 = 4968 day, 23 hr, 47 min, 15.9 sec
  EVENT_ID            0x019e
  STATUS_WORD         0x0040
    0x0040: Unit is not providing power
  STATUS_MFR          0x05
    0x05: Unit is on due to primary remote control
LIFECYCLE EVENT
  EVENT_INDEX:        35
  TIMESTAMP           0xffe52927 = 4968 day, 23 hr, 47 min, 15.9 sec
  EVENT_ID            0x019f
  STATUS_WORD         0x0800
    0x0800: Power-Good signal
  STATUS_MFR          0x01
    0x01: Power is good
LIFECYCLE EVENT
  EVENT_INDEX:        36
  TIMESTAMP           0xffe9102b = +0.0 sec
  EVENT_ID            0x01a0
  STATUS_WORD         0x0001
    0x0001: System event
  STATUS_MFR          0x00
    0x00: BOOT_EVENT
LIFECYCLE EVENT
  EVENT_INDEX:        37
  TIMESTAMP           0xffe9102e = +0.3 sec
  EVENT_ID            0x01a1
  STATUS_WORD         0x0040
    0x0040: Unit is not providing power
  STATUS_MFR          0x05
    0x05: Unit is on due to primary remote control
LIFECYCLE EVENT
  EVENT_INDEX:        38
  TIMESTAMP           0xffe9102e = +0.3 sec
  EVENT_ID            0x01a2
  STATUS_WORD         0x0800
    0x0800: Power-Good signal
  STATUS_MFR          0x01
    0x01: Power is good
LIFECYCLE EVENT
  EVENT_INDEX:        39
  TIMESTAMP           0xffe91046 = +2.7 sec
  EVENT_ID            0x01a3
  STATUS_WORD         0x0002
    0x0002: CML fault
  STATUS_CML          0x02
    0x02: Other communication fault
LIFECYCLE EVENT
  EVENT_INDEX:        40
  TIMESTAMP           0x00103ffe = 1 day, 5 hr, 34 min, 55.8 sec
  EVENT_ID            0x01a4
  STATUS_WORD         0x2000
    0x2000: input voltage, input current, or input power fault or warning
  STATUS_INPUT        0x08
    0x08: Insufficient Vin
LIFECYCLE EVENT
  EVENT_INDEX:        41
  TIMESTAMP           0x00103ffe = 1 day, 5 hr, 34 min, 55.8 sec
  EVENT_ID            0x01a5
  STATUS_WORD         0x0001
    0x0001: System event
  STATUS_MFR          0x01
    0x01: INPUT_LOW_EVENT
LIFECYCLE EVENT
  EVENT_INDEX:        42
  TIMESTAMP           0x00104000 = +0.0 sec
  EVENT_ID            0x01a6
  STATUS_WORD         0x0001
    0x0001: System event
  STATUS_MFR          0x00
    0x00: BOOT_EVENT
LIFECYCLE EVENT
  EVENT_INDEX:        43
  TIMESTAMP           0x00104003 = +0.3 sec
  EVENT_ID            0x01a7
  STATUS_WORD         0x0040
    0x0040: Unit is not providing power
  STATUS_MFR          0x05
    0x05: Unit is on due to primary remote control
LIFECYCLE EVENT
  EVENT_INDEX:        44
  TIMESTAMP           0x001a1b84 = +17 hr, 56 min, 42.0 sec
  EVENT_ID            0x01a8
  STATUS_WORD         0x0001
    0x0001: System event
  STATUS_MFR          0x05
    0x05: ERASE_OVFL_EVENT
LIFECYCLE EVENT
  EVENT_INDEX:        45
  TIMESTAMP           0x001a1b84 = +17 hr, 56 min, 42.0 sec
  EVENT_ID            0x01a9
  STATUS_WORD         0x2000
    0x2000: input voltage, input current, or input power fault or warning
  STATUS_INPUT        0x08
    0x08: Insufficient Vin
LIFECYCLE EVENT
  EVENT_INDEX:        46
  TIMESTAMP           0x001a1b84 = +17 hr, 56 min, 42.0 sec
  EVENT_ID            0x01aa
  STATUS_WORD         0x0040
    0x0040: Unit is not providing power
  STATUS_MFR          0x04
    0x04: Unit is on due to internal configuration

Remember, there are two separate ring buffers for fault (0-23) and life-cycle events (24-48).

There are many different weird things happening here: