grml / grml-hwinfo

grml-hwinfo: Include some more error outputs #12

Open jkirk opened 5 months ago

jkirk commented 5 months ago

Some more error output is missing:

  root@grml ~ # DISPLAY=:0.0 grml-hwinfo
  grml-hwinfo 0.17.1 - collect hardware information
  Output file:      /root/grml-hwinfo-2024-06-08--18-21-24-UTC.tar.bz2

  This might take a few seconds/minutes. Please be patient...
  pcilib: sysfs_read_vpd: read failed: No such device
  Starting sysdump...
    NOTE: if it seems to be hanging at this stage file a bug report with output of:
          lsof -p $(pgrep -f $(which sysdump))
  Execution of sysdump finished.
  Error: /dev/sda: unrecognised disk label
  MODE SENSE(10): Malformed SCSI command

  root@grml ~ # lspci -vvnn > /dev/null
  pcilib: sysfs_read_vpd: read failed: No such device
  root@grml ~ # parted -s /dev/sda print > /dev/null
  Error: /dev/sda: unrecognised disk label
  1 root@grml ~ # sdparm --all --long /dev/sdb > /dev/null
  MODE SENSE(10): Malformed SCSI command
  97 root@grml ~ #   

I was using an older Grml daily. So, the sdparm problem was fixed in #10, but we should include the error outputs of lspci and parted and most probably some other tools.

I also think that putting the error output in a separate file is problematic, as one cannot see where the error actually occurs. On the other hand, look at this: even with stderr redirected into stdout, the message pcilib: sysfs_read_vpd: read failed: No such device ends up "somewhere else":

root@grml ~ # lspci -vvnn 2>&1 | grep -C 10 pcilib 
    Region 2: Memory at d0004000 (64-bit, prefetchable) [size=4K]
    Region 4: Memory at d0000000 (64-bit, prefetchable) [size=16K]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
        Address: 0000000000000000  Data: 0000
    Capabilities: [70] Express (v2) Endpoint, MSI 01
        DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
            ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10W
        DevCtl: CorrErr+ NonFatalErr+ Fatapcilib: sysfs_read_vpd: read failed: No such device
lErr+ UnsupReq+
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 128 bytes, MaxReadReq 4096 bytes
        DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr+ TransPend-
        LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
            ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
        LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 2.5GT/s, Width x1
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

root@grml ~ # lspci -vvnn 
[...]
    Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
        Vector table: BAR=4 offset=00000000
        PBA: BAR=4 offset=00000800
    Capabilities: [d0] Vital Product Data
pcilib: sysfs_read_vpd: read failed: No such device
        Not readable
    Capabilities: [100 v1] Advanced Error Reporting
        UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
        CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
    Capabilities: [140 v1] Virtual Channel
        Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb:    Fixed- WRR32- WRR64- WRR128-
        Ctrl:   ArbSelect=Fixed
        Status: InProgress-
        VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
            Status: NegoPending- InProgress-
    Capabilities: [160 v1] Device Serial Number 62-04-00-00-68-4c-e0-00
    Kernel driver in use: r8169
    Kernel modules: r8169

FTR, pcilib: sysfs_read_vpd: read failed: No such device is not an "actual" error; there is just no VPD EEPROM present: https://bugzilla.kernel.org/show_bug.cgi?id=199467

(But this is a bug and this isn't handled well in pcilib.)

mika commented 3 months ago

So I also stumbled upon the parted issue on my own and took care of this, see commit 82935916cffaba6c9867f5c1e223adb8b3a238e7

The sdparm issue was already taken care of in 5f911361 AKA https://github.com/grml/grml-hwinfo/pull/11

The lspci issue is interesting, though I don't agree with https://bugzilla.kernel.org/show_bug.cgi?id=199467#c6, quoting:

> It's neither a bug nor an actual error. The message simply means that the optional VPD EEPROM isn't present. The ticket should be closed.

Either it's an error and you report it to stderr, or it isn't and you don't? ;)

Instead I fully agree with:

> (But this is a bug and this isn't handled well in pcilib.)

So for the time being let's also report lspci's stderr to a separate file, as we tend to do, done in commit bbfd3b1ebb7b8c2f2188a6482f54e1481987969b

But I also agree with @jkirk's point:

> I also think that putting the error output in a separate file is problematic, as one cannot see where the error actually occurs.

This needs a further redesign of how grml-hwinfo works, though, so maybe let's discuss it before closing this issue?

jkirk commented 3 months ago

Quick idea: What about a third(?) "full" log file for every output where we put 'stdout' and 'stderr' in one file?

mika commented 3 months ago

> Quick idea: What about a third(?) "full" log file for every output where we put 'stdout' and 'stderr' in one file?

Sorry, I don't understand your idea or what exactly that should look like :thinking:

jkirk commented 1 month ago

I meant something like this:

lspci -vvnn &>./lspci_verbose.full 1>./lspci_verbose 2>./lspci_verbose.error
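Note that the one-liner above only illustrates the intent. The shell applies redirections left to right, so the later 1> and 2> replace the targets set by &>, and the .full file would stay empty. A minimal sketch of one way to actually get all three files is shown below. The function name capture_three is hypothetical and not part of grml-hwinfo; the file naming follows the example above.

```shell
#!/bin/sh
# Hypothetical helper (NOT part of grml-hwinfo): run a command and write
#   <base>        stdout only
#   <base>.error  stderr only
#   <base>.full   stdout and stderr together
capture_three() {
    base=$1
    shift
    : > "${base}.full"    # start with an empty combined file

    # fd 3 carries the command's stdout around the inner pipe, so each
    # stream gets its own tee. Both tees append to the same .full file.
    # The exit status of the command itself is not preserved here.
    { "$@" 2>&1 >&3 | tee "${base}.error" >> "${base}.full"; } 3>&1 \
        | tee "${base}" >> "${base}.full"
}
```

Usage would look like: capture_three ./lspci_verbose lspci -vvnn. Because the two streams travel through separate pipes, the relative order of stdout and stderr lines in the .full file is not guaranteed to match what the tool did internally, so this does not fully solve the "where does the error occur" problem.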