labgrid-project / labgrid

Embedded systems control library for development, testing and installation
https://labgrid.readthedocs.io/

How to check the components of a lab? #1070

Open sjg20 opened 1 year ago

sjg20 commented 1 year ago

I find that sometimes my lab rots, e.g. a piece has fallen out, or something has failed. I have a 'check' tool which goes through and makes sure that everything is working. It works mostly in parallel so is fairly fast. At the end it tells me what is wrong.

Is that possible in labgrid?
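To make the request concrete, here is a rough sketch of the kind of parallel runner described above. It is not the actual tool and not labgrid code: `CheckResult`, `run_one`, `run_checks` and `summarise` are hypothetical names, and each check is assumed to be a callable that raises `ValueError` on failure.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    """Outcome of one check (hypothetical type, not part of labgrid)."""
    name: str
    good: bool
    msg: str


def run_one(name: str, check: Callable[[], None]) -> CheckResult:
    """Run a single check; a ValueError marks the component as bad."""
    try:
        check()
        return CheckResult(name, True, 'all OK')
    except ValueError as exc:
        return CheckResult(name, False, str(exc))


def run_checks(checks: Dict[str, Callable[[], None]],
               workers: int = 8) -> List[CheckResult]:
    """Run all checks in a thread pool and return results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_one, name, fn)
                   for name, fn in checks.items()]
        return [fut.result() for fut in futures]


def summarise(results: List[CheckResult]) -> str:
    """Build a 'Good N, bad M' header followed by one line per failure."""
    bad = [res for res in results if not res.good]
    lines = ['Good %d, bad %d' % (len(results) - len(bad), len(bad))]
    lines += ['   %s: %s' % (res.name, res.msg) for res in bad]
    return '\n'.join(lines)
```

Threads are probably sufficient here because the checks mostly wait on hardware and subprocesses rather than use the CPU.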

For example, with an SDwire, it switches the mux and sees if the device can be mounted / seen:

    def check(self):
        """Run a check on the SDwire to see that it seems to work OK

        Returns:
            work.CheckResult: Result obtained from the check
        """
        try:
            self.select_dut()
            if self.get_status() != self.DUT:
                self.raise_self('Failed to switch to DUT')
            time.sleep(1)
            self.select_ts()
            if self.get_status() != self.TS:
                self.raise_self('Failed to switch to TS')

            symlink_path = '/dev/%s' % self._symlink
            result = self.lab.run_command('head', '-0', symlink_path)
            if result.return_code:
                self.raise_self("Failed to locate '%s'" % symlink_path)

            if self._block_symlink:
                self.check_for_block_symlink()

            if self._mount_uuid:
                self.check_for_mount()

            msg = 'all OK'
            good = True
        except ValueError as exc:
            msg = str(exc)
            good = False
        return work.CheckResult(self, good, msg)

The output ends up being something like this and I can check each hub for the problem ports:

Good 104, bad 19, not tested 0 
   usbport6: head: cannot open '/dev/ttyusb_port6' for reading: No such file or directory
   usbport3: head: cannot open '/dev/ttyusb_port3' for reading: No such file or directory
   portserver1: Cannot connect: b''
   sunxi-usb4: lab kea: dut pine64: No power control
   usbport17: head: cannot open '/dev/ttyusb_port17' for reading: No such file or directory
   rockchipusb0: head: cannot open '/dev/usbdev-kevin' for reading: No such file or directory
   samsungusb0: head: cannot open '/dev/usbdev-snow' for reading: No such file or directory
   tegra-usb0: head: cannot open '/dev/usbdev-jetson-tk1' for reading: No such file or directory
   imxusb0: head: cannot open '/dev/usbdev-snappermx6' for reading: No such file or directory
   intelusb0: head: cannot open '/dev/usbdev-edison' for reading: No such file or directory

Fix list for hub 'hubc'
    2: usbport6
    6: usbport3

Fix list for hub 'hubf'
    5: sunxi-usb4

Fix list for hub 'hube'
    2: svcoral
    9: tegra-usb1
   14: svbob

Fix list for hub 'huba'
    5: svsamus
    6: svnyan-big
    7: svlink
    8: svjerry
   13: tegra-usb0
   16: usbport17

Fix list for hub 'hubd'
    1: svsnow
    3: imxusb0
    4: samsungusb0
    7: svkevin
    8: rockchipusb0

jluebbe commented 1 year ago

For USB devices, part of this is represented in the Resource's 'avail' flag: it becomes true when the device appears on the bus (signaled via udev events) and becomes false again when it disappears. You can see the current state in the output of labgrid-client -vv resources and the live changes using labgrid-client monitor:

Place <redacted> changed:
  acquired: None -> dude03/jlu
  acquired_resources: [] -> [['rlabB-srv', 'b-3', 'NetworkPowerPort', 'NetworkPowerPort'], ['rlabB-srv', 'b-usb-1-p4', 'NetworkSerialPort', 'USBSerialPort'], ['rlabB-srv', 'b-usb-1-p5', 'NetworkAndroidFastboot', 'AndroidFastboot'], ['rlabB-srv', 'b-usb-1-p5', 'NetworkIMXUSBLoader', 'IMXUSBLoader'], ['rlabB-srv', 'io-5.11-out0', 'NetworkLXAIOBusPIO', 'LXAIOBusPIO']]
  changed: 1673666125.8015566 -> 1674034634.3246694
Resource rlabB-srv/b-usb-1-p5/IMXUSBLoader changed:
  avail: False -> True
  params.busnum: None -> 1
  params.devnum: None -> 49
  params.model_id: None -> 125
  params.path: None -> 1-1.3
  params.vendor_id: None -> 5538
Resource rlabB-srv/b-usb-1-p5/IMXUSBLoader changed:
  avail: True -> False
  params.busnum: 1 -> None
  params.devnum: 49 -> None
  params.model_id: 125 -> None
  params.path: 1-1.3 -> None
  params.vendor_id: 5538 -> None

Here, I've acquired a place, then enabled and disabled power. The IMXUSBLoader represents the i.MX6's USB ROM loader protocol.
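As a rough illustration of how the 'avail' information could be consumed, the sketch below tracks the last seen state per resource from text in the format shown above. It assumes the output format is exactly as in this excerpt; `track_avail` and the two regular expressions are hypothetical helpers, not part of labgrid.

```python
import re
from typing import Dict, Iterable, Optional

# Matches header lines such as
# "Resource rlabB-srv/b-usb-1-p5/IMXUSBLoader changed:"
RESOURCE_RE = re.compile(r'^Resource (\S+) changed:$')
# Matches indented lines such as "  avail: False -> True"
AVAIL_RE = re.compile(r'^\s+avail: \S+ -> (True|False)$')


def track_avail(lines: Iterable[str]) -> Dict[str, bool]:
    """Return the last seen 'avail' value for each resource in the stream."""
    state: Dict[str, bool] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.rstrip('\n')
        header = RESOURCE_RE.match(line)
        if header:
            current = header.group(1)
            continue
        if not line.startswith(' '):
            # Any other unindented line (e.g. "Place ... changed:")
            # ends the current resource block.
            current = None
            continue
        avail = AVAIL_RE.match(line)
        if avail and current:
            state[current] = avail.group(1) == 'True'
    return state
```

The lines could come from the standard output of labgrid-client monitor. Parsing text like this is fragile, so this is only meant to show that the data is available.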

We don't have a built-in command to check whether resources from a given list are currently unavailable, but the information needed to do this is there. A complication is that labgrid currently doesn't know which resources should always be there (e.g. USB serial adapters, USB cameras, ...) and which show up only depending on the DUT's state (Android fastboot, Linux USB serial/mass-storage/ethernet gadgets, ...). Furthermore, some labs (like ours) are relatively dynamic: we often add devices (like serial adapters) which are matched based on their USB location and then combine them into places as needed. Others prefer to configure the resources relatively statically in the exporter.

The natural location to configure this difference is probably in the exporter configuration. You'd mark some resources as 'fixed' or 'always expected' and then you could have labgrid-client report a list of missing resources. This wouldn't work well in our case though, as many resources configured in the exporter are only connected as needed.

For our dynamic case, we'd probably need information on the place level, regarding which of the resources should always be available.
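A minimal sketch of such a report, assuming the availability of each resource is already known as a plain mapping. None of this is existing labgrid API: `ALWAYS_EXPECTED`, the place names and both functions are hypothetical, and the resource names are only illustrative.

```python
from typing import Dict, Iterable, List


def missing_resources(expected: Iterable[str],
                      avail: Dict[str, bool]) -> List[str]:
    """Return expected resources that are unavailable or unknown, sorted."""
    return sorted(name for name in set(expected)
                  if not avail.get(name, False))


# Hypothetical per-place lists of resources that should always be present
ALWAYS_EXPECTED = {
    'place-a': ['rlabB-srv/b-usb-1-p4/USBSerialPort'],
    'place-b': ['rlabB-srv/b-3/NetworkPowerPort'],
}


def report(avail: Dict[str, bool]) -> Dict[str, List[str]]:
    """Map each place to its missing resources, omitting healthy places."""
    result = {}
    for place, expected in ALWAYS_EXPECTED.items():
        missing = missing_resources(expected, avail)
        if missing:
            result[place] = missing
    return result
```

A resource that has never been seen counts as missing here, which matches the "hub is dead" case where nothing behind it ever appears.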

Building on top of this, we could add a "self-check" procedure in the relevant Drivers to see if the device basically works.

Would that fit your idea?

sjg20 commented 5 months ago

Sorry for the long delay...I am hoping to spend this week on labgrid.

Yes I think your idea would work:

One remaining question is how to check non-trivial things, e.g. that an SDwire has a valid SD card in it and it can be accessed. With my homegrown tool, the UUID or block-device symlink is in the YAML, so it can check that these appear and disappear (by mounting/unmounting the UUID, or running dd from the symlink).
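The appear/disappear part could be sketched roughly as below. This is not the homegrown tool's code: `wait_for_path` and `uuid_path` are hypothetical helpers, and the sketch assumes a Linux host where udev creates symlinks under /dev/disk/by-uuid.

```python
import os
import time


def wait_for_path(path: str, present: bool, timeout: float = 5.0,
                  interval: float = 0.1) -> bool:
    """Poll until 'path' exists (present=True) or is gone (present=False).

    Returns True if the expected state was reached before the timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path) == present:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def uuid_path(uuid: str) -> str:
    """Path where udev normally creates a symlink for a filesystem UUID."""
    return os.path.join('/dev/disk/by-uuid', uuid)
```

After switching the mux to the test host, `wait_for_path(uuid_path(uuid), present=True)` should succeed; after switching to the DUT, the same call with `present=False` should succeed.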

sjg20 commented 4 months ago

Now that I have a better understanding of things, I believe that labgrid could provide two levels of checks:

  1. Make sure that expected devices are actually present (e.g. a hub is not turned off / dead)
  2. Make sure that devices work (e.g. flip the SDwire mux and see that the media becomes visible or invisible to the host as expected)
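The two levels could be layered roughly as follows. This is only a sketch under assumptions: `Checkable`, `is_present` and `self_check` are a hypothetical interface, not something labgrid defines today.

```python
from typing import List, Optional, Protocol


class Checkable(Protocol):
    """Hypothetical interface; labgrid does not define this."""
    name: str

    def is_present(self) -> bool: ...
    def self_check(self) -> Optional[str]: ...


def check_device(dev: Checkable) -> Optional[str]:
    """Return None if the device passes both levels, else a failure message."""
    # Level 1: presence. Skip the functional test if the device is missing.
    if not dev.is_present():
        return '%s: not present' % dev.name
    # Level 2: function. self_check() returns an error string or None.
    error = dev.self_check()
    if error:
        return '%s: %s' % (dev.name, error)
    return None


def check_all(devices: List[Checkable]) -> List[str]:
    """Collect failure messages for all devices that fail either level."""
    return [msg for msg in map(check_device, devices) if msg is not None]
```

Running level 2 only when level 1 passes keeps the report short: a dead hub shows up as missing devices rather than as a pile of functional failures.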