lowRISC / opentitan

OpenTitan: Open source silicon root of trust
https://www.opentitan.org
Apache License 2.0

[opentitanlib] CW310 not detected in rootless podman container #17569

Open andreaskurth opened 1 year ago

andreaskurth commented 1 year ago

When running an OpenTitan container with rootless podman, opentitanlib does not detect a CW310 even if its devices are correctly mapped into the container (with --device=/dev/ttyXXX and --device=/dev/bus/usb/YYY) and permissions both inside and outside the container allow read and write access to all of those devices from inside the container. The error messages I have seen are "Command result: Transport does not support the requested operation" and "Error: Transport does not support Gpio".
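For reference, my container invocation looks roughly like this (a sketch; the image name and device paths are placeholders for whatever your host assigns):

podman run --rm -it \
  --device=/dev/ttyACM1 --device=/dev/ttyACM2 \
  --device=/dev/bus/usb/001/004 \
  opentitan-dev:latest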

A workaround is to create an rcfile for opentitantool (default location: ~/.config/opentitantool/config) and add the following settings to it:

--cw310-uarts=/dev/ttyACM2,/dev/ttyACM1
--interface=cw310

The TTY filenames depend on the setup of the host and the container. In my experience, the CW310 comes with two TTYs, and they have to be listed in descending numeric order. (If a Husky is additionally attached, its TTY has a number just above those of the CW310.) This may depend on your udev mechanisms/rules, though, so some experimentation may be required to get it right.

With this, you should now be able to program the FPGA and load and execute a binary. For example,

bazel build //sw/device/examples/hello_world:hello_world_fpga_cw310_bin
bazel run //sw/host/opentitantool -- bootstrap $(ci/scripts/target-location.sh //sw/device/examples/hello_world:hello_world_fpga_cw310_bin)

should not present any errors.

bazel test execution on the FPGA won't work out of the box, though; see https://github.com/lowRISC/opentitan/issues/17567 for further information and a workaround.

I haven't debugged this further and don't expect anyone else to do so at this point; I just wanted to share this for visibility and as a possible workaround.

a-will commented 1 year ago

Auto-detection is a challenging problem in root-less containers because udev lookup is typically broken. Without it, we can't associate the kernel serial devices with the USB device (e.g. to know which serial devices belong to the NewAE USB device we identified for control).

In CI, we currently work around the container issues by expecting very particular names for the UARTs:

https://github.com/lowRISC/opentitan/blob/8742883828c7947bc245f417bd3b11c474b442df/rules/opentitan_test.bzl#L552

This is activated with the following argument to the invocation (which could go in a .bazelrc):

https://github.com/lowRISC/opentitan/blob/8742883828c7947bc245f417bd3b11c474b442df/ci/scripts/run-fpga-cw310-tests.sh#L58
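As a hedged sketch (the define name and device paths are my recollection of the linked files, not verified; adjust them to whatever names your udev setup creates), the .bazelrc entry amounts to roughly:

test --define cw310_uarts=/dev/ttyACM_CW310_1,/dev/ttyACM_CW310_0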

Just an FYI here. There likely are ways we could improve how this works, but currently, matching the expected device names and selecting the bazel config is how we do it.

jwnrt commented 1 year ago

> Auto-detection is a challenging problem in root-less containers because udev lookup is typically broken. Without it, we can't associate the kernel serial devices with the USB device (e.g. to know which serial devices belong to the NewAE USB device we identified for control).

For a recent project I was using the Linux sysfs for this, i.e. searching /sys/bus/usb/devices/x-y/{idVendor,idProduct} for the correct USB device, then searching its interface directories for tty/ directories, which contain the name of the ttyXXX device.
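As a rough shell equivalent of that lookup (a sketch; 2b3e is the NewAE vendor ID, and the 1-2 bus address is made up):

# find the USB device(s) reporting the NewAE vendor ID
grep -l 2b3e /sys/bus/usb/devices/*/idVendor
# list the TTY names exposed by that device's interfaces (assuming it showed up at 1-2)
ls /sys/bus/usb/devices/1-2/1-2:1.*/tty/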

Would this work in a rootless container? Do we need to do this for more than just TTYs?

pamaury commented 1 year ago

@andreaskurth can you try with #19695?

a-will commented 1 year ago

> > Auto-detection is a challenging problem in root-less containers because udev lookup is typically broken. Without it, we can't associate the kernel serial devices with the USB device (e.g. to know which serial devices belong to the NewAE USB device we identified for control).
>
> For a recent project I was using the Linux sysfs for this, i.e. searching /sys/bus/usb/devices/x-y/{idVendor,idProduct} for the correct USB device, then searching its interface directories for tty/ directories, which contain the name of the ttyXXX device.
>
> Would this work in a rootless container? Do we need to do this for more than just TTYs?

I do not recommend creating yet another limited library that manually walks through sysfs. libudev / sd_device already handle that very well. I'm forgetting all the details right now, but I think there were conflicts between the container setup for CI and the realities of device node creation.

Probably a container with /dev and sysfs mounted directly would be able to find the devices it needed: the host's handling of netlink events would create the device nodes, and the container would just use whatever the host provides. In such a case, we couldn't rely on special symlinks and hidden devfs entries for isolation: the host would need to provide a container with its specific USB devices to target, and udev rules would be needed to maintain permissions on device nodes so that different containers could only access their own devices.
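For example, host-side udev rules along these lines (a sketch; the rules file name and group are hypothetical, and 2b3e is the NewAE vendor ID) could keep the nodes accessible only to the group a given container is allowed to use:

# /etc/udev/rules.d/90-cw310-container.rules (hypothetical)
SUBSYSTEM=="usb", ATTRS{idVendor}=="2b3e", MODE="0660", GROUP="cw310"
SUBSYSTEM=="tty", ATTRS{idVendor}=="2b3e", MODE="0660", GROUP="cw310"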

But that's not the current setup, if I recall correctly. Instead, each container has some special /dev that doesn't match the host's, and so the container is cut off from the work "udev" does. I'm not sure this is going to work for OT's USB device, since it will disappear when it disconnects (FPGA reprogramming, for example), then come back with different IDs.

pamaury commented 1 year ago

Independently of whether udev works or not in the CI, something will have to be changed in opentitanlib to support the CW340 in all possible configurations. When using the FTDI chip for UARTs on the CW340, it is not possible to rely on the current "trick" of using the serial number to match the USB device and the TTYs. This can be fixed with udev (or sysfs).

Presently, the CI enforces device permissions using cgroups. I am no docker expert, but I believe that with the current setup, and by mounting /dev and sysfs, it should work and still enforce permissions. The current code handles devices dynamically disconnecting from and reconnecting to the container just fine, and it relies on the USB tree hierarchy and not just IDs, so I think this can be tweaked to handle the OT USB device when we need it.
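As an illustration of that hierarchy (ttyACM0 and the port path below are just examples), the physical USB tree position is visible in the resolved sysfs path of a TTY:

# resolve the TTY's sysfs device entry; the bus/port path appears in the result
readlink -f /sys/class/tty/ttyACM0/device
# -> .../usb1/1-2.4/1-2.4:1.1  (bus 1, hub on port 2, device on port 4, configuration 1, interface 1)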

a-will commented 1 year ago

> Independently of whether udev works or not in the CI, something will have to be changed in opentitanlib to support the CW340 in all possible configurations. When using the FTDI chip for UARTs on the CW340, it is not possible to rely on the current "trick" of using the serial number to match the USB device and the TTYs. This can be fixed with udev (or sysfs).
>
> Presently, the CI enforces device permissions using cgroups. I am no docker expert, but I believe that with the current setup, and by mounting /dev and sysfs, it should work and still enforce permissions. The current code handles devices dynamically disconnecting from and reconnecting to the container just fine, and it relies on the USB tree hierarchy and not just IDs, so I think this can be tweaked to handle the OT USB device when we need it.

Personally, I think we should not attempt to support "all possible" configurations. The only one we should pursue is hyperdebug. Adding more is superfluous and wastes boards (and adds unnecessary maintenance burdens), as you can't change the config with automation without a robotic arm ;)

Why hyperdebug over the others? It's the only host interface that supports QSPI.

jwnrt commented 1 year ago

> Personally, I think we should not attempt to support "all possible" configurations. The only one we should pursue is hyperdebug.

Maybe I'm misunderstanding, but I think the plan is to only support the hyperdebug configuration in CI.

Shouldn't we still support the other configurations for local development, though (i.e. with this dev container)? The CW340 is going to be the primary officially supported platform, and I think requiring hyperdebug on top of that could be confusing.

a-will commented 1 year ago

> > Personally, I think we should not attempt to support "all possible" configurations. The only one we should pursue is hyperdebug.
>
> Maybe I'm misunderstanding, but I think the plan is to only support the hyperdebug configuration in CI.
>
> Shouldn't we still support the other configurations for local development, though (i.e. with this dev container)? The CW340 is going to be the primary officially supported platform, and I think requiring hyperdebug on top of that could be confusing.

From my POV, no. And here's why:

msfschaffner commented 1 year ago

Adding and maintaining different variants of boards and debug interfaces comes at a cost (both CI resources and engineering time) and has a considerable impact on ongoing work streams. Hence, I think this merits a broader discussion with all stakeholders to make sure we're aligned and prioritizing our efforts correctly.

@moidx @jonmichelson @johngt @mundaym @GregAC This is probably worth discussing in our upcoming planning sessions.

nasahlpa commented 1 year ago

I am not sure whether this issue is related to a similar problem we had in ot-sca. With multiple CW devices attached, we could only use the container in --privileged mode.

The issue was that the ChipWhisperer USB stack first scans all USB devices and is supposed to automatically filter out devices with wrong permissions. However, this mechanism was buggy. This PR fixed the issue. Hence, updating ChipWhisperer in our container fixed the problem, and we can now use it without --privileged mode.

pamaury commented 1 year ago

The problem here is different: it relates to the fact that the code in opentitantool tries to list all serial devices using udev, but udev does not work completely in a container with --privileged. And obviously we don't want our CI containers to use this option. Our CI already uses some hacks to make udev "kind of work", but, for example, it cannot handle devices that dynamically (dis)appear, and there are issues with symlinks. Last time I checked, it seemed very difficult to make udev work in a container, and there seems to be no good reason for that other than the conflict between Docker developers and systemd developers.

a-will commented 1 year ago

You might frame it as the serialport crate choosing less reliable properties to check and pass back. Inside the container, udev lookups still work for me, but only the attributes are available (not the derived properties / environment variables created from udev rules). If the serialport crate properly walked the device tree with libudev instead, serialport::available_ports() could still return useful info.

The crate would merely need to check the parent device of the "tty" node to find the type (with another jump from the "cdc_acm" or "usb-serial" parent to get to the "usb" parent device). But it doesn't do that; instead, it relies on the ID_BUS environment variable...
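For comparison (the device name below is just an example), the parent chain is visible from sysfs attributes alone, whereas ID_BUS lives in udev's property database:

# walk the tty device and its parents (cdc_acm / usb-serial interface, then the usb device); this only reads sysfs attributes
udevadm info --attribute-walk --name=/dev/ttyACM0
# query the udev property database, which is where ID_BUS and similar properties come from
udevadm info --query=property --name=/dev/ttyACM0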

I'm not sure how to get access to udev's special database from the container.

pamaury commented 1 year ago

I admit that I don't know udev very well, but if serialport indeed relies on less reliable attributes, then we should just patch it (and ideally upstream the changes).