zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

Thread Border Router with NCP/RCP sample and nrf52840dk not starting #30429

Closed ADEscobar closed 3 years ago

ADEscobar commented 3 years ago

I am trying to build and run a complete Thread Border Router using the OpenThread Co-Processor Networking Sample for the nrf52840dk_nrf52840 in the latest master branch.

The sample documentation refers to the official OpenThread documentation for setting up and building the border router. As far as I know, while there is a Zephyr Project repository for the OpenThread module itself, there is no Zephyr Project mirror (or submodule) for the OpenThread Border Router repo.

I have built the sample for nrf52840dk_nrf52840 with both co-processor alternatives (NCP and RCP) and both interfaces (UART and USB), using the different overlays. I tried every combination with the OTBR built from the latest openthread/ot-br-posix master, running on a Raspberry Pi 3B, but I get all sorts of errors that prevent the otbr-agent from starting properly (such as "RCP is missing required capabilities: CSMA-backoff" or Spinel version errors).

Am I doing something wrong? Is there a plan to fork the ot-br-posix repository in order to freeze the required status and related dependencies so that we can easily build a complete, working border router together with the Zephyr Co-Processor Networking Sample? What is the current workaround to make it work?

rlubos commented 3 years ago

The 802154 radio driver for nRF was recently updated, and unfortunately, its open-source variant was trimmed of the CSMA-CA capability (that's the reason you see the warnings). What's even worse, the RCP support was merged after the radio driver was updated, so we don't have a single commit on master where we have both RCP support and the "old" radio driver...

A fully-featured radio driver is available in nRF Connect SDK (https://github.com/nrfconnect/sdk-nrf/), but we don't have the commit adding RCP in there yet. It should be available in a few weeks though.

If you're comfortable with git, as a workaround you could use a Zephyr commit dated before 35ec164e364f639b81e8accd79d0caba6b624f73 and cherry-pick the commit adding RCP support manually (638b5f389fe8cfe0967a075dcfee43c0a38a4eb2). I've successfully run the RCP before the driver update several times.
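The workaround above could be scripted roughly as follows. This is only a sketch under stated assumptions: the two hashes are the ones quoted in the comment, but taking the direct parent of the driver update (`~1`) as the "commit dated before" it, the branch name `rcp-old-radio`, and the final `west update` (for a west-managed workspace) are illustrative choices, not something confirmed in this thread. The cherry-pick may still need manual conflict resolution.

```shell
#!/bin/sh
# Sketch of the workaround; call rcp_workaround from inside a Zephyr checkout.
# Hashes are taken from the comment above.
DRIVER_UPDATE=35ec164e364f639b81e8accd79d0caba6b624f73  # radio driver update
RCP_SUPPORT=638b5f389fe8cfe0967a075dcfee43c0a38a4eb2    # commit adding RCP

# Print each command; execute it only when DRY_RUN is not set to 1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

rcp_workaround() {
    # Assumption: the parent of the driver update is a suitable base commit.
    # The branch name is hypothetical.
    run git checkout -b rcp-old-radio "${DRIVER_UPDATE}~1" &&
    # Apply the RCP support commit on top of the older tree.
    run git cherry-pick "${RCP_SUPPORT}" &&
    # Assumption: west-managed workspace; sync modules to this revision.
    run west update
}

# Preview without touching the repository:
#   DRY_RUN=1 rcp_workaround
```

Running it with `DRY_RUN=1` first only prints the three commands, which makes it easy to check the hashes before changing the working tree.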

FYI @hubertmis @MarekPorwisz

Is there a plan to fork the ot-br-posix repository

There are no such plans, at least not from our (Nordic) side.

ADEscobar commented 3 years ago

That means the official Zephyr repo (zephyrproject-rtos/zephyr) will no longer work as RCP with the nrf52840, since the CSMA-CA capability is mandatory for that, right? I think that should be documented then.

What about NCP? This should still work, but I think it is being slowly left behind, as RCP is normally the preferred mode by the OpenThread devs for the future.

There are no such plans, at least not from our (Nordic) side.

What about the Zephyr Project side? If we want to have a working OpenThread Border Router I think forking the ot-br-posix would be the safest way, since redirecting to the official OpenThread documentation and an external repository (using a particular version of OT as submodule, potentially much different from the Zephyr Project OT mirror) for such a key feature means that it will be very often broken.

rlubos commented 3 years ago

That means the official Zephyr repo (zephyrproject-rtos/zephyr) will no longer work as RCP with the nrf52840, since the CSMA-CA capability is mandatory for that, right? I think that should be documented then.

That's unfortunate but yes, you're correct. The CSMA feature is now distributed as a closed part of the radio driver and cannot be a part of the Zephyr project. I agree about the doc, I need to update it (I wasn't even aware CSMA-CA capability is critical for RCP operation until this issue was reported).

What about NCP? This should still work, but I think it is being slowly left behind, as RCP is normally the preferred mode by the OpenThread devs for the future.

The NCP still works (since it can use CSMA-CA implemented in the OT MAC layer).

If we want to have a working OpenThread Border Router I think forking the ot-br-posix would be the safest way, since redirecting to the official OpenThread documentation and an external repository (using a particular version of OT as submodule, potentially much different from the Zephyr Project OT mirror) for such a key feature means that it will be very often broken.

To be honest, personally, I'd rather avoid that. OTBR is a separate project, not related to Zephyr in any way. I don't see a reason for Zephyr to maintain a fork of yet another repository. It should be easier to update the OT version supported in Zephyr if any incompatibility shows up rather than maintain another repo fork.

ADEscobar commented 3 years ago

To be honest, personally, I'd rather avoid that. OTBR is a separate project, not related to Zephyr in any way. I don't see a reason for Zephyr to maintain a fork of yet another repository. It should be easier to update the OT version supported in Zephyr if any incompatibility shows up rather than maintain another repo fork.

Maybe it could at least be documented which commit or version of OTBR the Zephyr Co-Processor Networking Samples are known to work with; otherwise they are of little use.

rlubos commented 3 years ago

@ADEscobar I did some more investigation and realized that it's possible to enable CSMA backoff at the software level in OpenThread's MAC, so it's possible to run RCP even without hardware support for this feature.

I've posted a PR which enables it (https://github.com/zephyrproject-rtos/zephyr/pull/30556) and tested that RCP built for nrf52840dk_nrf52840 works correctly with the docker image from https://openthread.io/guides/border-router/docker?hl=en#pull_the_image_from_docker_hub

JavierGarJim commented 3 years ago

@rlubos Thanks for your support with this issue. I just tested your branch, building the RCP for the nrf52840dk_nrf52840 board with the USB overlay, but it did not work with the border router, either as a Docker image or natively installed on the Raspberry Pi 3. How did you test it?

rlubos commented 3 years ago

I've tested with the official Docker image (https://openthread.io/guides/border-router/docker?hl=en) and UART transport.

I think that the issue with USB might be related to the reset that is triggered on the RCP when BR starts, and the fact that the USB device re-enumerates then. I'm observing this on my side. Upstream NCP sample implements something called "pseudo reset" to prevent disconnect of the USB device: https://github.com/openthread/openthread/blob/master/examples/apps/ncp/main.c#L89

We don't have such a mechanism in Zephyr.

markus-becker-tridonic-com commented 3 years ago

Indeed, the way we are using RCP is via UART which does not enumerate. We've had the same problems with USB.

@rlubos would it make sense to mimic the pseudo reset in Zephyr as well?

rlubos commented 3 years ago

@markus-becker-tridonic-com I think it could make sense, though I'm not sure how difficult it would be in Zephyr, and I don't have it on my roadmap right now. Contributions are of course welcome.

hubertmis commented 3 years ago

Pseudo reset is a risky feature. The host expects the co-processor to reset itself, but the co-processor does not actually reset; it only resets selected modules (Which ones? Why only those? Is a module reset enough to claim the device is in its initial state?).

I would like to propose an alternative solution. Perhaps it would be better to make the host resilient to co-processor re-enumeration after it requests a co-processor reset. Would that be possible?
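To illustrate what host-side resilience could look like in its simplest form, here is a minimal sketch of a helper that waits for a device node to reappear after a reset. It is an assumption-laden illustration only: the function name `wait_for_device`, the one-second polling interval, and the default timeout are hypothetical, and this is not how OTBR or ot-daemon handle the reset.

```shell
#!/bin/sh
# Wait until a device node (re)appears, polling once per second.
# Hypothetical helper, not part of OTBR or OpenThread.
# Usage: wait_for_device <path> [timeout_seconds]
wait_for_device() {
    dev="$1"
    timeout="${2:-10}"   # default timeout is an arbitrary choice
    elapsed=0
    while [ ! -e "$dev" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "timed out waiting for $dev" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example: block until the co-processor has re-enumerated, then reconnect.
#   wait_for_device /dev/ttyACM0 15 && echo "device is back"
```

A fixed path such as /dev/ttyACM0 may change across re-enumerations, so a stable name is a safer thing to wait on.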

rlubos commented 3 years ago

@markus-becker-tridonic-com Actually, after talking with Nordic's Thread team I learned that one of the engineers is investigating the possibility of adding the OT "pseudo reset" to Zephyr.

@hubertmis It's hard to disagree that this "pseudo reset" is more of a workaround; it would be better to address the problem directly in the OTBR implementation. But whether that is possible should rather be asked on a different forum, i.e. the OTBR community. After all, this solution was proposed by one of the core OT developers at the time, although I don't remember why it was implemented that way instead of being fixed on the BR side.

hubertmis commented 3 years ago

Thanks, I'll raise the concern in OT. I'll post my findings here when I know more.

markus-becker-tridonic-com commented 3 years ago

@hubertmis That sounds even better. Looking forward to your report.

lmaciejonczyk commented 3 years ago

I've added an option to properly handle the USB connection after resetting the RCP/NCP device for OpenThread and wpantund. More about this functionality here: https://github.com/zephyrproject-rtos/zephyr/commit/4500862af13f4340a08c00e5cb078be2d5c6f39d and here: https://github.com/nrfconnect/sdk-nrf/pull/4121

nageshshamnur commented 3 years ago

I've added option to properly handle the USB connection after resetting RCP/NCP device for OpenThread and wpantund. More about this functionality here: 4500862 and here: nrfconnect/sdk-nrf#4121

Hi @lmaciejonczyk: I am working on the latest Zephyr, which includes the commit https://github.com/zephyrproject-rtos/zephyr/commit/4500862af13f4340a08c00e5cb078be2d5c6f39d, but the issue still persists. More details about the issue I am facing are here: https://lists.zephyrproject.org/g/users/topic/issue_flashing_and_running/83894298?p=,,,20,0,0,0::recentpostdate%2Fsticky,,,20,2,0,83894298

lmaciejonczyk commented 3 years ago

Use the symlink to the device. That is, instead of:

sudo ./build/posix/src/posix/ot-daemon -v 'spinel+hdlc+uart:///dev/ttyACM1?uart-baudrate=115200'

run something like:

sudo ./build/posix/src/posix/ot-daemon -v 'spinel+hdlc+uart:///dev/serial/by-id/usb-Nordic_Semiconductor_ASA_Thread_Co-Processor_07AA4C22D2E2C88D-if00?uart-baudrate=115200'

Use the proper symlink for your device from /dev/serial/by-id/, which appears once the device has been enumerated by the host OS. Don't forget to build the OpenThread ot-daemon with the extra option -DOT_SPINEL_RESET_CONNECTION=ON so that the USB connection is handled properly after the device is hard reset.
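Since the by-id name contains a device-specific serial number, it can help to look it up rather than type it. The following is a minimal sketch, assuming the symlink name contains the string "Thread_Co-Processor" as in the example above; the helper names `find_by_id` and `radio_url` are hypothetical and not part of OpenThread.

```shell
#!/bin/sh
# Print the first entry in a by-id directory whose name contains a pattern.
# Hypothetical helper. Usage: find_by_id <pattern> [directory]
find_by_id() {
    pattern="$1"
    dir="${2:-/dev/serial/by-id}"
    for entry in "$dir"/*"$pattern"*; do
        # Skip the unexpanded glob (no match) and dangling symlinks.
        if [ -e "$entry" ]; then
            echo "$entry"
            return 0
        fi
    done
    return 1
}

# Build a Spinel radio URL in the form used in the comment above.
# Usage: radio_url <device_path> [baudrate]
radio_url() {
    echo "spinel+hdlc+uart://$1?uart-baudrate=${2:-115200}"
}

# Example (assumes ot-daemon was built with -DOT_SPINEL_RESET_CONNECTION=ON):
#   dev="$(find_by_id Thread_Co-Processor)" &&
#       sudo ./build/posix/src/posix/ot-daemon -v "$(radio_url "$dev")"
```

If more than one matching device is attached, this picks the first match in glob order, so a more specific pattern (for example, part of the serial number) is safer.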