openthread / ot-br-posix

OpenThread Border Router, a Thread border router for POSIX-based platforms.
https://openthread.io/
BSD 3-Clause "New" or "Revised" License
400 stars 225 forks source link

OpenWrt build failing with NAT64 enabled #2033

Open jwhui opened 11 months ago

jwhui commented 11 months ago

The ot-br-posix/pull/2011 mentioned in openwrt/pull/11559 was committed on Dec 19, 2022, and openwrt#L46 depends on The openwrt sdk version is 22.03.3, which does not merge into the above patch. I think even if it wasn't for this issue, we would need to update the openwrt sdk version, e.g. the latest master to ensure that the two are always compatible.

Nonetheless, I verified locally that compiling with the default 22.03.3 version passes, and switching to master also compiles fine.

I'm not sure if switching to the new openwrt sdk will fix this, or if re-triggering the action run will (I've had CI fail on commits in unrelated code, and a re-run fixed it). I am testing online by triggering the action after a fork.

Originally posted by @ihidchaos in https://github.com/openthread/ot-br-posix/issues/2030#issuecomment-1730972956

jwhui commented 11 months ago

I'm not sure if switching to the new openwrt sdk will fix this, or if re-triggering the action run will (I've had CI fail on commits in unrelated code, and a re-run fixed it).

@ihidchaos compiler errors are deterministic, so re-running the check will not fix it.

ihidchaos commented 11 months ago

I'm not sure if switching to the new openwrt sdk will fix this, or if re-triggering the action run will (I've had CI fail on commits in unrelated code, and a re-run fixed it).

@ihidchaos compiler errors are deterministic, so re-running the check will not fix it.

Thanks for the tip, I'm trying to fix it. The problem I'm having is that this error is hard to reproduce when I manually test it locally, so I'm trying to figure out how to verify it using GitHub Action. guoyuchao-vm_2023-09-22_16-18-10.log

ihidchaos commented 11 months ago

UPDATE: I realized that the getaddrinfo_a, gaicb don't actually exist in the musl libc. The libanl mentioned in openwrt/pull/11559 only happens with the glibc library, not the musl libc.

According to my tests, disabling OPENTHREAD_POSIX_CONFIG_NAT64_AIL_PREFIX_ENABLE doesn't affect the device getting nat64 prefix on OpenWrt platform.

I guess we can withdraw openthread/pull/9441 for now and turn it back on after using the normal synchronous mode implementation of getaddrinfo, or we need to replace the current getaddrinfo_a with c-ares.

ihidchaos commented 11 months ago

I have written code to use c-ares instead of getaddrinfo_a and am testing it on my own fork branch.

image

Action shows compilation results that occasionally fail for unknown reasons, which can be resolved by simply retriggering.

However, I'm wondering if introducing c-ares in order to use asynchronous address lookups is necessary.

xuyirio commented 11 months ago

There is some platform already adopted the version of using libanl. I'm wondering if we want to keep backward compatibility when switching to c-ares. It might take some time to verify.

As a mitigation of the otbr-posix build CI error, @ihidchaos could you help revert the change introduced by openthread/pull/9441? It can be changed back again after the solution is finalized.