luke-ingle opened this issue 1 year ago
@luke-ingle When it's crashing, what's the value of watch?
@luke-ingle Also, what does the backtrace look like? Who is calling StopWatchingSocket with what I assume is a garbage value (because we do check for null right before that line, so I assume watch is not null)?
@bzbarsky-apple I've attached the backtrace (I think; I haven't had much experience using gdb before): gdb_backtrace.txt. It appears watch is not null.
@luke-ingle Thank you, that is helpful. Just to make sure we are looking at the same things, what is line 266 of src/lib/dnssd/minimal_mdns/Server.cpp for you?
No problems, this is what I have:
261 #if !CHIP_DEVICE_LAYER_NONE
262 chip::DeviceLayer::ChipDeviceEvent event{};
263 event.Type = chip::DeviceLayer::DeviceEventType::kDnssdInitialized;
264 chip::DeviceLayer::PlatformMgr().PostEventOrDie(&event);
265 #endif
266 mIsInitialized = true;
267 }
268 }
269
270 return autoShutdown.ReturnSuccess();
OK, thank you.
Does applying this change:
diff --git a/src/inet/UDPEndPointImplSockets.cpp b/src/inet/UDPEndPointImplSockets.cpp
index 5c9748d0cf..98190a0e4d 100644
--- a/src/inet/UDPEndPointImplSockets.cpp
+++ b/src/inet/UDPEndPointImplSockets.cpp
@@ -469,7 +469,13 @@ CHIP_ERROR UDPEndPointImplSockets::GetSocket(IPAddressType addressType)
{
return CHIP_ERROR_POSIX(errno);
}
- ReturnErrorOnFailure(static_cast<System::LayerSockets *>(&GetSystemLayer())->StartWatchingSocket(mSocket, &mWatch));
+ CHIP_ERROR err = static_cast<System::LayerSockets *>(&GetSystemLayer())->StartWatchingSocket(mSocket, &mWatch);
+ if (err != CHIP_NO_ERROR) {
+ // Our mWatch is not valid; make sure we never use it.
+ close(mSocket);
+ mSocket = kInvalidSocketFd;
+ return err;
+ }
mAddrType = addressType;
fix the crash for you?
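The idea behind the diff can be sketched as a small, self-contained C++ model. Everything in it (`WatchTable`, `Watch`, `Endpoint`, `Err`, and the capacity of 2) is hypothetical and only loosely mirrors `StartWatchingSocket`/`StopWatchingSocket`. It assumes that teardown only stops watching when the socket is valid, so resetting the socket when the watch could not be registered keeps the never-set watch from being used later.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the SDK types; these are NOT the real CHIP APIs.
enum class Err { kOk, kEndpointPoolFull };
constexpr int kInvalidSocketFd = -1;

struct Watch
{
    int fd     = kInvalidSocketFd;
    bool inUse = false;
};

// Fixed-capacity table of socket watches (capacity chosen arbitrarily).
class WatchTable
{
public:
    static constexpr std::size_t kCapacity = 2;

    // On success stores a valid watch in *out; on failure leaves *out untouched.
    Err StartWatching(int fd, Watch ** out)
    {
        for (auto & w : mWatches)
        {
            if (!w.inUse)
            {
                w.inUse = true;
                w.fd    = fd;
                *out    = &w;
                return Err::kOk;
            }
        }
        return Err::kEndpointPoolFull;
    }

    // Must only be called with a watch obtained from a successful StartWatching().
    void StopWatching(Watch ** watch)
    {
        assert(*watch != nullptr);
        (*watch)->inUse = false;
        (*watch)->fd    = kInvalidSocketFd;
        *watch          = nullptr;
    }

    std::size_t InUse() const
    {
        std::size_t n = 0;
        for (const auto & w : mWatches)
            n += w.inUse ? 1 : 0;
        return n;
    }

private:
    std::array<Watch, kCapacity> mWatches{};
};

// Endpoint following the shape of the proposed diff.
struct Endpoint
{
    int mSocket    = kInvalidSocketFd;
    Watch * mWatch = nullptr;

    // fd stands in for the descriptor returned by ::socket().
    Err GetSocket(WatchTable & table, int fd)
    {
        mSocket = fd;
        Err err = table.StartWatching(mSocket, &mWatch);
        if (err != Err::kOk)
        {
            // mWatch is not valid: forget the socket so Close() never uses it.
            // (The real diff also calls close(mSocket) here.)
            mSocket = kInvalidSocketFd;
            return err;
        }
        return Err::kOk;
    }

    void Close(WatchTable & table)
    {
        if (mSocket != kInvalidSocketFd)
        {
            table.StopWatching(&mWatch);
            mSocket = kInvalidSocketFd;
        }
    }
};
```

With a capacity of 2, a third GetSocket() call fails with kEndpointPoolFull, and closing all three endpoints empties the table without ever touching the third endpoint's watch.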
Thanks for the diff. That change doesn't "fix" the crash as such, but it does now give a more informative error message:
...
[1689580747.101217][12010:12010] CHIP:TS: Last Known Good Time: 2023-07-12T10:24:00
[1689580747.101464][12010:12010] CHIP:ZCL: Using ZAP configuration...
[1689580747.144773][12010:12010] CHIP:-: src/system/SystemLayerImplSelect.cpp:358: CHIP Error 0x000000C1: Endpoint pool full at examples/chip-tool/commands/common/CHIPCommand.cpp:128
[1689580747.144783][12010:12010] CHIP:TOO: Run command failure: src/system/SystemLayerImplSelect.cpp:358: CHIP Error 0x000000C1: Endpoint pool full
[1689580747.154974][12010:12010] CHIP:SPT: VerifyOrDie failure at src/lib/support/Pool.h:337: Allocated() == 0
Aborted (core dumped)
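The final VerifyOrDie is a separate failure from the original crash. As a rough illustration of the kind of check it reports, the hypothetical C++ pool below (`FixedPool` is illustrative, not the SDK's Pool.h) hands out at most N objects and asserts in its destructor that nothing is still allocated. An object that is never released before its pool is destroyed would abort the process in a similar way. Which pool actually trips the check in chip-tool is not established by this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical fixed-capacity object pool; a simplified stand-in, not the SDK's Pool.h.
template <typename T, std::size_t N>
class FixedPool
{
public:
    FixedPool()                              = default;
    FixedPool(const FixedPool &)             = delete;
    FixedPool & operator=(const FixedPool &) = delete;

    ~FixedPool()
    {
        // Analogous to the reported VerifyOrDie(Allocated() == 0): tearing the pool
        // down while an object is still live is treated as a fatal bug.
        assert(Allocated() == 0);
    }

    // Returns nullptr when every slot is taken (the "pool full" case).
    T * Create()
    {
        for (std::size_t i = 0; i < N; ++i)
        {
            if (!mUsed[i])
            {
                mUsed[i] = true;
                return new (mStorage[i]) T();
            }
        }
        return nullptr;
    }

    void Release(T * obj)
    {
        for (std::size_t i = 0; i < N; ++i)
        {
            if (mUsed[i] && static_cast<void *>(mStorage[i]) == static_cast<void *>(obj))
            {
                obj->~T();
                mUsed[i] = false;
                return;
            }
        }
        assert(false && "object does not belong to this pool");
    }

    std::size_t Allocated() const
    {
        std::size_t n = 0;
        for (bool used : mUsed)
            n += used ? 1 : 0;
        return n;
    }

private:
    alignas(T) unsigned char mStorage[N][sizeof(T)];
    bool mUsed[N] = {};
};
```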
@luke-ingle Thank you for checking that!
Sorry for the lag; I might not be able to get to this until Monday. But it sounds like there are multiple issues here...
@luke-ingle So I have been trying to reproduce this by setting a low INET_CONFIG_NUM_UDP_ENDPOINTS (6, in this case) on Mac. I do get a failed startup and assertions about it, as somewhat expected, but I do not get the specific "Allocated() == 0" assertion failure you get. Can you figure out which pool that is, perhaps? What's the stack for that VerifyOrDie failure at src/lib/support/Pool.h:337?
For now, https://github.com/project-chip/connectedhomeip/pull/28245 makes the above change so that we at least don't end up with an invalid memory reference.
Reproduction steps
out/host/chip-tool pairing ble-thread 101 hex:0e080000000000010000000300001635060004001fffe002088a4dee6232a203350708fd852c12168e7729051013c250b2ab003e9261ca98546f47739c030f4f70656e5468726561642d623630360102b6060410f2443479159316b89b4dbf67be0e7b230c0402a0f7f8 16544735 1852 --paa-trust-store-path credentials/production/paa-root-certs/
gdb output when it hits the segfault: chip-tool-segfault.txt
Bug prevalence
Every time I have too many network interfaces
GitHub hash of the SDK that was being used
aaa25420f02d353c74677bef6e79abbd46eea463
Platform
other
Platform Version(s)
No response
Anything else?
I was previously working on Android Cuttlefish, and the emulator gave me a whole bunch of extra network interfaces.
I also work remotely so I have to use my company VPN.
If I reboot my laptop (after building chip-tool) and run chip-tool, it works as expected. If I then log on to my VPN and run chip-tool again, I get a segfault. If I then disconnect from the VPN and try again, it works fine. I then tried removing most of the network interfaces, leaving just my laptop's built-in network card, docker and the company VPN. I then connected to the VPN and ran chip-tool, and it ran fine.
So I am assuming there's an error somewhere in finding a valid network connection when a user has multiple network interfaces to choose from?
The list of interfaces is included in the attached file list_of_network_interfaces.txt.
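One way to frame the "too many interfaces" hypothesis is as simple arithmetic on a fixed-size UDP endpoint pool. The C++ sketch below is illustrative only: it assumes the minimal mDNS server opens one UDP endpoint per (interface, IP address type) pair, plus some hypothetical number of endpoints opened elsewhere, and compares that total against a pool size such as INET_CONFIG_NUM_UDP_ENDPOINTS. The struct and function names and all numbers are made up for the example, not SDK defaults.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative model only. Assumes one UDP endpoint per (interface, IP address type)
// pair for mDNS, plus endpoints opened elsewhere; none of these numbers are SDK defaults.
struct EndpointDemand
{
    std::size_t interfaces;     // usable network interfaces (VPN, docker, emulator, ...)
    bool ipv4Enabled;           // if true, each interface is assumed to need an IPv4 endpoint too
    std::size_t otherEndpoints; // endpoints opened outside mDNS (hypothetical count)
};

constexpr std::size_t EndpointsNeeded(const EndpointDemand & d)
{
    return d.interfaces * (d.ipv4Enabled ? 2 : 1) + d.otherEndpoints;
}

// True if a pool with room for poolSize UDP endpoints (cf. INET_CONFIG_NUM_UDP_ENDPOINTS)
// could satisfy the demand.
constexpr bool FitsInPool(const EndpointDemand & d, std::size_t poolSize)
{
    return EndpointsNeeded(d) <= poolSize;
}
```

Under these assumptions, each extra interface added by a VPN, docker, or an emulator costs up to two endpoints, so a handful of them can push the total past a small pool. That would be consistent with the observation that pruning interfaces made chip-tool run again.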