Closed LaterBird closed 4 months ago
I came across a blog post explaining that if a NAT device simply discards packets from unknown addresses, there shouldn't be an issue with establishing a P2P connection. Here, an "unknown address" is one to which your own network has not initiated outbound communication. However, if the policy is not simple discarding but a blacklist mechanism, where the NAT device adds unknown source addresses to a deny list upon receiving packets from them, then during P2P hole punching, packets sent to those addresses will be remapped to a new port because the addresses are already on the deny list. This effectively turns the NAT into a symmetric NAT.
To handle this scenario, you can lower the TTL (Time To Live) of the initial packets so that mappings are first established on your own NAT device without the packets ever reaching the remote NAT. Afterwards, sending packets with a normal TTL completes the hole punch.
Following this theory, I wrote my own test code and did manage to establish connections. It seems that libjuice does not handle this scenario, which would explain what I observed during testing: the connection is established only when A sends packets first. Per the theory above, if NAT_A employs a blacklist mechanism, then when B's packets arrive at NAT_A before A's, NAT_A adds B's public address to the blacklist. When A subsequently sends packets to B's public address, they are mapped to a new port, preventing the connection from being established.
Conclusion: using a low TTL to first establish mappings on each side's own NAT avoids the situation I encountered. In my tests, I used a TTL value of 3 to establish the mappings. Here is the blog post where I found this information (see item 11): https://rebootcat.com/2021/03/28/p2p_nat_traversal/
> If the policy is not discarding but rather involves a blacklist mechanism, where the NAT device adds unknown addresses to a deny list upon receiving packets, then during P2P hole punching, packets sent to these unknown addresses will be remapped to a new port because they are already listed in the deny list. This situation essentially transforms the NAT into a symmetric NAT.
The core issue here seems to be that under some circumstances NAT A is endpoint-dependent (symmetric NAT), which makes it hard to hole-punch. This doesn't look like a blacklist to me, more like a DMZ setup interfering with the mapping, for instance.
> Then, when A subsequently sends packets to B's public address, they are mapped to a new port, preventing successful connection establishment.
This scenario is typical with endpoint-dependent NATs. It is handled by ICE with a peer-reflexive candidate, provided NAT B does endpoint-independent mapping and filtering (full cone or restricted cone NAT), so a connection can still be established when NAT B is cooperative enough.
```c
// Simultaneously send an initial packet to the peer to trigger NAT mapping
const char *init_message = "Initial packet";
for (int i = 0; i < 5; ++i) {
    juice_send(agent, init_message, strlen(init_message));
    sleep(1);
}
```
I think you misunderstand how ICE works. Application messages are not used for NAT or firewall traversal. The library does everything for you, and you must wait for the connected state to send messages. juice_send() will always fail if the agent is not in the connected/completed state (you don't check the return value here).
> This doesn't look like a blacklist to me, more like a DMZ setup interfering with the mapping for instance.
My router's DMZ is turned off.
> This scenario is typical with endpoint-dependent NATs. It is handled by ICE with a peer reflexive candidate provided NAT B does endpoint-independent mapping and filtering (full cone or restricted cone NAT), so it can still connect in scenarios where NAT B is cooperative enough.
The information detected by running STUN on the two hosts used for testing is as follows:
A host:

```
stun stun.l.google.com:19302
STUN client version 0.97
Primary: Independent Mapping, Independent Filter, random port, will hairpin
Return value is 0x000002
```
B host:

```
stun stun.l.google.com:19302
STUN client version 0.96
Primary: Independent Mapping, Independent Filter, random port, will hairpin
Return value is 0x000002
```
Based on the information I found online, it indicates that the NAT type for both hosts is Port Restricted Cone.
> I think you misunderstand how ICE works. Application messages are not used for NAT or firewall traversal. The library does everything for you, and you must wait for connected state to send messages.
Thank you for pointing that out, I indeed misunderstood this part.
Here is the test log: test.log
Thank you for paying attention to my issue. @paullouisageneau
Thank you for the log. It matches what you observe but doesn't explain the behavior of NAT A.
> The information detected by running STUN on the two hosts used for testing is as follows:
>
> A host: stun stun.l.google.com:19302 STUN client version 0.97 Primary: Independent Mapping, Independent Filter, random port, will hairpin Return value is 0x000002
>
> B host: stun stun.l.google.com:19302 STUN client version 0.96 Primary: Independent Mapping, Independent Filter, random port, will hairpin Return value is 0x000002
>
> Based on the information I found online, it indicates that the NAT type for both hosts is Port Restricted Cone.
I assume you use this test client. If so, it looks unmaintained, since the last update was nearly 10 years ago now. The NAT test seems to assume that the STUN server supports NAT behavior discovery attributes like CHANGE-REQUEST and CHANGED-ADDRESS. These attributes were in the deprecated RFC 3489, but they have been removed in RFC 5389 and are now part of an extension (RFC 5780). In practice they are rarely supported because they require the server to have multiple IPv4 addresses. In particular, Google's STUN servers do not support them. I guess the client silently fails to run the proper test here, so the result is probably meaningless.
I think I understand the principle of P2P UDP hole punching. Here is my testing environment:
My understanding of UDP hole punching is as follows:
I use the following test code and perform the following operations:
NAT_A is in city A, NAT_B is in city B, and both hosts detect their NATs as port-restricted cone NATs. Here is the test code:
What could be the reason for this? Could you help me identify where the issue might be?