Allow other processes to bind to the STUN UDP port.

spencerlambert commented 7 years ago

This change sets the SO_REUSEADDR option for UDP sockets, something already done for TCP connections. It allows outside processes to bind to the same UDP STUN port, so other applications can send UDP messages that traverse the NAT.

The specific use case I'm up against is allowing a TR-069 ACS server to send messages to devices connected to the STUN server. Part of the TR-069 standard is a UDP Connection Request message, notifying the end device to check in for pending tasks. Without the SO_REUSEADDR option, the ACS cannot bind to the correct port and the UDP Connection Request message is never received by the end device.
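For readers unfamiliar with the option, here is a minimal sketch of setting SO_REUSEADDR on a UDP socket before binding. The helper name `OpenUdpSocket` and its parameters are assumptions made for illustration; this is not the code in CStunSocket::UDPInit. Whether a second process can actually bind the same address and port depends on the platform, and typically both sockets need the option set.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>

// Hypothetical helper (not part of stunserver): opens an IPv4 UDP socket bound
// to ip:port, optionally setting SO_REUSEADDR first.
// Returns the socket descriptor, or -1 on failure.
int OpenUdpSocket(const char* ip, uint16_t port, bool reuseAddr) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) {
        return -1;
    }

    if (reuseAddr) {
        // The option must be set before bind() to affect the bind.
        int on = 1;
        if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) != 0) {
            close(fd);
            return -1;
        }
    }

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||
        bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

With `reuseAddr` set to false the socket behaves as before, which matches the idea of keeping the existing behavior unless the option is requested.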

jselbie commented 7 years ago

Hi Spencer,

I'm trying to understand this change and the implications.

Are you saying that there's a scenario where the STUN server listening on port 3478 needs to share that port with the ACS server? Why does the ACS server need to listen on port 3478 if there's already a STUN server? If you have two UDP services listening on the same port on the same box, incoming packets for that port will go to one server program, but not the other. It's not deterministic.

Or is it the case that the ACS server just needs to send from port 3478 so as to take advantage of the port mapping the device already has with its NAT?

spencerlambert commented 7 years ago

> Or is it the case that the ACS server just needs to send from port 3478 so as to take advantage of the port mapping the device already has with its NAT?

Yes, it's this. The ACS doesn't need to listen on port 3478, but it does need to originate a message from that port. To transmit from port 3478, the socket needs to bind to that port. The ACS keeps the port open just long enough to transmit three copies of a 170-byte message.

As an example, this is how another STUN server does it. That server is compatible with the ACS as is.

https://github.com/coturn/coturn/blob/114881bc9a455b8694634dd4cc3c3acb22845826/src/apps/common/apputils.c#L158-L171
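The bind, send, close sequence described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `SendFromPort` and its parameters are hypothetical names, and it is not the GenieACS or coturn implementation. It binds a short-lived UDP socket to the source port with SO_REUSEADDR, sends a fixed number of copies of the payload, and closes the socket.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <cstdint>
#include <string>

// Hypothetical helper: sends `copies` datagrams containing `payload` to
// destIp:destPort, originating from local UDP port srcPort.
// Returns the number of datagrams sent in full, or -1 if setup fails.
int SendFromPort(uint16_t srcPort, const char* destIp, uint16_t destPort,
                 const std::string& payload, int copies) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) {
        return -1;
    }

    // Ask to share the port with a process that is already bound to it.
    int on = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

    sockaddr_in src{};
    src.sin_family = AF_INET;
    src.sin_addr.s_addr = htonl(INADDR_ANY);
    src.sin_port = htons(srcPort);

    sockaddr_in dest{};
    dest.sin_family = AF_INET;
    dest.sin_port = htons(destPort);

    if (inet_pton(AF_INET, destIp, &dest.sin_addr) != 1 ||
        bind(fd, reinterpret_cast<sockaddr*>(&src), sizeof(src)) != 0) {
        close(fd);
        return -1;
    }

    int sent = 0;
    for (int i = 0; i < copies; ++i) {
        ssize_t n = sendto(fd, payload.data(), payload.size(), 0,
                           reinterpret_cast<sockaddr*>(&dest), sizeof(dest));
        if (n == static_cast<ssize_t>(payload.size())) {
            ++sent;
        }
    }

    // Close right away so the port is held only for the duration of the send.
    close(fd);
    return sent;
}
```

In the scenario from this thread the call would look something like `SendFromPort(3478, deviceIp, devicePort, request, 3)`, where the device address and port are the NAT mapping learned through STUN.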

jselbie commented 7 years ago

Got it. I'd like to take this change, but have this mode off by default. Would you like to go all the way with this:

Add a new bool member to CStunServerConfig called "reuseaddr" or something similar.
Add a corresponding command line parameter (e.g. "--reuseaddr") that would explicitly set the new member field of CStunServerConfig to true. Command line parsing is what main.cpp is all about.
Modify CStunServer::Initialize to pass the config flag as a new parameter to the four separate calls to AddSocket. Then AddSocket would pass the flag to CStunSocket::UDPInit.
Update markdown file to document new parameter. (Then "make textres" and "make manpages")
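As a rough illustration of the first two steps, an opt-in flag that defaults to off might look like the sketch below. `ServerConfigSketch` and `ParseArgs` are hypothetical stand-ins rather than the real CStunServerConfig or the parsing in main.cpp, which is organized differently; only the `--reuseaddr` switch and the `fReuseAddr` member name come from this thread.

```cpp
#include <cstring>

// Hypothetical stand-in for the server config; the real CStunServerConfig
// has many more members.
struct ServerConfigSketch {
    bool fReuseAddr = false;  // off by default, as requested in the review
};

// Hypothetical parser: only recognizes the one switch discussed here.
ServerConfigSketch ParseArgs(int argc, const char* const* argv) {
    ServerConfigSketch config;
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "--reuseaddr") == 0) {
            config.fReuseAddr = true;
        }
    }
    return config;
}
```

The parsed flag would then be handed down through the socket setup calls so that only the server path can turn the option on.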

Thanks!

spencerlambert commented 7 years ago

Super! Yes, I can make this an option, following your guide. I'll reach out when it's ready for next steps.

spencerlambert commented 7 years ago

Please check this again. I've added and documented the --reuseaddr command line option. UDPInit() was also being called in the client and test code, so I hard-coded false there to keep those areas working as they did before.

I tested the STUN server in my staging TR-069 environment. When --reuseaddr was used, my ACS was able to share port 3478. When the switch was not used, my ACS got a system error when trying to bind to port 3478.

jselbie commented 7 years ago

Looks good. If you can address the issues above (initialize fReuseAddr in constructor and markdown comments), we should be good to go.

spencerlambert commented 7 years ago

Thanks for your feedback. The two changes you pointed out have been implemented, namely initializing fReuseAddr and updating the help text.

spencerlambert commented 7 years ago

Please hold off on merging this code. I'm not getting the behavior I want from the STUN and ACS servers: when the ACS uses the port, it blocks the STUN server momentarily. I may need to use something like IPC or iptables to get two processes to send over the same port.

jselbie commented 7 years ago

What do you mean by "it's blocking the STUN server momentarily"? If you can elaborate more on how the ACS server works, I might be able to assist.

spencerlambert commented 7 years ago

This document explains how the TR-069 ACS Server utilizes STUN.

https://www.broadband-forum.org/technical/download/TR-111.pdf

Page 25 shows how STUN and the ACS use the same address and port to talk to a device behind NAT.

Figures 1-6 show what the STUN server does.

Figure 7 is the ACS making a Connection Request to the device.

I'm using Genieacs, which runs on Node JS. It's using the dgram Node module for the UDP socket. The reuseAddr flag is also being set.

https://nodejs.org/api/dgram.html#dgram_dgram_createsocket_options_callback

client = dgram.createSocket({type:'udp4', reuseAddr:true}).bind({port:3478});

I don't get any errors when sending messages using the dgram client in Node, and the device does get the UDP Connection Request. If I leave the client open, the STUN server stops getting bind requests from the client device behind NAT. If I close the Node dgram client, the STUN server starts getting messages after about 3-4 seconds.

It seems that both processes cannot use the same port at the same time, something I thought was possible using the SO_REUSEADDR. I did test the SO_REUSEPORT option also, but Node JS seems to not use this option, as it cannot connect to port 3478 while the STUN server is running, even with the reuseAddr set to true.

I'm thinking of two alternative methods. 1) Use an IPC call, getting the STUN server to transmit the message on behalf of Node. 2) Build a STUN server in Node, so it can all run within the same process.

jselbie commented 7 years ago

I just downloaded GenieACS from https://github.com/zaidka/genieacs. But I don't see any reference to STUN, port 3478, etc... Where in the code does it create the socket?

spencerlambert commented 7 years ago

I've got a pull request that is adding the bind to port 3478. It's happening here: https://github.com/spencerlambert/genieacs/blob/master/lib/api-functions.coffee#L47-L62

spencerlambert commented 7 years ago

I did some looking into how a commercial ACS has things set up between the ACS and the STUN server. It looks like the STUN server sets up a relay port, allowing the ACS to tell the STUN server to send a message to the NAT'ed device.

https://github.com/zaidka/genieacs/pull/217#issuecomment-289509011

jselbie commented 7 years ago

In your genieacs code, you aren't reliably closing the socket after you send. Garbage collection should clean the socket up eventually. But wouldn't it be better to create, bind, send, then close?

jselbie commented 7 years ago

By chance are you using "--full" mode or specifying "--primaryinterface" in the command line args to the stun server?

It's possible that node.js is always doing a select/recvfrom call on the socket - thus consuming incoming packets.

Two ideas.

1. There are several open source STUN servers written for Node. You could easily slam that code into your GenieACS script and then hack some way to share the socket. https://github.com/search?utf8=%E2%9C%93&q=node+stun&type=
2. Rather than having GenieACS send directly to the client device, send a STUN indication message from GenieACS to the STUN server. The STUN indication message would carry a custom attribute holding the device's IP address and port as well as the payload you want the device to receive. Modify the stun server code to receive the STUN indication message and read this attribute. The server then sends the payload directly to the device from port 3478. You'd probably just have to modify the code in stuncore/messagehandler.cpp to treat the normal binding request differently from the indication message. And you have to find a way to make it secure so that the stun server can't be used as a UDP relay attack surface (e.g. only packets from localhost or a fixed set of IPs are allowed to send indication messages). But I think this is pretty straightforward.
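To make the second idea a little more concrete, here is a sketch of how such an indication message could be serialized, along with a trivial sender allow-list check. Everything specific is an assumption: the attribute type `0xFF01` is a placeholder rather than a registered value, the value layout (4-byte IPv4 address, 2-byte port, then the payload) is an invented schema, and `BuildRelayIndication` and `IsAllowedSender` are hypothetical names that do not exist in stunserver. The header fields follow RFC 5389 (Binding indication type 0x0011, magic cookie, 12-byte transaction ID, attributes padded to 4 bytes).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

constexpr uint16_t kBindingIndication = 0x0011;  // RFC 5389 Binding indication
constexpr uint32_t kMagicCookie = 0x2112A442;    // RFC 5389 magic cookie
constexpr uint16_t kRelayAttrType = 0xFF01;      // HYPOTHETICAL, not a registered attribute

// Append a 16-bit value in network byte order.
static void Put16(std::vector<uint8_t>& out, uint16_t v) {
    out.push_back(static_cast<uint8_t>(v >> 8));
    out.push_back(static_cast<uint8_t>(v & 0xFF));
}

// Builds a STUN indication carrying one custom attribute whose value is
// [IPv4 address (4)] [port (2)] [payload]. Returns an empty vector if the
// payload is too large for the 16-bit length fields.
std::vector<uint8_t> BuildRelayIndication(const std::array<uint8_t, 4>& deviceIp,
                                          uint16_t devicePort,
                                          const std::array<uint8_t, 12>& transactionId,
                                          const std::string& payload) {
    const size_t valueLen = 4 + 2 + payload.size();
    const size_t paddedLen = (valueLen + 3) & ~static_cast<size_t>(3);
    if (4 + paddedLen > 0xFFFF) {
        return {};
    }

    std::vector<uint8_t> msg;
    Put16(msg, kBindingIndication);
    Put16(msg, static_cast<uint16_t>(4 + paddedLen));  // attribute bytes after the header
    Put16(msg, static_cast<uint16_t>(kMagicCookie >> 16));
    Put16(msg, static_cast<uint16_t>(kMagicCookie & 0xFFFF));
    msg.insert(msg.end(), transactionId.begin(), transactionId.end());

    Put16(msg, kRelayAttrType);
    Put16(msg, static_cast<uint16_t>(valueLen));  // length excludes padding
    msg.insert(msg.end(), deviceIp.begin(), deviceIp.end());
    Put16(msg, devicePort);
    msg.insert(msg.end(), payload.begin(), payload.end());
    msg.resize(20 + 4 + paddedLen, 0);  // zero-pad the value to a 4-byte boundary
    return msg;
}

// Minimal allow-list check the server side could apply before relaying.
bool IsAllowedSender(const std::string& senderIp,
                     const std::vector<std::string>& allowList) {
    for (const std::string& ip : allowList) {
        if (ip == senderIp) {
            return true;
        }
    }
    return false;
}
```

An exact-match allow list is the simplest possible gate. A real deployment would likely want something stronger, but that is outside what this thread specifies.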

spencerlambert commented 7 years ago

> In your genieacs code, you aren't reliably closing the socket after you send. Garbage collection should clean the socket up eventually. But wouldn't it be better to create, bind, send, then close?

It gets closed here: https://github.com/spencerlambert/genieacs/blob/master/lib/api-functions.coffee#L59

jselbie commented 7 years ago

Oh I see. I thought it was javascript, but it's really coffeescript. I guess coffeescript uses indentation instead of curly braces to start a new block. That's what was throwing me off.

What do you think of idea #1 or #2 above?

spencerlambert commented 7 years ago

Thanks for pointing out the STUN indication message and custom attributes. Right now, option 2 is the most attractive, since it may scale best: the ACS and STUN server can be physically separate.

I'm not using --full or --primaryinterface. It's likely that I will in production, but now I'm using the easy config.

jselbie commented 7 years ago

Cool. I double checked. You just need to modify the code in stunreader.cpp/h to parse your new attribute type (you decide the schema) and then modify messagehandler.cpp to forward the data to the client device IP and port. Don't forget about security.

spencerlambert commented 7 years ago

Thanks! Is this a modification that is useful in the main branch or should I stick with a fork, when I get it working?

jselbie commented 7 years ago

I'll take the pull request if it's done right. I didn't know too much about TR-069 and ACS until I heard about what you were trying to do. I read up on the problem space and the "annex g" thing. I think the original designers of the spec messed some things up, but it is the standard.

If you get the needed changes into both Genie-ACS and STUNTMAN, it will be win-win. Feel free to start a mail thread with me and Zaid if you want to talk about broader design so we can all make sure it's the right thing end to end. (My email is jselbie at g mail dot com). I'm still wondering why the so_reuseaddr flag didn't work. I'd like to figure out why that didn't work when I get some free time later this week - as that might save you some time. Otherwise, if node.js doesn't play nice with socket sharing, I like the forwarding approach.

spencerlambert commented 7 years ago

I just finished a packet capture on my TR-069 device. It looks like the port sharing works. What first made me think the port sharing wasn't working was that, when requests were sent close together, my TR-069 device would only respond to the first UDP Connection Request, requiring a 5-10 second wait between successful UDP Connection Requests.

This packet capture shows that the NAT'ed device is getting all the UDP Connection Requests while doing STUN binding.

https://github.com/zaidka/genieacs/pull/217#issuecomment-289943495

Right now, it looks like SO_REUSEADDR works. Once I solve the problem that causes rapid-fire UDP Connection Requests to be ignored, I'll let you know and we can work on completing this pull request. Thanks for your help and input. It's been very good.

spencerlambert commented 7 years ago

I was able to reproduce the ignored UDP Connection Requests using a commercial ACS that sends the requests via a relay feature of the STUN server. The issue I thought was related to a port conflict between Node JS and STUN was simply a behavior of the TR-069 device.

The commercial ACS takes the approach of sending a UDP Connection Request every 30-60 seconds while there are pending tasks. This seems to be a smart approach, as it keeps tasks moving along regardless of why UDP Connection Requests are not taking effect. My goal is to get something similar into GenieACS.

I'd like to proceed with this pull request as is. What are your thoughts?

jselbie commented 7 years ago

Glad you got it figured out. Let me do some sanity testing on the pull request and I'll merge it in when done. Might be a few days, but I don't anticipate any problems. Thanks!

jselbie commented 5 years ago

@spencerlambert - hey there. Two things.

I've got a new branch, development, which is intended to be the working branch for some "version 1.3" work I'm doing. One of the features I'm adding is multi-threading where the server can have N threads, each with a dedicated socket listening on the same port as the others. As a side effect of this change, I've touched the --reuseaddr code path to create sockets with SO_REUSEPORT instead of SO_REUSEADDR. I think it should still work with your port-sharing scenario. Could I convince you to validate this change with your ACS?

Also, someone on the StunProtocol.org mailing list (google group) was asking about TR69, ACS, and STUN co-existence. I wanted to introduce him to you to see if you could help resolve his issue, but I didn't have your email address. Could you jump in on this discussion (or email jselbie at gmail and we'll do a private thread)?

Thanks and happy thanksgiving.
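For context on the SO_REUSEPORT change mentioned above, here is a minimal sketch of opening several UDP sockets on one port, as a multi-threaded server with one socket per thread might do. `OpenWorkerSockets` and `LocalPort` are hypothetical names, not code from the development branch, and the sketch assumes a platform that defines SO_REUSEPORT (for example Linux 3.9 or later). Every socket sharing the port must set the option before binding.

```cpp
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>
#include <vector>

// Returns the local port a bound IPv4 socket is using, or 0 on error.
uint16_t LocalPort(int fd) {
    sockaddr_in addr{};
    socklen_t len = sizeof(addr);
    if (getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0) {
        return 0;
    }
    return ntohs(addr.sin_port);
}

// Hypothetical helper: opens `count` UDP sockets that all share one port via
// SO_REUSEPORT. If `port` is 0 the first socket picks an ephemeral port and
// the rest reuse it. Returns an empty vector on any failure.
std::vector<int> OpenWorkerSockets(uint16_t port, int count) {
    std::vector<int> fds;
    for (int i = 0; i < count; ++i) {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        int on = 1;
        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(port);

        bool ok = fd >= 0 &&
                  setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &on, sizeof(on)) == 0 &&
                  bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0;
        if (ok && port == 0) {
            port = LocalPort(fd);  // later sockets bind to the same port
            ok = port != 0;
        }
        if (!ok) {
            if (fd >= 0) {
                close(fd);
            }
            for (int open : fds) {
                close(open);
            }
            return {};
        }
        fds.push_back(fd);
    }
    return fds;
}
```

An outside process that wants to share the port this way would also need to set SO_REUSEPORT on its own socket, which is why validating the change against the ACS matters.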

spencerlambert commented 5 years ago

I'd be happy to test this and post the results.

jselbie / stunserver

Allow other processes to bind to the STUN UDP port. #20