icecc / icecream

Distributed compiler with a central scheduler to share build load
GNU General Public License v2.0
1.6k stars 252 forks source link

Restrict usable network interfaces - specify by address vs. by interface name #489

Closed mp-chet closed 5 years ago

mp-chet commented 5 years ago

I'd like to start a follow-up discussion to #427, which has been solved through the introduction of a new command-line option --listen <address>.

Funny story: at the same time you were pushing 5e275a5 into master, I was working on an alternative approach. 😃 The difference between the upstream solution and my attempt is that the former lets the user specify an IP address whereas the latter is based on the network-interface name. I made that decision because I consider the name of an interface to be more stable and persistent than the IP address associated with that interface.

Here's the current state of my work (which is still in progress at the moment): mp-chet/icecream@0cd62bc

I think both approaches have their advantages and disadvantages and would be able to complement each other. One could incorporate both and make them mutually exclusive - so the user would be able to specify either --listen or --interface but not both.

Are you interested in such an enhancement?

llunak commented 5 years ago

I implemented --listen because after a quick search I didn't find out how to restrict bind() to an interface, so I just took the approach from distcc. If you submit your version, I'll probably just revert --listen, so I am interested.

A couple of comments on the linked changes:

mp-chet commented 5 years ago

Thanks for your feedback! I'll try to get it into shape then.

  • I think it's better to say e.g. "network interface" rather than "bind interface" in the user-visible strings, as bind is really an implementation detail.

Agreed.

  • Why do you create two new files just for one function, can you simply make that part of getifaddrs.* ? IMO it even logically belongs there.

I would prefer keeping these two things separate. The getifaddrs.* module provides an optional drop-in replacement of the getifaddrs() libc function (+ friends). If that functionality is present on the system - checked with HAVE_IFADDRS_H - the header defines only three macro aliases and the .cpp file is effectively empty after preprocessing. At a minimum, this is the case when building for Linux.

Given that, I think it's a good idea to keep having only optional functionality reside in that module, and to put my new utility function into a separate translation unit.

(As a side note, the code inside getifaddrs.cpp looks somewhat fishy to me. I haven't examined it in full detail, but the copying between struct sockaddr variables via assignment operator is something that will land on one's feet if IPv6 ever comes into play - a struct sockaddr_in6 is generally larger than a struct sockaddr, so information will be lost.)

llunak commented 5 years ago

I'm generally not a fun of having many way too small files. Granted some of the sources in services/ are tiny, but they are logically separated things. But this is basically two functions that both deal with network interfaces, yours even simply makes the data from the existing one easier to use, so they IMO go together regardless of whether one is optional. Would renaming the source file to something better like netiface.* help?

As for IPv6, when somebody adds supports for that, all the network handling will need to be reviewed for issues like that.

mp-chet commented 5 years ago

All right, I've updated my branch. It's still a work in progress that I intend to test properly before coming forward with a pull request.

llunak commented 5 years ago

Do you have an estimate for when your patch will be ready? I consider the codebase more or less ready for the 1.3 release and I'd like to include this.

mp-chet commented 5 years ago

Unfortunately I haven't gotten around to doing more thorough testing yet, but the code itself should be complete. I've opened #495 as a preliminary pull request, so if you can spare some time, feel free to give it a try!

mp-chet commented 5 years ago

495 is ready to be merged after review.

One thing that jumped at me during testing is that when one binds the socket to a specific interface, it will no longer accept connections from localhost. This may or may not be an issue.

Possible future extensions to solve this:

Both approaches would require opening multiple sockets and multiplexing them with select().

llunak commented 5 years ago

Client normally connects to the daemon using a unix socket, a network socket is used only if that unix socket is not available (e.g. when building in a chroot). So it's not a critical problem, although it'd be nice to handle that (feel free to, or I'll do it somewhen).

mp-chet commented 5 years ago

It's not an issue for client-to-daemon communication, but it may affect daemon-to-scheduler communication when daemon and scheduler run on the same host.

llunak commented 5 years ago

Scheduler should always refer to daemons using their hostnames and the non-localhost IPs, 127.0.0.1 is used only in messages to mean "build locally", so this shouldn't be a problem.

llunak commented 5 years ago

And #498 should take care of the loopback, so that should be all here. Thank you again for the patch.