rethinkdb / rethinkdb

The open-source database for the realtime web.
https://rethinkdb.com

Listen on Unix Domain Sockets #1884

Open AtnNn opened 10 years ago

AtnNn commented 10 years ago

I propose allowing a file path to be specified as the argument to the --driver-port and --cluster-port options:

rethinkdb --cluster-port file:/tmp/rethinkdb.sock

The server could also listen by default on rethinkdb_data/driver.sock and rethinkdb_data/cluster.sock.

The Python driver and rethinkdb admin would also know how to connect to unix domain sockets:

rethinkdb admin --join file:rethinkdb_data/cluster.sock
r.connect(file="rethinkdb_data/driver.sock")

And when given a folder, the admin tool and the driver would look for cluster.sock or driver.sock respectively:

rethinkdb admin --join file:rethinkdb_data

This could make some administrative tasks easier and make a lot of tests easier to write.
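
For illustration, here is a minimal sketch (not something the current Python driver supports; the proposed file= keyword above does not exist yet) of what the driver-side connection would boil down to, using Python's standard socket module:

```python
import socket

# Hypothetical illustration of the proposed behaviour: connect to the
# server over a Unix domain socket path instead of a host:port pair.
def open_driver_socket(path="rethinkdb_data/driver.sock"):
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)  # a filesystem path, not an IP address and port
    return sock
```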

wojons commented 10 years ago

I think if this were supported, at least for the driver port, it could help a lot with the performance of creating new connections. It would be cool if this happened before the LTS.

larkost commented 9 years ago

At this point we are well taken care of for testing, so it would not have much impact there. However, it could be a meaningful speedup for local worker connections.

Pinging @danielmewes to see if he either wants to kill it or prioritize it.

danielmewes commented 9 years ago

Actually, neither of those. I'd like to keep it in the backlog as a performance optimization we probably want to look into at some point.

wojons commented 9 years ago

@danielmewes this is something that should be pretty low impact unless the code that opens the network connection is custom built. Most third-party libraries make Unix sockets pretty easy; the server just needs to decide at startup whether to open a TCP connection or a Unix socket.

From what I remember, to use Unix sockets instead of IPv4 you use AF_UNIX instead of AF_INET, and instead of an IP address you pass the path of the socket file you want created.
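
For example, a minimal sketch in Python of that difference (the address, port, and path are illustrative, not RethinkDB's actual listener code):

```python
import os
import socket

# TCP listener: address family AF_INET, bound to an address/port pair.
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp.bind(("127.0.0.1", 28015))
tcp.listen(128)

# Unix domain socket listener: same API, but AF_UNIX and a filesystem
# path. bind() creates the socket file, so stale files must be removed.
path = "/tmp/rethinkdb.sock"
if os.path.exists(path):
    os.unlink(path)
uds = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
uds.bind(path)
uds.listen(128)
```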

leoluk commented 9 years ago

It's a security feature, too - it prevents applications running as another user from accessing the database.
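
For example, a minimal sketch in Python of that access-control point (the path and mode are illustrative, not anything RethinkDB does today): a Unix socket is a filesystem object, so ordinary file permissions decide which local users may connect to it.

```python
import os
import socket

path = "/tmp/rethinkdb.sock"
if os.path.exists(path):
    os.unlink(path)

old_umask = os.umask(0o077)  # newly created socket file is owner-only
try:
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)           # the socket file inherits the restrictive mode
    srv.listen(16)
finally:
    os.umask(old_umask)      # restore the previous umask
```

With owner-only permissions on the socket file, connection attempts from other local users fail with a permission error before they ever reach the database.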

danielmewes commented 9 years ago

That's a great point @leoluk. Is this something that's actually relevant in your environment?

joaojeronimo commented 9 years ago

:+1: I think this is cool for people who have proxies like stunnel in front of RethinkDB.

jab commented 8 years ago

Just started experimenting with RethinkDB (thanks for the great work!) and was surprised it doesn't have this. Seemed to come standard in other data stores I've used (Redis, MongoDB, MySQL, etc.). Would the patch for this be new-contributor friendly? And (since I see this is marked "backlog") would it get reviewed/merged any time soon? Thanks again for the great work on RethinkDB.

danielmewes commented 8 years ago

@jab If we got a pull request for this, I think we would merge it fairly quickly (for either RethinkDB 2.2 or 2.3). It would be useful for https://github.com/rethinkdb/rethinkdb/issues/4785 as well, which we want to ship with RethinkDB 2.3.

danielmewes commented 8 years ago

@jab Also: if you're already familiar with C++ and sockets, I think this change shouldn't be too hard. :-)

cristicbz commented 7 years ago

A use case we ran into is running rethinkdb proxy as a Kubernetes DaemonSet. The only convenient way for a container to talk to the daemon running on the same machine is via a Unix socket mounted as a Docker volume.

In general though, talking to RethinkDB proxies via Unix sockets seems like a good fit.

martinvahi commented 7 years ago

Well, the person who started this thread pointed me here from rethinkdb.slack.com.

Basically, I was asking for the same feature, and in the background description of my feature request I described two reasons, or maybe use cases, for wanting it:

A slimy salesperson argument for the current feature request might be that it allows RethinkDB to be used in new markets that require an efficient way to write multi-programming-language applications, and you do want to be the first one there, especially given that you already have a practically ready product available. It also enables "enterprise software development" (cling-cling, dollar signs for eyes) by allowing legacy applications, written in legacy technology like Delphi, the Microsoft Foundation Classes, and various old C/C++ apps, to be GRADUALLY INTEGRATED (read: easy sales, "agile", results can be shown quickly) with modern technology, without requiring a fast, all-at-once rewrite of the application.

Slightly edited copy-pasted text of my original post from the forum:

Given that specialists from different disciplines tend to use different programming languages, the best-in-class libraries are scattered across different languages. Therefore, to use only the best components available, one has to use multiple programming languages in a single application. Sometimes the multi-language requirement comes from the speed-versus-developer-comfort trade-off: in embedded systems the example is inline assembler in C source, and in applications programming it is the C/C++ modules of dynamic languages like Ruby, PHP, and Python. In web applications the multi-language requirement, especially on the server side, has more social reasons than technical ones: the whole Node.js ecosystem exists because of the small mishap at Netscape that created JavaScript, and by the nature of the application, the web browser, JavaScript had to be supported by competing vendors, even the then all-mighty Microsoft with its Internet Explorer. That made JavaScript the only programming language that ACTUALLY RUNS in all web browsers, and reducing that duplication of work seems to be the only excuse for Node.js to exist. But non-business software, especially on the more technical side, including scientific software, has specialist-related reasons for its diversity of programming languages. Sometimes the reason might even be domain-specific languages, or the fact that it is more productive for mathematicians to ignore the existence of the time axis, which is what time-axis-free languages like the functional languages, especially Haskell, allow. (The lack of side effects is just a technical peculiarity.)

So, in an effort to shorten my comment here: I need a message bus that can be used from multiple programming languages, is asynchronous, and is able to store messages until the endpoints reboot, re-establish connections, etc. RethinkDB, with its observer-design-pattern-based query model, seems PERFECT for that. I want my application's instance of the RethinkDB process to be available only to the other operating system processes that are part of that application instance. I also want to make sure that no other operating system user can access the RethinkDB instance. The RethinkDB instance should be startable/stoppable like any other operating system process, and a single operating system user should be able to run multiple instances of RethinkDB without any global configuration and without editing any settings anywhere that say how many RethinkDB instances can run in parallel.

So, my question is: could someone please tell me how to make sure that RethinkDB has all of its network ports closed and communicates only through Linux named pipes (read: temporary file-like things that are owned/created-and-deleted by a non-root user)?

Thank You.

andrew-azarov commented 3 years ago

> That's a great point @leoluk. Is this something that's actually relevant in your environment?

That's relevant in any environment if you want to keep the system secure. Besides, it saves socket memory and connection buffers, thus saving resources.