uber-node / ringpop-node

Scalable, fault-tolerant application-layer sharding for Node.js applications
http://uber.github.io/ringpop/
MIT License
1.18k stars 146 forks source link

Make ringpop installable and run on node v4 #290

Closed thanodnl closed 7 years ago

thanodnl commented 8 years ago

followup on #289

thanodnl commented 8 years ago

@Raynos we needed to swap out uber-hammock for hammock because uber-hammock failed on a unit test. Since you added changes to uber-hammock after it being forked you might have best knowledge if it is safe to move back to the original module.

Both integration tests and unit tests pass now on both node 0.10.x and node v4.4.7. If you think it is safe to swap uber-hammock for hammock we will run an isolated test with one of our services on this version to validate everything keeps working.

Raynos commented 8 years ago

so the hammock component is used for handleOrProxy, aka proxying.

I read the code of hammock & uber-hammock, looks like a safe change to me.

thanodnl commented 8 years ago

Thanks for looking into the code of hammock and you are right about where it is used in ringpop. We will run an experiment in a ringpop system that uses this code path to validate the correct workings and proceed with this PR.

thanodnl commented 8 years ago

@sashahilton00 We have ran tests on our internal systems and found two interesting things.

The good news is that the upgraded package of hammock did work fine with forwarding requests.

The bad news is that farmhash caused a breaking change. By default ringpop uses the hash32 function of farmhash which causes different checksums to be calculated in some cases (we see a correlation with the amount of ringpop nodes participating). The problem that is caused by checksum inconsistencies during upgrading is twofold.

At first a difference in checksums between two nodes will cause a fullsync of its membership. This has not a huge impact on stability of the system, however it will increase the load on the network during a rolling upgrade.

The second effect is that ringpop can be configured to enforce hashring consistency (measured via a farmhash checksum) during its forwarding path and return an error if the rings are not consistent. This is a backwards incompatible change which we are not ready to make at this moment.

We need to spend some time on designing and testing a workable non-breaking upgrade path for this before we can cut a new release that is compatible with node 4.4.7. Until we have a solution you can use your own branch or this branch in your npm file via a git url as your dependency.

/cc @mennopruijssers @Raynos

sashahilton00 commented 8 years ago

@thanodnl thanks for the update. I did suspect that it might not all be plain sailing. Fear not, I will use my own branch until an upgrade path for you guys has been devised; unfortunately I cannot offer much by way of assistance in implementing such an upgrade path, as I am currently using ringpop from scratch, so have no legacy systems that require the upgrade, but more to the point I am not proficient enough with the innerworkings of farmhash/ringpop to be able to commit changes that aren't realistically going to take a bunch of reworking from one of the actual ringpop engineers. Thus I will use my branch for now, and keep an eye on uber/ringpop-node for the eventual upgrade. If you remember, could you cc me once ringpop is compatible with 4.4.7. Sorry I cannot be of more assistance at this time, but until then, so long, and thanks for all the fish :)

thanodnl commented 8 years ago

@sashahilton00 if you are just starting make sure to set the isCrossPlatform to true in options to switch farmhash.hash32 to farmhash.fingerprint32 (code). That will prevent the problems described here :). We will make sure to update the documentation with regards to this.

example:

var ringpop = new Ringpop({
  isCrossPlatform: true
});
sashahilton00 commented 8 years ago

Will do, thanks.

Raynos commented 8 years ago

@thanodnl

It sounds like the safest thing to do is look at https://github.com/lovell/farmhash/tree/v0.2.2

We need to:

Basically, if we rebase the farmhash master branch and remove that single commit we should be able to create a farmhash 0.2.x build that has node4 compat.

lovell commented 8 years ago

Hello, perhaps the farmhash module should provide/expose both versions of the hash32 function so ringpop can explicitly call the original v1 implementation and everyone else by default uses the current v1.1 implementation.

Raynos commented 8 years ago

@lovell that would be pretty cool :)

Raynos commented 8 years ago

I just created a branch / tag of farmhash using the older version ( https://github.com/Raynos/farmhash/tree/v0.2.2-node4-compat ) but still having the NaN & node4 support.

lovell commented 8 years ago

The latest farmhash v1.2.0 published to npm bundles both FarmHash implementations, v1.0.0 ("legacy") and v1.1.0 ("current").

A new hash32v1 function provdes explicit access to the v1 implementation for ringpop. All other functions continue to use the current implementation and are ~5-10% faster compared with legacy, depending mostly on which SIMD intrinsics (e.g. SSE4.2, AVX) are available.

Enjoy!

sashahilton00 commented 8 years ago

Also, I guess tchannel will need to be updated, as it is using the hash32 function of farmhash@1.1.0 at the moment (https://github.com/uber/tchannel-node/blob/66b8d42f07986fbae2b76dc73778ba070223d417/v1/frame.js#L82), could this be the cause of the checksum inconsistencies you were seeing when testing this in production @thanodnl?

Raynos commented 8 years ago

@sashahilton00 the v1 protocol is unused at the moment.

Connection statically imports v2 protocol ( https://github.com/uber/tchannel-node/blob/master/connection.js#L33 ). We really should delete the v1 folder.

sashahilton00 commented 8 years ago

@Raynos ah right, that makes things easier then. PR #295 is just a fix to use the hash32v1 function to solve the problem @thanodnl mentioned with hash inconsistencies. On a side note, I have been looking into using Ringpop with Docker, and one of the issues I have encountered is that Docker does not expose it's external port to the node apps, thus advertising on Hyperbahn/Ringpop is somewhat challenging. I assume you guys have found a way to grab the randomly assigned port so that you can dynamically spin up/down where necessary?

motiejus commented 8 years ago

@sashahilton00 you can make ringpop listen on a static port. You will need to pass a [[https://github.com/uber/tchannel-node/blob/v3.6.9/channel.js#L610 | tchannel instance]] with a predefined port [[https://github.com/uber/ringpop-node/blob/v10.15.0/index.js#L102 | to ringpop]].

Does that clarify it?

sashahilton00 commented 8 years ago

@Motiejus yeah, that's what I have been doing at the moment, though it does mean that spinning up/down instances cannot be done in a completely automated manner (when running multiple instances on one machine) as one would have to change the static port in each instance of the node app and then pass the -p flag when starting docker. Starting a TChannel on a static port isn't a problem as docker always maps the random external port to the port the app is listening on, the problem is that the ringpop hostPort will change on every restart due to docker choosing a different external port each time, thus ringpop surely needs to know what the external port is. Thanks.

motiejus commented 8 years ago

If I understand correctly what you are trying to achieve, you can have the same static port for all the containers, but the containers themselves can expose a variable port. Let's take an example with nginx:

docker@default:~$ docker run -ti --rm -p 80 nginx:latest
docker@default:~$ docker run -ti --rm -p 80 nginx:latest

See the two containers:

docker@default:~$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED              STATUS              PORTS                            NAMES
5f8c245fe0ab        nginx:latest        "nginx -g 'daemon off"   About a minute ago   Up About a minute   443/tcp, 0.0.0.0:32773->80/tcp   fervent_booth
676d4a38bd25        nginx:latest        "nginx -g 'daemon off"   About a minute ago   Up About a minute   443/tcp, 0.0.0.0:32772->80/tcp   boring_bassi

Try to ping them:

docker@default:~$ curl  -s localhost:32772 | head -1
<!DOCTYPE html>
docker@default:~$ curl  -s localhost:32773 | head -1
<!DOCTYPE html>

As we can see, we can start as many instances of ringpop on a local node as we want, all listening to the static ports inside the container. Docker will do the right thing in creating ephemeral ports and mapping them correctly inside.

Does that make sense?

sashahilton00 commented 8 years ago

@Motiejus Yes that makes complete sense, so how does an external service running on server A call a service running in docker on server B over Ringpop/Hyperbahn given that server A has advertised the service on the internal static port, yet docker has mapped the internal static port to an unknown (to the node.js app) ephemeral port which has not been advertised on Ringpop/Hyperbahn? AKA. How can, in the example above, nginx know that the ephemeral port that it is accessible over is 32773 and thus advertise this on the Ringpop/Hyperbahn cluster?

motiejus commented 8 years ago

Oh, I see your point now. Sorry, I don't know. For that, I would encourage to have a chat with tchannel/hyperbahn folks, as they have more knowledge on how RPC works (opening an issue in https://github.com/uber/hyperbahn would be a good start).

Let's try to keep this thread about node v4 support.

sashahilton00 commented 8 years ago

Yes let's, it was only a side note :) Will open an issue on the ringpop repo.

Raynos commented 8 years ago

@sashahilton00

The way we run docker in production is we share the network with the physical host. AKA if you listen on ip:port then you listen on ip:port.

ringpop also supports non-static ports, you can call listen(0) and rely on the OS to assign a port then use channel.address() to get the port. The way you would bootstrap a "dynamic" ringpop would be to use the "hyperbahn" bootstrap feature.

The "hyperbahn" bootstrap feature is actually implemented in some of our internal apps, I copied an example into a gist ( https://gist.github.com/Raynos/6c4552d00d1a3da30634377ae02dde07 ).

Basically:

sashahilton00 commented 8 years ago

Thanks @Raynos. Also, on a side note, at some point ringpop broke the example code in the docs, as running it with ringpop v0.10.8 works as expected, but ringpop v10.15.0 throws an AssertionError: TChannel must listen on top channel due to the line ringpop.channel.listen(ports[index], host, function onListen() {. The fix seems to be to amend it to tchannel.listen(port, host, function onListen() {, I assume that this should be fine? Tested in both node v0.10.46 and v4.4.7 using PR #295, though @thanodnl or someone else will have to rerun some tests on that to make sure the hashring inconsistencies are fixed on your side.

sashahilton00 commented 8 years ago

Also, would someone ind enlightening me as to how one accesses the POST data from a request that has been handleOrProxy()'d by ringpop? I can see it in the _writeableState of the req paramenter, and was wondering if there is an easy way to access it aside from res._writeableState.etc? Thanks.

sashahilton00 commented 8 years ago

Ah, just re-read what Nils wrote and saw that farmhash was not the only issue that arises from moving to v4 and is not the solution to the hashring problem. Disregard previous comments referencing PR #295 as the solution, as all it does is solve the checksum problem, which Nils noted was not much of a problem. Will leave this one to you guys and just work from my branch as I have no need of the non-breaking upgrade path Nils mentions. Thanks for the assistance thus far anyway.

sashahilton00 commented 8 years ago

Dealt with by #304. Should be closed. /cc @thanodnl