Closed thanodnl closed 7 years ago
@Raynos we needed to swap out uber-hammock
for hammock
because uber-hammock
failed on a unit test. Since you added changes to uber-hammock
after it being forked you might have best knowledge if it is safe to move back to the original module.
Both integration tests and unit tests pass now on both node 0.10.x
and node v4.4.7
. If you think it is safe to swap uber-hammock
for hammock
we will run an isolated test with one of our services on this version to validate everything keeps working.
so the hammock
component is used for handleOrProxy, aka proxying.
I read the code of hammock
& uber-hammock
, looks like a safe change to me.
Thanks for looking into the code of hammock
and you are right about where it is used in ringpop
. We will run an experiment in a ringpop system that uses this code path to validate the correct workings and proceed with this PR.
@sashahilton00 We have ran tests on our internal systems and found two interesting things.
The good news is that the upgraded package of hammock
did work fine with forwarding requests.
The bad news is that farmhash
caused a breaking change. By default ringpop
uses the hash32
function of farmhash
which causes different checksums to be calculated in some cases (we see a correlation with the amount of ringpop nodes participating). The problem that is caused by checksum inconsistencies during upgrading is twofold.
At first a difference in checksums between two nodes will cause a fullsync of its membership. This has not a huge impact on stability of the system, however it will increase the load on the network during a rolling upgrade.
The second effect is that ringpop can be configured to enforce hashring consistency (measured via a farmhash checksum) during its forwarding path and return an error if the rings are not consistent. This is a backwards incompatible change which we are not ready to make at this moment.
We need to spend some time on designing and testing a workable non-breaking upgrade path for this before we can cut a new release that is compatible with node 4.4.7. Until we have a solution you can use your own branch or this branch in your npm file via a git url as your dependency.
/cc @mennopruijssers @Raynos
@thanodnl thanks for the update. I did suspect that it might not all be plain sailing. Fear not, I will use my own branch until an upgrade path for you guys has been devised; unfortunately I cannot offer much by way of assistance in implementing such an upgrade path, as I am currently using ringpop from scratch, so have no legacy systems that require the upgrade, but more to the point I am not proficient enough with the innerworkings of farmhash/ringpop to be able to commit changes that aren't realistically going to take a bunch of reworking from one of the actual ringpop engineers. Thus I will use my branch for now, and keep an eye on uber/ringpop-node for the eventual upgrade. If you remember, could you cc me once ringpop is compatible with 4.4.7. Sorry I cannot be of more assistance at this time, but until then, so long, and thanks for all the fish :)
@sashahilton00 if you are just starting make sure to set the isCrossPlatform
to true in options to switch farmhash.hash32
to farmhash.fingerprint32
(code). That will prevent the problems described here :). We will make sure to update the documentation with regards to this.
example:
var ringpop = new Ringpop({
isCrossPlatform: true
});
Will do, thanks.
@thanodnl
It sounds like the safest thing to do is look at https://github.com/lovell/farmhash/tree/v0.2.2
We need to:
Basically, if we rebase the farmhash master branch and remove that single commit we should be able to create a farmhash 0.2.x build that has node4 compat.
Hello, perhaps the farmhash
module should provide/expose both versions of the hash32
function so ringpop
can explicitly call the original v1 implementation and everyone else by default uses the current v1.1 implementation.
@lovell that would be pretty cool :)
I just created a branch / tag of farmhash using the older version ( https://github.com/Raynos/farmhash/tree/v0.2.2-node4-compat ) but still having the NaN & node4 support.
The latest farmhash
v1.2.0 published to npm
bundles both FarmHash implementations, v1.0.0 ("legacy") and v1.1.0 ("current").
A new hash32v1 function provdes explicit access to the v1 implementation for ringpop
. All other functions continue to use the current implementation and are ~5-10% faster compared with legacy, depending mostly on which SIMD intrinsics (e.g. SSE4.2, AVX) are available.
Enjoy!
Also, I guess tchannel will need to be updated, as it is using the hash32 function of farmhash@1.1.0 at the moment (https://github.com/uber/tchannel-node/blob/66b8d42f07986fbae2b76dc73778ba070223d417/v1/frame.js#L82), could this be the cause of the checksum inconsistencies you were seeing when testing this in production @thanodnl?
@sashahilton00 the v1 protocol is unused at the moment.
Connection statically imports v2 protocol ( https://github.com/uber/tchannel-node/blob/master/connection.js#L33 ). We really should delete the v1 folder.
@Raynos ah right, that makes things easier then. PR #295 is just a fix to use the hash32v1 function to solve the problem @thanodnl mentioned with hash inconsistencies. On a side note, I have been looking into using Ringpop with Docker, and one of the issues I have encountered is that Docker does not expose it's external port to the node apps, thus advertising on Hyperbahn/Ringpop is somewhat challenging. I assume you guys have found a way to grab the randomly assigned port so that you can dynamically spin up/down where necessary?
@sashahilton00 you can make ringpop listen on a static port. You will need to pass a [[https://github.com/uber/tchannel-node/blob/v3.6.9/channel.js#L610 | tchannel instance]] with a predefined port [[https://github.com/uber/ringpop-node/blob/v10.15.0/index.js#L102 | to ringpop]].
Does that clarify it?
@Motiejus yeah, that's what I have been doing at the moment, though it does mean that spinning up/down instances cannot be done in a completely automated manner (when running multiple instances on one machine) as one would have to change the static port in each instance of the node app and then pass the -p flag when starting docker. Starting a TChannel on a static port isn't a problem as docker always maps the random external port to the port the app is listening on, the problem is that the ringpop hostPort will change on every restart due to docker choosing a different external port each time, thus ringpop surely needs to know what the external port is. Thanks.
If I understand correctly what you are trying to achieve, you can have the same static port for all the containers, but the containers themselves can expose a variable port. Let's take an example with nginx:
docker@default:~$ docker run -ti --rm -p 80 nginx:latest
docker@default:~$ docker run -ti --rm -p 80 nginx:latest
See the two containers:
docker@default:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5f8c245fe0ab nginx:latest "nginx -g 'daemon off" About a minute ago Up About a minute 443/tcp, 0.0.0.0:32773->80/tcp fervent_booth
676d4a38bd25 nginx:latest "nginx -g 'daemon off" About a minute ago Up About a minute 443/tcp, 0.0.0.0:32772->80/tcp boring_bassi
Try to ping them:
docker@default:~$ curl -s localhost:32772 | head -1
<!DOCTYPE html>
docker@default:~$ curl -s localhost:32773 | head -1
<!DOCTYPE html>
As we can see, we can start as many instances of ringpop on a local node as we want, all listening to the static ports inside the container. Docker will do the right thing in creating ephemeral ports and mapping them correctly inside.
Does that make sense?
@Motiejus Yes that makes complete sense, so how does an external service running on server A call a service running in docker on server B over Ringpop/Hyperbahn given that server A has advertised the service on the internal static port, yet docker has mapped the internal static port to an unknown (to the node.js app) ephemeral port which has not been advertised on Ringpop/Hyperbahn? AKA. How can, in the example above, nginx know that the ephemeral port that it is accessible over is 32773 and thus advertise this on the Ringpop/Hyperbahn cluster?
Oh, I see your point now. Sorry, I don't know. For that, I would encourage to have a chat with tchannel/hyperbahn folks, as they have more knowledge on how RPC works (opening an issue in https://github.com/uber/hyperbahn would be a good start).
Let's try to keep this thread about node v4 support.
Yes let's, it was only a side note :) Will open an issue on the ringpop repo.
@sashahilton00
The way we run docker in production is we share the network with the physical host. AKA if you listen on ip:port then you listen on ip:port.
ringpop also supports non-static ports, you can call listen(0)
and rely on the OS to assign a port then use channel.address()
to get the port. The way you would bootstrap a "dynamic" ringpop would be to use the "hyperbahn" bootstrap feature.
The "hyperbahn" bootstrap feature is actually implemented in some of our internal apps, I copied an example into a gist ( https://gist.github.com/Raynos/6c4552d00d1a3da30634377ae02dde07 ).
Basically:
Thanks @Raynos. Also, on a side note, at some point ringpop broke the example code in the docs, as running it with ringpop v0.10.8 works as expected, but ringpop v10.15.0 throws an AssertionError: TChannel must listen on top channel
due to the line ringpop.channel.listen(ports[index], host, function onListen() {
. The fix seems to be to amend it to tchannel.listen(port, host, function onListen() {
, I assume that this should be fine? Tested in both node v0.10.46 and v4.4.7 using PR #295, though @thanodnl or someone else will have to rerun some tests on that to make sure the hashring inconsistencies are fixed on your side.
Also, would someone ind enlightening me as to how one accesses the POST data from a request that has been handleOrProxy()
'd by ringpop? I can see it in the _writeableState of the req
paramenter, and was wondering if there is an easy way to access it aside from res._writeableState.etc? Thanks.
Ah, just re-read what Nils wrote and saw that farmhash was not the only issue that arises from moving to v4 and is not the solution to the hashring problem. Disregard previous comments referencing PR #295 as the solution, as all it does is solve the checksum problem, which Nils noted was not much of a problem. Will leave this one to you guys and just work from my branch as I have no need of the non-breaking upgrade path Nils mentions. Thanks for the assistance thus far anyway.
Dealt with by #304. Should be closed. /cc @thanodnl
followup on #289