Closed mwtzzz-zz closed 7 years ago
I've never seen this behaviour before. These are incoming connections? (e.g. you're running on port 2001?) Nothing off the top of my head would explain why the relay wouldn't read data, it closes when it finds EOF, unless it times out reading or something, then it will disconnect. What kind of clients are these?
These are incoming connections (relay listens on port 2001). The clients are various Linux hosts in EC2 running our applications. The clients connect to the relay via an ELB. The clients run a mix of collectd and a custom graphite client that basically netcats metrics to the ELB.
I downloaded the source and compiled it using make, I didn't give it any special options. For the first 30 minutes or so, the throughput is about 1/8 of the relays running 1.11, then it drops to zero.
I might try playing around with different 2.x versions and see if it has the same behavior. Other than that, I'm not sure what could be going on.
update: I compiled v2.2 and ran it without the -U option. Still seeing the CLOSE_WAIT problem.
update #2: I compiled v2.1 and it runs fine, no problems.
So something changed between 2.1 and 2.2 that causes this issue on our systems.
@mwtzzz : sorry for the intrusion, but you can use git bisect to easily find the exact commit that is causing the issue.
@deniszh I'm using git bisect, but immediately running into a bison error when running "make":
[mmartinez@ec2-radar112 ~]$ git clone https://github.com/grobian/carbon-c-relay.git carbon-c-relay
Cloning into 'carbon-c-relay'...
remote: Counting objects: 4257, done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 4257 (delta 1), reused 1 (delta 0), pack-reused 4253
Receiving objects: 100% (4257/4257), 1.87 MiB | 0 bytes/s, done.
Resolving deltas: 100% (2913/2913), done.
Checking connectivity... done.
[mmartinez@ec2-radar112 ~]$ cd carbon-c-relay
[mmartinez@ec2-radar112 carbon-c-relay]$ git bisect start
[mmartinez@ec2-radar112 carbon-c-relay]$ git bisect bad
[mmartinez@ec2-radar112 carbon-c-relay]$ git bisect good v2.1
Bisecting: 160 revisions left to test after this (roughly 7 steps)
[bbbd6ed920f2b435fafa48e6595f5939e60dddc8] conffile: implemented include
[mmartinez@ec2-radar112 carbon-c-relay]$ make
cc -O2 -Wall -Wshadow -DGIT_VERSION=\"bbbd6e\" -pthread -c -o relay.o relay.c
cc -O2 -Wall -Wshadow -DGIT_VERSION=\"bbbd6e\" -pthread -c -o md5.o md5.c
cc -O2 -Wall -Wshadow -DGIT_VERSION=\"bbbd6e\" -pthread -c -o consistent-hash.o consistent-hash.c
cc -O2 -Wall -Wshadow -DGIT_VERSION=\"bbbd6e\" -pthread -c -o receptor.o receptor.c
cc -O2 -Wall -Wshadow -DGIT_VERSION=\"bbbd6e\" -pthread -c -o dispatcher.o dispatcher.c
bison -d conffile.y
conffile.y:35.20-30: error: syntax error, unexpected {...}
make: *** [conffile.tab.c] Error 1
[mmartinez@ec2-radar112 carbon-c-relay]$
I believe you need bison version 3.
You can touch the produced files; I checked them into the repo for this reason.
touch conffile.yy.c conffile.tab.c conffile.tab.h
touch configure.ac Makefile.am aclocal.m4 configure Makefile.in config.h.in
this should work (I do this for the travis runs)
Thanks for trying to find the culprit!
@grobian thanks for the git bisect help.
Ok, here's what I've narrowed it down to:
[mmartinez@ec2-radar112 carbon-c-relay]$ git bisect bad
Bisecting: 0 revisions left to test after this (roughly 1 step)
[e12b412e263905c552826c4aa3855c92be7a6be7] aggregator_expire: run entire invocation loop under lock
[mmartinez@ec2-radar112 carbon-c-relay]$ git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[c3a4341837a90d774147115ff0116a13d614bdfb] dispatcher: move struct init before thread forking
[mmartinez@ec2-radar112 carbon-c-relay]$ git bisect good
e12b412e263905c552826c4aa3855c92be7a6be7 is the first bad commit
commit e12b412e263905c552826c4aa3855c92be7a6be7
Author: Fabian Groffen <grobian@gentoo.org>
Date: Sun Sep 11 10:29:22 2016 +0200
aggregator_expire: run entire invocation loop under lock
Access to the invocation buckets happens concurrently, so we need to
lock down the entire loop to make it safe. A better strategy for
aggregations is necessary.
:100644 100644 ede3a59c0f35769546f65c632272e1531133987f 60b890d6516591d4a28b7a4cea175a702e9c317a M aggregator.c
So that aggregator_expire commit is the first one to produce the unread sockets? Are you using aggregations in your configs?
Yes to both your questions. The aggregator_expire commit is the first one to produce unread sockets, and yep we are using aggregations.
Ok, I think I might know what direction to search for. It may be solved by PR #274.
@grobian That's good to hear. Let me know when you're ready for me to test.
I'd be interested to know if applying the patch from PR #274 solves the CLOSE_WAIT sockets problem.
Testing now ...
I put the new aggregator.c into v3.1, compiled it and ran it. It looks much better. Still see a single lingering CLOSE_WAIT with a single unread byte:
tcp CLOSE-WAIT 1 0 172.17.25.160:48207 172.17.24.203:2001
But it's not interfering with anything and I can still gracefully stop and start service. So far, it looks like your changes have fixed the issue. I'll let it run today and keep an eye on it.
ok, that's good to hear
Ran it all night, it's working fine. I'm going to roll it out to production. Thanks for working on this issue!
I rolled it out to all our production clusters yesterday and it's working great. To give you an idea of our throughput, we're writing about 15 million metrics/minute. We've got a cluster of 10 i3.xlarge instances running only the relay; each of these hosts is doing a network throughput of just over 1GB. We've got a backend cluster of 12 i3.2xlarge instances running the relay + carbon-cache. I've got various network stack stuff tuned on the relay layer, and I run the relay with -B 4096 -U 16777216.
@grobian Out of curiosity, do you know of other companies that are writing a similar (or more) quantity of metrics as us?
At Booking.com they say they push 1 million metrics/second (thus 60 million/minute). In a later slide they even mention 2 million; the 8 million figure is because there is DR (x2) and replication=2 (x2 = x4).
Another datapoint: we (sortable.com; one of my coworkers is the person who put together #274) are doing 2-3 million metrics/minute to 1 carbon-c-relay on a c4.2xlarge, which then forwards to 2 i3.xlarges running go-carbon. We aggregate heavily, though, so only ~400k metrics/minute leave the carbon-c-relay instance.
@cldellow Our number (15 million/minute) is spread across 10 carbon-c-relays and 12 backends, so per relay your number would be about twice ours. What kind of tuning, if any, have you done on your c4.2xlarge? My bottleneck right now is not the relay layer but the backend.
No tuning that I can recall. It sounds like your backends are much busier than ours, so I don't think we'd have anything useful to say there unfortunately :(
We do more than 30 million/minute through our top carbon-c-relays (6 c4.xlarge instances), and they send hashed and replicated (factor 2) traffic to 5 i2.4xlarge instances running go-carbon. The top relays thus produce more than 60 million metrics/minute toward the carbon backends.
Inside go-carbon instances:
We do only some medium aggregation; matched data is sent from the top carbon-c-relays to one c4.xlarge aggregator (with a second as failover). From these aggregators the data then goes to the 5-node go-carbon cluster.
@szibis What kind of tuning have you done on the 6 carbon-c-relay instances? I was running into a network bottleneck (dropped metrics, dropped packets) with 8 i3.xlarge; I had to add two instances, bringing it to a total of ten, to alleviate the bottleneck.
What's the tenancy attribute of your 6 carbon-c-relay instances?
@mwtzzz mostly high batch sizes. Each go-carbon instance takes about 20MB/s of traffic, which is below the AWS instance limits for any instance type I use.
@szibis What batch sizes are you using? Are you specifying it with -B? I'm currently running:
relay -q 400000 -B 4096 -U 16777216
/usr/bin/relay -p 2013 -w 32 -b 40000 -q 30000000 -B 32 -T 1000 -f /etc/carbon-c-relay/relay.conf
And go-carbon instances as data stores are highly tuned to be able to take all that traffic smoothly.
thanks, I'm going to try out those settings and see if they make a difference.
@szibis by the way, how are you getting 32 cores on a c4.xlarge? This instance type only has four cores:
[salt-master2 ~]$ salt-call grains.get instance_type; nproc --all
local:
c4.xlarge
4
I'm testing 3.1 on one of my relay hosts. The first thing I noticed is that the number of CLOSE_WAIT sockets with unread bytes in Recv-Q climbs steadily until about 3,000 where it stays - I assume it has hit some system-imposed limit at this point:
tcp 116 0 172.17.25.160:2001 172.17.29.171:26172 CLOSE_WAIT 13084/relay
tcp 1 0 172.17.25.160:2001 172.17.29.171:25234 CLOSE_WAIT 13084/relay
You'll see in the above example that there are 116 unread bytes in one such socket, and 1 unread byte in another. As a result, it is impossible to gracefully terminate the relay process; the only way to stop it is with kill -9.
I do not see this behavior on my hosts running carbon-c-relay-1.11. On those hosts, there are no lingering CLOSE_WAIT connections, everything in the receive queue gets processed.