rax-maas / blueflood

A distributed system designed to ingest and process time series data
http://www.blueflood.io
Apache License 2.0

com.netflix.astyanax.connectionpool.exceptions.TimeoutException #781

Open 42701618 opened 7 years ago

42701618 commented 7 years ago

Hi team: when I run Blueflood, the following error sometimes occurs:

com.netflix.astyanax.connectionpool.exceptions.TimeoutException: TimeoutException: [host=192.168.1.1(192.168.1.1):9160, latency=10009(10009), attempts=1]org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
    at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:188)
    at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:61)
    at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:28)
    at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.execute(ThriftSyncConnectionFactoryImpl.java:151)
    at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:69)
    at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:256)
    at com.netflix.astyanax.thrift.ThriftKeyspaceImpl.executeOperation(ThriftKeyspaceImpl.java:478)
    at com.netflix.astyanax.thrift.ThriftKeyspaceImpl.access$000(ThriftKeyspaceImpl.java:73)
    at com.netflix.astyanax.thrift.ThriftKeyspaceImpl$1.execute(ThriftKeyspaceImpl.java:116)
    at com.rackspacecloud.blueflood.io.AstyanaxWriter.insertRollups(AstyanaxWriter.java:370)
    at com.rackspacecloud.blueflood.service.RollupBatchWriteRunnable.run(RollupBatchWriteRunnable.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
    at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:964)
    at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:950)
    at com.netflix.astyanax.thrift.ThriftKeyspaceImpl$1$1.internalExecute(ThriftKeyspaceImpl.java:122)
    at com.netflix.astyanax.thrift.ThriftKeyspaceImpl$1$1.internalExecute(ThriftKeyspaceImpl.java:119)
    at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:56)
    ... 12 more
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:170)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
    ... 25 more
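
For anyone collecting these errors from logs, the first line of the exception already carries the useful details: the Cassandra host, the Thrift port (9160), the observed latency, and the attempt count. Below is a minimal Python sketch that extracts them. The regular expression and the `parse_timeout` helper are hypothetical names written for this thread, not part of Astyanax or Blueflood, and the sketch assumes the latency field is reported in milliseconds.

```python
import re

# First line of the exception, copied from the report above.
MESSAGE = (
    "com.netflix.astyanax.connectionpool.exceptions.TimeoutException: "
    "TimeoutException: [host=192.168.1.1(192.168.1.1):9160, "
    "latency=10009(10009), attempts=1]"
)

# Hypothetical pattern for the bracketed summary; not an Astyanax API.
PATTERN = re.compile(
    r"\[host=(?P<host>[^()]+)\([^)]*\):(?P<port>\d+), "
    r"latency=(?P<latency>\d+)\(\d+\), attempts=(?P<attempts>\d+)\]"
)


def parse_timeout(message):
    """Return host, port, latency and attempts from a TimeoutException line,
    or None if the line does not contain the bracketed summary."""
    match = PATTERN.search(message)
    if match is None:
        return None
    return {
        "host": match.group("host"),
        "port": int(match.group("port")),
        # Assumption: Astyanax reports this value in milliseconds.
        "latency_ms": int(match.group("latency")),
        "attempts": int(match.group("attempts")),
    }


info = parse_timeout(MESSAGE)
print(info)
```

If the millisecond assumption holds, the reported latency of 10009 corresponds to a read that gave up after roughly 10 seconds on the first attempt.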

42701618 commented 7 years ago

When the error occurs, rollup performance decreases. Blueflood uses Astyanax 1.56.36, but the newest version of Netflix/astyanax is v3.9.0.

shintasmith commented 7 years ago

@42701618, this exception can have many causes. Generally, though, it indicates that your Cassandra cluster is not responding in time (either it is overloaded, or it is not properly tuned/configured).

What is the size of your Cassandra cluster, your replication factor, the version of Cassandra you are using, and your rollup configuration (MAX_ROLLUP_READ_THREADS, MAX_ROLLUP_WRITE_THREADS, MAX_LOCATOR_FETCH_THREADS)?

42701618 commented 7 years ago

The Cassandra version is 2.1.2. There are 3 nodes: one node is used for ingest and two nodes are used for rollup. The rollup settings are MAX_LOCATOR_FETCH_THREADS=32, MAX_ROLLUP_READ_THREADS=32, and MAX_ROLLUP_WRITE_THREADS=32.

42701618 commented 7 years ago

iostat -d -x -k 1 100

Device:  rrqm/s  wrqm/s    r/s    w/s    rkB/s    wkB/s avgrq-sz avgqu-sz  await svctm  %util
xvda       0.00    2.69   0.15   1.58     2.47    17.07    22.63     0.06  31.97  9.92   1.71
xvde       0.02 1722.30  76.99 196.24  2132.42  7674.28    71.78    12.44  45.53  1.77  48.26

Device:  rrqm/s  wrqm/s    r/s    w/s    rkB/s    wkB/s avgrq-sz avgqu-sz  await svctm  %util
xvda       0.00    0.00   0.00   0.00     0.00     0.00     0.00     0.00   0.00  0.00   0.00
xvde       0.00 1527.00 394.00 140.00 12120.00  5828.00    67.22   138.66 206.88  1.87 100.00

Device:  rrqm/s  wrqm/s    r/s    w/s    rkB/s    wkB/s avgrq-sz avgqu-sz  await svctm  %util
xvda       0.00    5.00   0.00   5.00     0.00    40.00    16.00     0.21  42.40 42.40  21.20
xvde       0.00 1121.00 368.00  88.00 11080.00  3724.00    64.93   138.46 168.39  2.19 100.00

Device:  rrqm/s  wrqm/s    r/s    w/s    rkB/s    wkB/s avgrq-sz avgqu-sz  await svctm  %util
xvda       0.00   17.00   0.00   8.00     0.00   100.00    25.00     0.24  30.50 14.50  11.60
xvde       0.00 2653.00 229.00 270.00  7068.00 11304.00    73.64   116.55 464.49  2.00  99.60

Device:  rrqm/s  wrqm/s    r/s    w/s    rkB/s    wkB/s avgrq-sz avgqu-sz  await svctm  %util
xvda       0.00    0.00   0.00   0.00     0.00     0.00     0.00     0.00   0.00  0.00   0.00
xvde       0.00 2583.00 190.00 280.00  5928.00 11372.00    73.62   103.14 223.32  2.13 100.00

Device:  rrqm/s  wrqm/s    r/s    w/s    rkB/s    wkB/s avgrq-sz avgqu-sz  await svctm  %util
xvda       0.00    0.00   0.00   0.00     0.00     0.00     0.00     0.00   0.00  0.00   0.00
xvde       0.00 1561.00 161.00 221.00  4704.00  8748.00    70.43   137.76 283.29  2.62 100.00

Device:  rrqm/s  wrqm/s    r/s    w/s    rkB/s    wkB/s avgrq-sz avgqu-sz  await svctm  %util
xvda       0.00    0.00   0.00   0.00     0.00     0.00     0.00     0.00   0.00  0.00   0.00
xvde       0.00 1165.00 405.00 140.00 12636.00  5516.00    66.61   138.91 257.00  1.83 100.00

Device:  rrqm/s  wrqm/s    r/s    w/s    rkB/s    wkB/s avgrq-sz avgqu-sz  await svctm  %util
xvda       0.00    0.00   0.00   0.00     0.00     0.00     0.00     0.00   0.00  0.00   0.00
xvde       0.00 1451.00 364.00 165.00 10928.00  6720.00    66.72   137.65 234.08  1.89 100.00

Device:  rrqm/s  wrqm/s    r/s    w/s    rkB/s    wkB/s avgrq-sz avgqu-sz  await svctm  %util
xvda       0.00    0.00   0.00   9.00     0.00    36.00     8.00     0.32  36.00  8.89   8.00
xvde       0.00 1254.00 407.00 122.00 12024.00  5164.00    64.98   141.09 284.13  1.89 100.00

Device:  rrqm/s  wrqm/s    r/s    w/s    rkB/s    wkB/s avgrq-sz avgqu-sz  await svctm  %util
xvda       0.00    0.00   0.00   0.00     0.00     0.00     0.00     0.00   0.00  0.00   0.00
xvde       0.00 1495.00 488.00 144.00 14532.00  5780.00    64.28   146.16 206.27  1.58 100.00
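
In the samples above, xvde sits at or near 100% utilization with await times between roughly 168 and 464 ms, while xvda is mostly idle. The first iostat report is an average since boot, which is why it looks calmer than the rest. As a rough way to scan longer captures for the same pattern, here is a minimal Python sketch that parses `iostat -d -x -k` rows and flags busy samples. The helper names and the thresholds (90% utilization, 100 ms await) are hypothetical values chosen for illustration, not recommendations from Blueflood or Cassandra.

```python
# Column names as printed by the iostat version used in this thread.
COLUMNS = ["rrqm/s", "wrqm/s", "r/s", "w/s", "rkB/s", "wkB/s",
           "avgrq-sz", "avgqu-sz", "await", "svctm", "%util"]

# A few rows copied from the output above.
SAMPLE = """\
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
xvda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
xvde 0.00 1527.00 394.00 140.00 12120.00 5828.00 67.22 138.66 206.88 1.87 100.00

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
xvda 0.00 17.00 0.00 8.00 0.00 100.00 25.00 0.24 30.50 14.50 11.60
xvde 0.00 2653.00 229.00 270.00 7068.00 11304.00 73.64 116.55 464.49 2.00 99.60
"""


def parse_rows(text):
    """Turn iostat device rows into (device, {column: value}) pairs,
    skipping blank lines and the repeated header lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != len(COLUMNS) + 1 or parts[0].startswith("Device"):
            continue
        values = [float(p) for p in parts[1:]]
        rows.append((parts[0], dict(zip(COLUMNS, values))))
    return rows


def saturated(rows, util_threshold=90.0, await_threshold_ms=100.0):
    """Return samples whose %util and await both exceed the
    (hypothetical) thresholds."""
    return [
        (device, stats)
        for device, stats in rows
        if stats["%util"] >= util_threshold
        and stats["await"] >= await_threshold_ms
    ]


flagged = saturated(parse_rows(SAMPLE))
for device, stats in flagged:
    print(device, stats["%util"], stats["await"])
```

With these rows, only the two xvde samples are flagged, which matches the reading that the disk holding the Cassandra data is the one under pressure.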