mattshma / bigdata

hadoop, hbase, storm, spark, etc.

hbase ipc.server.max.callqueue.size is too small #16

Status: Open. mattshma opened this issue 8 years ago

mattshma commented 8 years ago

The error is as follows:

16/04/13 12:46:19 INFO client.AsyncProcess: #455357, table=table_name, attempt=12/35 failed 121 ops, last exception: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.ipc.RpcServer$CallQueueTooBigException): Call queue is full, is ipc.server.max.callqueue.size too small? on 10-2-96-35,60020,1459590257650, tracking started Wed Apr 13 12:45:01 CST 2016, retrying after 20145 ms, replay 121 ops.
16/04/13 12:46:19 INFO client.AsyncProcess: #456234, table=table_name, attempt=12/35 failed 80 ops, last exception: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.ipc.RpcServer$CallQueueTooBigException): Call queue is full, is ipc.server.max.callqueue.size too small? on 10-2-96-35,60020,1459590257650, tracking started Wed Apr 13 12:45:01 CST 2016, retrying after 20049 ms, replay 80 ops.
16/04/13 12:46:19 INFO client.AsyncProcess: #454764, table=table_name, attempt=12/35 failed 222 ops, last exception: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.ipc.RpcServer$CallQueueTooBigException): Call queue is full, is ipc.server.max.callqueue.size too small? on 10-2-96-35,60020,1459590257650, tracking started Wed Apr 13 12:45:01 CST 2016, retrying after 20165 ms, replay 222 ops.
16/04/13 12:46:19 INFO client.AsyncProcess: #454936, table=table_name, attempt=12/35 failed 85 ops, last exception: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.ipc.RpcServer$CallQueueTooBigException): Call queue is full, is ipc.server.max.callqueue.size too small? on 10-2-96-35,60020,1459590257650, tracking started Wed Apr 13 12:45:11 CST 2016, retrying after 20144 ms, replay 85 ops.

According to the source code, this error is raised when (totalRequestSize + callQueueSize.get()) > maxQueueSize, where maxQueueSize is set by this.maxQueueSize = this.conf.getInt("hbase.ipc.server.max.callqueue.size", DEFAULT_MAX_CALLQUEUE_SIZE);. RpcServer.java defines DEFAULT_MAX_CALLQUEUE_SIZE = 1024 * 1024 * 1024;, so if hbase.ipc.server.max.callqueue.size is not set, it defaults to 1024*1024*1024 bytes (1 GiB). Having found the cause, I edited hbase-site.xml and added the following configuration:

<property>
 <name>hbase.ipc.server.max.callqueue.size</name>
 <value>5368709120</value>
</property>

After restarting the cluster, the error went away.
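To make the quoted condition concrete, here is a minimal sketch of the call-queue admission check. It is a simplified, hypothetical model, not the real HBase RpcServer: only callQueueSize, maxQueueSize, totalRequestSize and DEFAULT_MAX_CALLQUEUE_SIZE come from the source quoted above, while the class name, tryEnqueue, release and the 300 MiB request size are made up for illustration. It shows that once queued bytes plus the incoming request exceed the limit, the call is rejected, which is what surfaces as CallQueueTooBigException in the logs.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of the admission check quoted from RpcServer.java.
// NOT the real HBase class; CallQueueCheck, tryEnqueue and release are made-up names.
public class CallQueueCheck {
    // Default quoted from RpcServer.java: 1024 * 1024 * 1024 bytes (1 GiB).
    static final int DEFAULT_MAX_CALLQUEUE_SIZE = 1024 * 1024 * 1024;

    private final long maxQueueSize;
    // Bytes currently held by queued calls.
    private final AtomicLong callQueueSize = new AtomicLong();

    CallQueueCheck(long maxQueueSize) {
        this.maxQueueSize = maxQueueSize;
    }

    // Returns false where the real server would reject the call with
    // CallQueueTooBigException ("Call queue is full").
    // Check-then-add is not atomic, so under concurrency the limit is approximate.
    boolean tryEnqueue(long totalRequestSize) {
        if ((totalRequestSize + callQueueSize.get()) > maxQueueSize) {
            return false;
        }
        callQueueSize.addAndGet(totalRequestSize);
        return true;
    }

    // Frees the bytes of a call once it has been processed.
    void release(long totalRequestSize) {
        callQueueSize.addAndGet(-totalRequestSize);
    }

    public static void main(String[] args) {
        CallQueueCheck queue = new CallQueueCheck(DEFAULT_MAX_CALLQUEUE_SIZE);
        long batchBytes = 300L * 1024 * 1024; // made-up 300 MiB request
        for (int i = 1; i <= 4; i++) {
            System.out.println("batch " + i + " admitted: " + queue.tryEnqueue(batchBytes));
        }
        queue.release(batchBytes); // one queued call finishes
        System.out.println("after release: " + queue.tryEnqueue(batchBytes));
    }
}
```

With the 1 GiB default, the first three 300 MiB requests fit (943718400 bytes queued), the fourth is rejected, and freeing one call makes room again. Raising the limit only moves this threshold; it does not make the server drain its queue any faster.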

mattshma commented 8 years ago

After fixing the problem above, the thrift2 log showed the following:

16/04/15 13:46:43 INFO client.AsyncProcess: #13132, table=table_name, attempt=10/35 failed 116 ops, last exception: java.net.SocketTimeoutException: Call to 10-2-96-43/10.2.96.43:60020 failed because java.net.SocketTimeoutException: 2000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.2.96.34:10577 remote=10-2-96-43/10.2.96.43:60020] on 10-2-96-43,60020,1460688419948, tracking started Fri Apr 15 13:45:45 CST 2016, retrying after 10037 ms, replay 116 ops.
16/04/15 14:05:27 INFO client.AsyncProcess: #34, waiting for some tasks to finish. Expected max=0, tasksSent=57338, tasksDone=57337, currentTasksDone=57337, retries=1590 hasError=false, tableName=table_name
16/04/15 14:05:27 INFO client.AsyncProcess: #1, waiting for some tasks to finish. Expected max=0, tasksSent=56971, tasksDone=56970, currentTasksDone=56970, retries=1528 hasError=false, tableName=table_name
16/04/15 14:05:27 INFO client.AsyncProcess: #12043, waiting for some tasks to finish. Expected max=0, tasksSent=39101, tasksDone=39100, currentTasksDone=39100, retries=1125 hasError=false, tableName=table_name
16/04/15 14:05:27 INFO client.AsyncProcess: #13116, waiting for some tasks to finish. Expected max=0, tasksSent=38680, tasksDone=38679, currentTasksDone=38679, retries=1192 hasError=false, tableName=table_name
16/04/15 14:05:27 INFO client.AsyncProcess: #12033, waiting for some tasks to finish. Expected max=0, tasksSent=39141, tasksDone=39140, currentTasksDone=39140, retries=1148 hasError=false, tableName=table_name

justdo1980 commented 6 years ago

Hello, I am facing a similar issue, but I failed to set the size to '5368709120' because the maximum value for getInt is '2147483647'. Could you share how you managed to set 5G here?
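The range limit raised in this question is plain Java integer arithmetic. Below is a minimal sketch, assuming the setting is read through an int getter as in the conf.getInt line quoted in the first comment, and assuming that getter parses decimal strings with Integer.parseInt (as Hadoop's Configuration.getInt does). 5 GiB is 5368709120 bytes, which is larger than Integer.MAX_VALUE (2147483647), so Integer.parseInt throws NumberFormatException for it. The class and the fitsInInt helper are hypothetical.

```java
// Hypothetical helper showing why 5368709120 cannot be read as an int:
// it exceeds Integer.MAX_VALUE, so Integer.parseInt rejects it.
public class CallQueueSizeRange {
    // Returns true if the string parses as a Java int (the range an int getter can return).
    static boolean fitsInInt(String value) {
        try {
            Integer.parseInt(value.trim());
            return true;
        } catch (NumberFormatException e) {
            // Thrown for out-of-range values such as "5368709120".
            return false;
        }
    }

    public static void main(String[] args) {
        long fiveGiB = 5L * 1024 * 1024 * 1024;
        System.out.println(fiveGiB);                  // 5368709120
        System.out.println(Integer.MAX_VALUE);        // 2147483647
        System.out.println(fitsInInt("5368709120"));  // false
        System.out.println(fitsInInt("2147483647"));  // true
        // The 1 GiB default (1024 * 1024 * 1024 = 1073741824) fits comfortably.
        System.out.println(fitsInInt(String.valueOf(1024 * 1024 * 1024))); // true
    }
}
```

Under these assumptions, the largest value an int-valued setting can hold is 2147483647 bytes, just under 2 GiB.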