xuxueli / xxl-job

A distributed task scheduling framework (XXL-JOB).
http://www.xuxueli.com/xxl-job/
GNU General Public License v3.0

2.0.1: with executors deployed as a cluster, tasks fail to run once all but one executor has gone down #701

Closed shaojava closed 5 years ago

shaojava commented 5 years ago

Please answer some questions before submitting your issue. Thanks! 2.0.1: with executors deployed as a cluster, tasks fail to run once all but one executor has gone down.

Which version of XXL-JOB are you using?

2.0.1

Expected behavior

12:06:51.501 logback [pool-3-thread-14] INFO c.x.r.r.i.r.XxlRpcReferenceBean - >>>>>>>>>>> xxl-job, invoke error, address:192.168.16.234:9998, XxlRpcRequestXxlRpcRequest{requestId='7075b043-96df-40b4-86ab-70661fdfadcf', createMillisTime=1545970010501, accessToken='', className='com.xxl.job.core.biz.ExecutorBiz', methodName='beat', parameterTypes=[], parameters=null, version='null'}
12:06:51.502 logback [pool-3-thread-14] ERROR c.x.j.a.core.route.ExecutorRouter - java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.eclipse.jetty.io.SelectorManager.finishConnect(SelectorManager.java:341)
at org.eclipse.jetty.io.SelectorManager$ManagedSelector.processConnect(SelectorManager.java:676)
at org.eclipse.jetty.io.SelectorManager$ManagedSelector.processKey(SelectorManager.java:645)
at org.eclipse.jetty.io.SelectorManager$ManagedSelector.select(SelectorManager.java:612)
at org.eclipse.jetty.io.SelectorManager$ManagedSelector.run(SelectorManager.java:550)
at org.eclipse.jetty.util.thread.NonBlockingThread.run(NonBlockingThread.java:52)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:748)

com.xxl.rpc.util.XxlRpcException: java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.eclipse.jetty.io.SelectorManager.finishConnect(SelectorManager.java:341)
at org.eclipse.jetty.io.SelectorManager$ManagedSelector.processConnect(SelectorManager.java:676)
at org.eclipse.jetty.io.SelectorManager$ManagedSelector.processKey(SelectorManager.java:645)
at org.eclipse.jetty.io.SelectorManager$ManagedSelector.select(SelectorManager.java:612)
at org.eclipse.jetty.io.SelectorManager$ManagedSelector.run(SelectorManager.java:550)
at org.eclipse.jetty.util.thread.NonBlockingThread.run(NonBlockingThread.java:52)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:748)

at com.xxl.rpc.remoting.invoker.reference.XxlRpcReferenceBean$1.invoke(XxlRpcReferenceBean.java:161)
at com.sun.proxy.$Proxy82.beat(Unknown Source)
at com.xxl.job.admin.core.route.strategy.ExecutorRouteFailover.route(ExecutorRouteFailover.java:26)
at com.xxl.job.admin.core.trigger.XxlJobTrigger.processTrigger(XxlJobTrigger.java:130)
at com.xxl.job.admin.core.trigger.XxlJobTrigger.trigger(XxlJobTrigger.java:76)
at com.xxl.job.admin.core.thread.JobTriggerPoolHelper$1.run(JobTriggerPoolHelper.java:35)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Actual behavior

Steps to reproduce the behavior

Deploy two executors. After one of them goes down, tasks no longer run normally and the console throws the exception shown above.

Other information

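Trigger type: Cron
Scheduling machine: 192.168.16.234
Executor registration: automatic
Executor address list: [192.168.16.234:9998, 192.168.16.234:9999]
Routing strategy: failover
Blocking strategy: single-machine serial
Task timeout: 0
Failure retry count: 0

Trigger scheduling <<<<<<<<<<< Heartbeat check: address:192.168.16.234:9998 code:200 msg:null

Trigger scheduling: address:192.168.16.234:9998 code:200 msg:null

9998 is down, yet failover still schedules to 9998, which I don't understand. The task stays in the "running" state.

8 2018-12-28 12:06:35 Success
8 2018-12-28 12:06:30 Success
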
shaojava commented 5 years ago

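After an executor goes down, the scheduling center lags behind and does not reflect the executor's status immediately, so tasks are still scheduled to the dead executor.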

dongzehong commented 5 years ago

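"After an executor goes down, the scheduling center lags behind and does not reflect the executor's status immediately, so tasks are still scheduled to the dead executor."

The lack of an immediate response after the crash is probably because the executor registers with the scheduling center only once every 30 seconds, and that registration serves as its heartbeat. How did your program behave afterwards? Once the scheduling center noticed that 9998 was down, it should have started scheduling to 9999, right?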
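
To illustrate why a 30-second registration heartbeat leaves a window in which a dead executor still looks online, here is a minimal sketch. It is not XXL-JOB's actual registry code: the class `RegistrySketch`, its method names, and the rule that an entry expires after three missed beats are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a heartbeat-based registry; not the XXL-JOB implementation.
class RegistrySketch {
    // Executors re-register every 30 seconds (figure taken from the comment above).
    static final long BEAT_INTERVAL_MS = 30_000L;
    // Assumption: an address is dropped only after three missed beats.
    static final long DEAD_TIMEOUT_MS = 3 * BEAT_INTERVAL_MS;

    // Address -> timestamp (ms) of the most recent registration.
    private final Map<String, Long> lastBeat = new TreeMap<>();

    // Called each time an executor registers (its heartbeat).
    public void register(String address, long nowMs) {
        lastBeat.put(address, nowMs);
    }

    // Addresses whose last registration is recent enough to count as online.
    public List<String> onlineAddresses(long nowMs) {
        List<String> online = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastBeat.entrySet()) {
            if (nowMs - entry.getValue() <= DEAD_TIMEOUT_MS) {
                online.add(entry.getKey());
            }
        }
        return online;
    }

    public static void main(String[] args) {
        RegistrySketch registry = new RegistrySketch();
        registry.register("192.168.16.234:9998", 0L);
        registry.register("192.168.16.234:9999", 0L);
        // 9998 crashes right after t=0; only 9999 keeps registering.
        registry.register("192.168.16.234:9999", 30_000L);
        // At t=45s the crashed executor is still inside the timeout window.
        System.out.println(registry.onlineAddresses(45_000L));
    }
}
```

Under these assumptions, the crashed executor remains in the address list until its last registration is older than the timeout, so any trigger fired in that window can still be routed to it.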

xuxueli commented 5 years ago

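There is a short delay before the scheduling center notices changes in which executor nodes are online.

In the log above, "Heartbeat check: address:192.168.16.234:9998 code:200" shows that the heartbeat check for the executor on 9998 succeeded. That executor had not gone offline, which is why the trigger was dispatched to 9998.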
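
The failover behaviour described here can be sketched as follows. This is an illustrative reconstruction, not the source of `ExecutorRouteFailover`: the class `FailoverSketch`, the `route` signature, and the `beatOk` predicate (a stand-in for the remote `beat` call seen in the stack trace) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of failover routing; not the XXL-JOB implementation.
class FailoverSketch {
    /**
     * Probes the addresses in order and returns the first one whose heartbeat
     * succeeds, or null when every probe fails.
     * beatOk is an assumed stand-in for the remote beat() call.
     */
    public static String route(List<String> addresses, Predicate<String> beatOk) {
        for (String address : addresses) {
            if (beatOk.test(address)) {
                return address;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> addresses =
                Arrays.asList("192.168.16.234:9998", "192.168.16.234:9999");
        // Case from the log: the probe to 9998 answers with code 200, so 9998 is chosen.
        System.out.println(route(addresses, address -> true));
        // Case where 9998 refuses the connection: routing falls through to 9999.
        System.out.println(route(addresses, address -> !address.endsWith(":9998")));
    }
}
```

In this sketch the router trusts whatever the probe reports: as long as the heartbeat to 9998 succeeds, 9998 is selected, and 9999 is tried only after the probe to 9998 fails.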