vipshop / Saturn

vip.com's distributed job scheduling platform.
Apache License 2.0
2.28k stars, 701 forks

Improve the "job not running" alert #697

Open RolfHeG opened 4 years ago

RolfHeG commented 4 years ago

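Add one more check to the "job not running" alert: whether any other shard of the job started running before nextFireTime and is still running now. If so, the job is probably in one of the following two situations; it is healthy and no alert is needed:

1. A resharding task has been issued to the /necessary node, and the executor holding the current shard is blocked, waiting for the running shards to finish.
2. The current shard has been failed over, but every other executor already holds a shard of this job in the running state, so the failover cannot start immediately.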
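The proposed check could be sketched roughly as follows. This is only an illustration of the idea, not Saturn's actual code: the class `JobNotRunningCheck`, the `ShardState` snapshot type, and the `shouldAlert` method are hypothetical names, and the sketch assumes the existing alert logic has already found that the current shard did not start by its nextFireTime.

```java
import java.util.List;

public class JobNotRunningCheck {

    /** Hypothetical snapshot of one shard's state; not a Saturn class. */
    public static final class ShardState {
        final int item;              // shard (sharding item) number
        final boolean running;       // true if the shard is running right now
        final long startTimeMillis;  // when the current run started

        public ShardState(int item, boolean running, long startTimeMillis) {
            this.item = item;
            this.running = running;
            this.startTimeMillis = startTimeMillis;
        }
    }

    /**
     * Assumes the caller already determined that shard currentItem missed
     * its nextFireTime. Returns false (suppress the alert) if any OTHER
     * shard started before nextFireTime and is still running, which covers
     * both the "blocked on resharding" and "failover cannot start yet" cases.
     */
    public static boolean shouldAlert(int currentItem,
                                      List<ShardState> shards,
                                      long nextFireTimeMillis) {
        for (ShardState s : shards) {
            boolean otherShard = s.item != currentItem;
            boolean startedBeforeFire = s.startTimeMillis < nextFireTimeMillis;
            if (otherShard && s.running && startedBeforeFire) {
                return false; // job is healthy, no alert needed
            }
        }
        return true; // no earlier run still in progress: keep the alert
    }
}
```

For example, with shard 0 idle and shard 1 running since t=900, a nextFireTime of 1000 for shard 0 would suppress the alert, while a job with no running shards would still raise it.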