taosdata / TDengine

High-performance, scalable time-series database designed for Industrial IoT (IIoT) scenarios
https://tdengine.com
GNU Affero General Public License v3.0

One node goes down a while after starting a continuous query #3823

Closed quchunhui closed 4 years ago

quchunhui commented 4 years ago

Bug Description: On a 3-node cluster, some time after a continuous query was started, the service on one of the nodes went down.

To Reproduce Steps to reproduce the behavior: Start a continuous query with the following statement: create table qch_cq.device_data_up_sum as select sum(AI_PICK_UDATA) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc;
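For readers unfamiliar with the interval(1m) sliding(30s) clause in the statement above: the window is one minute long and a new window starts every 30 seconds, so each data point contributes to two overlapping sums. The following Python sketch illustrates that overlap. It is not TDengine code; the function name `window_starts` is hypothetical, and aligning window starts to multiples of the sliding step is an assumption made for illustration.

```python
def window_starts(ts, interval, sliding):
    """Return the start times of all windows [s, s + interval) containing ts.

    Assumption (for illustration only): window starts are aligned to
    multiples of `sliding`. All values are in seconds.
    """
    # Latest aligned window start that is not after ts.
    s = (ts // sliding) * sliding
    starts = []
    # Walk backwards while the window beginning at s still covers ts.
    while s > ts - interval:
        starts.append(s)
        s -= sliding
    return sorted(starts)


# A point at t=95s with interval=60s and sliding=30s falls into the
# windows starting at 60s and 90s.
print(window_starts(95, 60, 30))  # [60, 90]
```

With interval equal to twice the sliding step, every timestamp lands in exactly two windows, which is consistent with the query producing a result row every 30 seconds.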

Screenshots (two images, not preserved in this text)

Environment (please complete the following information):

Additional Context Partial log output follows:
10/13 10:09:07.744543 0x7fb6d97f2700 SYN vgId:3 peer:rexel-ids001:6040, last wal is forwarded, ver:669
10/13 10:09:07.744548 0x7fb6d97f2700 SYN vgId:3 peer:rexel-ids001:6040, last wal is forwarded, ver:670
10/13 10:09:07.744555 0x7fb6d97f2700 SYN vgId:3 peer:rexel-ids001:6040, last wal is forwarded, ver:671
10/13 10:09:07.744558 0x7fb6d97f2700 SYN vgId:3 peer:rexel-ids001:6040, last wal is forwarded, ver:672
10/13 10:09:07.744563 0x7fb6d97f2700 SYN vgId:3 peer:rexel-ids001:6040, last wal is forwarded, ver:673
10/13 10:09:07.744580 0x7fb6d97f2700 SYN vgId:3 peer:rexel-ids001:6040, last wal event:0x0
10/13 10:09:07.744585 0x7fb6d97f2700 SYN vgId:3 peer:rexel-ids001:6040, data up to fversion:673 has been read out, bytes:444515
10/13 10:09:07.746257 0x7fb6d97f2700 SYN vgId:3 peer:rexel-ids001:6040, wal retrieve is finished
10/13 10:09:07.746263 0x7fb6d97f2700 SYN vgId:3 peer:rexel-ids001:6040, sync retrieve process is successful
10/13 10:09:07.749793 0x7fb6daff5700 SYN vgId:3 peer:rexel-ids001:6040, status msg received, self:master ver:673 peer:slave ver:0, ack:1
10/13 10:09:07.749806 0x7fb6daff5700 SYN vgId:3 peer:rexel-ids001:6040, own role:master, new peer role:slave
10/13 10:09:07.749811 0x7fb6daff5700 SYN vgId:3 peer:rexel-ids003:6040, it is the master, ver:673
10/13 10:09:07.749821 0x7fb6daff5700 SYN vgId:3 peer:rexel-ids002:6040, status msg is sent
10/13 10:09:07.749828 0x7fb6daff5700 SYN vgId:3 peer:rexel-ids001:6040, status msg is sent
10/13 10:09:07.749835 0x7fb6daff5700 SYN vgId:3 peer:rexel-ids001:6040, status msg is sent
10/13 10:09:07.750575 0x7fb6daff5700 SYN vgId:3 peer:rexel-ids002:6040, status msg received, self:master ver:673 peer:slave ver:673, ack:1
10/13 10:09:07.750580 0x7fb6daff5700 SYN vgId:3 peer:rexel-ids002:6040, own role:master, new peer role:slave
10/13 10:09:07.750582 0x7fb6daff5700 SYN vgId:3 peer:rexel-ids003:6040, it is the master, ver:673
10/13 10:09:07.750589 0x7fb6daff5700 SYN vgId:3 peer:rexel-ids002:6040, status msg is sent
10/13 10:09:07.750964 0x7fb6daff5700 SYN vgId:3 peer:rexel-ids001:6040, status msg received, self:master ver:673 peer:slave ver:0, ack:0
10/13 10:09:07.750967 0x7fb6daff5700 SYN vgId:3 peer:rexel-ids001:6040, own role:master, new peer role:slave
10/13 10:09:07.750969 0x7fb6daff5700 SYN vgId:3 peer:rexel-ids003:6040, it is the master, ver:673
10/13 10:09:07.751151 0x7fb6daff5700 SYN vgId:3 peer:rexel-ids002:6040, status msg received, self:master ver:673 peer:slave ver:673, ack:0
10/13 10:09:07.751155 0x7fb6daff5700 SYN vgId:3 peer:rexel-ids002:6040, own role:master, new peer role:slave
10/13 10:09:07.751157 0x7fb6daff5700 SYN vgId:3 peer:rexel-ids003:6040, it is the master, ver:673
10/13 10:09:07.862309 0x7fb6edff4700 MND user:root, failed to auth user, mnode is not master
10/13 10:09:07.862326 0x7fb6edff4700 DND user:root, send auth msg to mnodes
10/13 10:09:07.863461 0x7fb6edff4700 DND user:root, auth msg received from mnodes
10/13 10:09:07.863516 0x7fb705ffb700 DND 0x19d5dfc0, rpc msg:submit will be processed in vwrite queue
10/13 10:09:07.863539 0x7fb705ffb700 SYN vgId:3 peer:rexel-ids002:6040, forward is sent, ver:674 contLen:626
10/13 10:09:07.863547 0x7fb705ffb700 SYN vgId:3 peer:rexel-ids001:6040, forward is sent, ver:674 contLen:626
10/13 10:09:07.916683 0x7fb717f28700 CQ vgId:6, try connect to TDengine
10/13 10:09:07.919102 0x7fb6edff4700 MND user:monitor, failed to auth user, mnode is not master
10/13 10:09:07.919111 0x7fb6edff4700 DND user:monitor, send auth msg to mnodes
10/13 10:09:07.920191 0x7fb6edff4700 DND user:monitor, auth msg received from mnodes
10/13 10:09:07.920222 0x7fb706ffd700 DND 0x7fca90016f40, rpc msg:submit will be processed in vwrite queue
10/13 10:09:07.920660 0x7fb6d9ff3700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; is openned
10/13 10:09:07.927027 0x7fb706ffd700 DND 0x7fcac0003380, rpc msg:submit will be processed in vwrite queue
10/13 10:09:08.726781 0x7fb717f28700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; is openned
10/13 10:09:17.909438 0x7fb6edff4700 MND user:root, failed to auth user, mnode is not master
10/13 10:09:17.909456 0x7fb6edff4700 DND user:root, send auth msg to mnodes
10/13 10:09:17.910595 0x7fb6edff4700 DND user:root, auth msg received from mnodes
10/13 10:09:17.910649 0x7fb705ffb700 DND 0x19d4dff0, rpc msg:submit will be processed in vwrite queue
10/13 10:09:17.910681 0x7fb705ffb700 SYN vgId:3 peer:rexel-ids002:6040, forward is sent, ver:675 contLen:626
10/13 10:09:17.910689 0x7fb705ffb700 SYN vgId:3 peer:rexel-ids001:6040, forward is sent, ver:675 contLen:626
10/13 10:09:20.163981 0x7fb6edff4700 MND user:monitor, failed to auth user, mnode is not master
10/13 10:09:20.163996 0x7fb6edff4700 DND user:monitor, send auth msg to mnodes
10/13 10:09:20.165098 0x7fb6edff4700 DND user:monitor, auth msg received from mnodes
10/13 10:09:20.165122 0x7fb706ffd700 DND 0x7fc914017a70, rpc msg:submit will be processed in vwrite queue
10/13 10:09:21.112689 0x7fb70e7fc700 MND user:_root, failed to auth user, mnode is not master
10/13 10:09:21.112710 0x7fb70e7fc700 DND user:_root, send auth msg to mnodes
10/13 10:09:21.113848 0x7fb70e7fc700 DND user:_root, auth msg received from mnodes
10/13 10:09:21.113895 0x7fb70cff9700 DND 0x7fc8f8010fb0, msg:query will be processed in vread queue, qtype:0, msg:0x7fb6a8001440
10/13 10:09:21.114089 0x7fb707fff700 DND (nil), msg:query will be processed in vread queue, qtype:4, msg:0x7fb6ac02f210
10/13 10:09:21.115918 0x7fb6fe7fd700 DND 0x7fc8f8010fb0, msg:fetch will be processed in vread queue, qtype:0, msg:0x7fb6a8000e80
10/13 10:09:21.115947 0x7fb6fffff700 DND (nil), msg:query will be processed in vread queue, qtype:4, msg:0x7fb6a40016d0
10/13 10:09:21.117763 0x7fb7077fe700 DND 0x7fc8f8010fb0, msg:fetch will be processed in vread queue, qtype:0, msg:0x7fb6a8000ef0
10/13 10:09:27.970232 0x7fb6edff4700 MND user:root, failed to auth user, mnode is not master
10/13 10:09:27.970254 0x7fb6edff4700 DND user:root, send auth msg to mnodes
10/13 10:09:27.971375 0x7fb6edff4700 DND user:root, auth msg received from mnodes
10/13 10:09:27.971417 0x7fb705ffb700 DND 0x19d4dff0, rpc msg:submit will be processed in vwrite queue
10/13 10:09:27.971464 0x7fb705ffb700 SYN vgId:3 peer:rexel-ids002:6040, forward is sent, ver:676 contLen:626
10/13 10:09:27.971472 0x7fb705ffb700 SYN vgId:3 peer:rexel-ids001:6040, forward is sent, ver:676 contLen:626
10/13 10:09:28.912796 0x7fb6d97f2700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; stream result is ready
10/13 10:09:28.912819 0x7fb6d97f2700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; stream result is ready
10/13 10:09:28.912823 0x7fb6d97f2700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; stream result is ready
10/13 10:09:28.912825 0x7fb6d97f2700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; stream result is ready
10/13 10:09:28.912830 0x7fb6d97f2700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; stream result is ready
10/13 10:09:28.912832 0x7fb6d97f2700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; stream result is ready
10/13 10:09:28.912836 0x7fb6d97f2700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; stream result is ready
10/13 10:09:28.912838 0x7fb6d97f2700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; stream result is ready
10/13 10:09:28.912842 0x7fb6d97f2700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; stream result is ready
10/13 10:09:28.912844 0x7fb6d97f2700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; stream result is ready
10/13 10:09:28.912848 0x7fb6d97f2700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; stream result is ready
10/13 10:09:28.912852 0x7fb70d7fa700 SYN vgId:6 peer:rexel-ids002:6040, forward is sent, ver:1130 contLen:66
10/13 10:09:28.912856 0x7fb6d97f2700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; stream result is ready
10/13 10:09:28.912858 0x7fb6d97f2700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; stream result is ready
10/13 10:09:28.912861 0x7fb70d7fa700 SYN vgId:6 peer:rexel-ids001:6040, forward is sent, ver:1130 contLen:66
10/13 10:09:28.912862 0x7fb6d97f2700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; stream result is ready
10/13 10:09:28.912864 0x7fb6d97f2700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; stream result is ready
10/13 10:09:28.912866 0x7fb6d97f2700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; stream result is ready
10/13 10:09:28.912868 0x7fb6d97f2700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; stream result is ready
10/13 10:09:28.912870 0x7fb6d97f2700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; stream result is ready
10/13 10:09:28.912872 0x7fb6d97f2700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; stream result is ready
10/13 10:09:28.912876 0x7fb6d97f2700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; stream result is ready
10/13 10:09:28.912878 0x7fb6d97f2700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; stream result is ready
10/13 10:09:28.912880 0x7fb70d7fa700 SYN vgId:6 peer:rexel-ids002:6040, forward is sent, ver:1131 contLen:66
10/13 10:09:28.912882 0x7fb6d97f2700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; stream result is ready
10/13 10:09:28.912884 0x7fb6d97f2700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; stream result is ready
10/13 10:09:28.912886 0x7fb70d7fa700 SYN vgId:6 peer:rexel-ids001:6040, forward is sent, ver:1131 contLen:66
10/13 10:09:28.912888 0x7fb6d97f2700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; stream result is ready
10/13 10:09:28.912890 0x7fb6d97f2700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; stream result is ready
10/13 10:09:28.912894 0x7fb6d97f2700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; stream result is ready
10/13 10:09:28.912896 0x7fb6d97f2700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; stream result is ready
10/13 10:09:28.912900 0x7fb6d97f2700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; stream result is ready
10/13 10:09:28.912903 0x7fb6d97f2700 CQ vgId:6, id:1 CQ:select sum(ai_pick_udata) from qch_test.st_device_data_up interval(1m) sliding(30s) order by time desc; stream result is ready
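Each entry in the log excerpt above appears to follow the same layout: date, time with microseconds, thread id, a module tag (SYN, MND, DND, CQ), then the message. When triaging a dump like this, a per-module tally can make bursts such as the repeated "stream result is ready" entries easier to spot. The Python sketch below assumes that layout holds throughout; the names `ENTRY_RE` and `count_by_module` are hypothetical helpers and not part of any TDengine tooling.

```python
import re
from collections import Counter

# Assumed entry header: "MM/DD HH:MM:SS.ffffff 0x<thread> <MODULE> "
ENTRY_RE = re.compile(
    r"\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{6} 0x[0-9a-f]+ ([A-Z]+) "
)


def count_by_module(log_text):
    """Count log entries per module tag.

    Works whether entries are one per line or run together on a single
    line, because each entry is found by its timestamp and thread id.
    """
    return Counter(m.group(1) for m in ENTRY_RE.finditer(log_text))


# Two entries copied from the excerpt, run together as in the raw paste.
sample = (
    "10/13 10:09:07.744543 0x7fb6d97f2700 SYN vgId:3 peer:rexel-ids001:6040, "
    "last wal is forwarded, ver:669 "
    "10/13 10:09:07.916683 0x7fb717f28700 CQ vgId:6, try connect to TDengine"
)
print(count_by_module(sample))
```

Running it over the full taosd log, rather than this excerpt, would show whether the CQ entries keep growing up to the moment the node stops.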

Aries-Lee1991 commented 4 years ago

Our engineers have already replied to this issue over at https://github.com/taosdata/TDengine/issues/3821, so I am closing this one.