platinasystems / go


vnetd crashes during regression run #143

Closed sandeep-dutta closed 5 years ago

sandeep-dutta commented 5 years ago

The issue was observed on the following goes version:

root@invader29:/home/sandeep# goes version
v1.1.1

root@invader29:/home/sandeep# goes vnetd -version
fe1: v1.1.3
fe1a: v1.1.0
vnet-platina-mk1: v1.0.0

During regression runs, the vnetd daemon was found crashed on one or more invaders. The syslog on every invader contains pcc-related entries, but we are not sure whether they are what causes vnetd to go down. Below are the pcc log entries from invader-30 (172.17.2.30), where the vnetd daemon crashed. The issue is not consistently reproducible.

ystem/systemCollector --brokerUri=172.17.2.59:9092 --schemaRegistryUri=http://172.17.2.59:9095 --period=2 --node=i30 --nodeId=66 --cluster=PROD1 --site=BOS01> /home/pcc/collectors/system/nohup.out &)
Dec 26 23:06:01 invader30 CRON[31879]: (root) CMD (validatelldpconnec.sh -n Invader30)
Dec 26 23:06:01 invader30 CRON[31880]: (root) CMD (bash validatelldpconnec.sh -n Invader30)
Dec 26 23:06:01 invader30 CRON[31867]: (CRON) info (No MTA installed, discarding output)
Dec 26 23:06:02 invader30 CRON[31866]: (CRON) info (No MTA installed, discarding output)
Dec 26 23:07:01 invader30 CRON[701]: (root) CMD (validatelldpconnec.sh -n Invader30)
Dec 26 23:07:01 invader30 CRON[702]: (pcc) CMD (ps aux| grep systemCollector | grep -v grep || nohup /home/pcc/collectors/system/systemCollector --brokerUri=172.17.2.59:9092 --schemaRegistryUri=http://172.17.2.59:9095 --period=2 --node=i30 --nodeId=66 --cluster=PROD1 --site=BOS01> /home/pcc/collectors/system/nohup.out &)
Dec 26 23:07:01 invader30 CRON[703]: (root) CMD (bash validatelldpconnec.sh -n Invader30)
Dec 26 23:07:01 invader30 CRON[691]: (CRON) info (No MTA installed, discarding output)
Dec 26 23:07:02 invader30 CRON[690]: (CRON) info (No MTA installed, discarding output)
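The excerpt above contains only cron entries, so it may help to filter those out when searching the syslog for the actual crash. The following is a minimal sketch, not something taken from the report: the function name `filter_vnet_lines` is hypothetical, and the assumption that relevant messages mention "vnet" or "goes" is a guess about how the daemon logs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read syslog-style lines on stdin, drop cron entries,
# and keep only lines that mention "vnet" or "goes" (case-insensitive).
# The match pattern is an assumption, not a documented log format.
filter_vnet_lines() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -v 'CRON\[' | grep -Ei 'vnet|goes' || true
}
```

Assuming a Debian-style log location, it could be used as `filter_vnet_lines < /var/log/syslog`.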

To bring the vnetd service back up, we need to power cycle the invader through its Redis IP using the command below.
root@invader43:/home/sandeep# redis-cli -h 172.17.3.29 hset platina psu.powercycle true

Note: The issue is not always reproducible.

sandeep-dutta commented 5 years ago

The other way to bring vnetd up is to execute the following commands. However, this approach does not always work.

rmmod platina-mk1
modprobe platina-mk1 provision=1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
ifdown -a --allow vnet
ifup -a --allow vnet
goes restart
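Since this sequence does not always work, it can be useful to know which step failed. Below is a minimal sketch that wraps the same commands in a function and stops at the first failure. The names `run`, `recover_vnetd`, and `DRY_RUN` are hypothetical, and stopping on the first error is a design choice of this sketch, not something the report prescribes.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Reload the platina-mk1 module, bounce the vnet interfaces, restart goes.
# $1 is the provision list passed to modprobe, exactly as in the commands above.
# The && chain stops at the first step that returns a non-zero status.
recover_vnetd() {
    local provision=$1
    run rmmod platina-mk1 &&
    run modprobe platina-mk1 "provision=${provision}" &&
    run ifdown -a --allow vnet &&
    run ifup -a --allow vnet &&
    run goes restart
}
```

Calling `DRY_RUN=1 recover_vnetd "<provision list>"` prints the five commands without executing them, which is a safe way to check the sequence before running it as root.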

rondv commented 5 years ago

Please attach the journalctl output from the invader that crashed, in this case inv29.

For example, if inv29 crashed within the last 5 minutes, capture this output:
journalctl --since -300s
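To make the output easy to attach, the journal can be written to a file. This is a minimal sketch under stated assumptions: the function names `journal_capture_name` and `capture_journal` and the file naming scheme are hypothetical, while `--since` with a relative time and `--no-pager` are standard journalctl options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a file name of the form journal-<host>-<timestamp>.txt.
journal_capture_name() {
    printf 'journal-%s-%s.txt\n' "$1" "$2"
}

# Save the last N seconds of the journal (default 300) to a file in the
# current directory and print the file name.
capture_journal() {
    local seconds=${1:-300}
    local out
    out=$(journal_capture_name "$(hostname)" "$(date +%Y%m%dT%H%M%S)")
    journalctl --since "-${seconds}s" --no-pager > "$out"
    printf '%s\n' "$out"
}
```

Running `capture_journal 300` on the crashed invader shortly after the crash would produce a single file to attach to the issue.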

sandeep-dutta commented 5 years ago

Closing the bug since the issue is currently not reproducible with the latest GoES binary, v1.2.0-rc.1.