marvel-nccr / quantum-mobile

A Virtual Machine for computational materials science
https://quantum-mobile.readthedocs.io

Randomly failing RabbitMQ #134

Closed giovannipizzi closed 3 years ago

giovannipizzi commented 4 years ago

During the 2020 tutorial, an estimated 5-10% of the machines had an issue where RabbitMQ was not running at startup and would run only after a manual restart (with something like sudo service rabbitmq-server restart).
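
As a stopgap on affected machines, the manual restart could in principle be automated. The following is a minimal sketch, not something that ships with Quantum Mobile: it assumes a systemd-based VM where systemctl is-active is available, and ensure_running together with its parameters (run, retries, delay, sleep) are hypothetical names introduced for illustration.

```python
import subprocess
import time

SERVICE = "rabbitmq-server"  # service name taken from the restart command above


def ensure_running(run=subprocess.run, retries=3, delay=5.0, sleep=time.sleep):
    """Return True once the service reports active, restarting it if needed.

    `run` and `sleep` are injectable so the logic can be exercised without
    touching a real system (an assumption of this sketch).
    """
    for attempt in range(retries + 1):
        # `systemctl is-active --quiet` exits with 0 only when the unit is active.
        if run(["systemctl", "is-active", "--quiet", SERVICE]).returncode == 0:
            return True
        if attempt == retries:
            break
        # Same manual workaround as described in the report; needs root privileges.
        run(["sudo", "service", SERVICE, "restart"])
        sleep(delay)
    return False
```

Calling ensure_running() after boot would return False if the service is still down after the configured number of restart attempts. This only hides the symptom; it does not explain why the first start fails.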

After looking into the logs, I found these relevant lines in the syslog that might help us debug the problem:

Jul  6 20:16:38 quantum-mobile rabbitmq[953]: Waiting for 'rabbit@quantum-mobile'
Jul  6 20:16:38 quantum-mobile rabbitmq[953]: pid is 16793
Jul  6 20:16:38 quantum-mobile rabbitmq[953]: Error: process_not_running
Jul  6 20:16:38 quantum-mobile systemd[1]: rabbitmq-server.service: Control process exited, code=exited status=70
Jul  6 20:16:38 quantum-mobile systemd-resolved[801]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.
Jul  6 20:16:38 quantum-mobile systemd-resolved[801]: message repeated 3 times: [ Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.]

Jul  6 20:18:08 quantum-mobile systemd[1]: rabbitmq-server.service: State 'stop-final-sigterm' timed out. Killing.
Jul  6 20:18:08 quantum-mobile systemd[1]: rabbitmq-server.service: Killing process 1926 (epmd) with signal SIGKILL.
Jul  6 20:18:08 quantum-mobile systemd[1]: rabbitmq-server.service: Failed with result 'exit-code'.
Jul  6 20:18:08 quantum-mobile systemd[1]: Failed to start RabbitMQ Messaging Server.
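
To check other machines for the same signature, the syslog could be scanned for the two failure messages seen above. This is a rough sketch under stated assumptions: the marker strings are copied from this excerpt and may differ on other RabbitMQ or systemd versions, the regular expression assumes the classic syslog line layout shown here, and find_rabbitmq_failures is a hypothetical helper name.

```python
import re

# Failure markers copied from the syslog excerpt above; other RabbitMQ or
# systemd versions may word these messages differently.
FAILURE_MARKERS = (
    "Error: process_not_running",
    "Failed to start RabbitMQ Messaging Server",
)

# Classic syslog layout: "Mon DD HH:MM:SS host process[pid]: message".
SYSLOG_LINE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s"
    r"(?P<process>[^\s\[]+)(?:\[(?P<pid>\d+)\])?:\s(?P<message>.*)$"
)


def find_rabbitmq_failures(lines):
    """Return (timestamp, process, message) for lines with a known failure marker."""
    hits = []
    for line in lines:
        match = SYSLOG_LINE.match(line.strip())
        if match and any(marker in match["message"] for marker in FAILURE_MARKERS):
            hits.append((match["timestamp"], match["process"], match["message"]))
    return hits
```

The function accepts any iterable of lines, so an open file handle for the syslog (commonly /var/log/syslog on Ubuntu, which is an assumption about this VM) can be passed in directly.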

chrisjsewell commented 4 years ago

Ha, you beat me to it. I was just about to open this.

ltalirz commented 3 years ago

I assume this issue did not resurface during the 2021 tutorial, so I'm closing this. If it did resurface, please reopen.