Closed tianluyuan closed 1 year ago
https://github.com/icecube/skymap_scanner/actions/runs/6186677592/job/16794960197#step:6:1
It looks like the server is not crashing; it keeps running but never registers the result returned by the client. The clients also seem to stop receiving or returning additional pixels.
This is the partial log of a test that completes:
2023-09-14 14:20:40 fv-az221-946 ewms-pilot[12] INFO TASK FINISHED -- attempting to send result message...
2023-09-14 14:20:40 fv-az221-946 mqclient[12] INFO Sending Message: 442 bytes
2023-09-14 14:20:40 fv-az221-946 ewms-pilot[12] INFO Now, attempting to ack original message...
2023-09-14 14:20:40 fv-az221-946 ewms-pilot[12] INFO 1 Tasks Finished
2023-09-14 14:20:40 fv-az221-946 mqclient[12] INFO Received Message: 77
This one does not complete:
2023-09-14 14:20:48 fv-az1128-620 ewms-pilot[12] INFO TASK FINISHED -- attempting to send result message...
2023-09-14 14:20:48 fv-az1128-620 mqclient[12] INFO Sending Message: 442 bytes
2023-09-14 14:46:35 fv-az1128-620 mqclient.rabbitmq[12] INFO [message_generator()] No messages in idle timeout window.
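One way to spot this kind of stall automatically is to look for the largest silent gap between consecutive log lines. The snippet below is only a rough sketch: it assumes every line starts with a `YYYY-MM-DD HH:MM:SS` timestamp as in the excerpt above, and `parse_timestamp` and `largest_gap` are hypothetical helper names, not part of skymap_scanner, ewms-pilot, or mqclient.

```python
from datetime import datetime

# The three lines from the hanging run quoted above.
LOG = """\
2023-09-14 14:20:48 fv-az1128-620 ewms-pilot[12] INFO TASK FINISHED -- attempting to send result message...
2023-09-14 14:20:48 fv-az1128-620 mqclient[12] INFO Sending Message: 442 bytes
2023-09-14 14:46:35 fv-az1128-620 mqclient.rabbitmq[12] INFO [message_generator()] No messages in idle timeout window.
"""


def parse_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS' timestamp (assumed format)."""
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")


def largest_gap(log_text):
    """Return (gap in seconds, line that preceded the gap) for the longest silence."""
    lines = [ln for ln in log_text.splitlines() if ln.strip()]
    stamps = [parse_timestamp(ln) for ln in lines]
    best_gap, best_line = 0.0, None
    # Pair each line with its own timestamp and the timestamp of the next line.
    for prev_line, prev, cur in zip(lines, stamps, stamps[1:]):
        gap = (cur - prev).total_seconds()
        if gap > best_gap:
            best_gap, best_line = gap, prev_line
    return best_gap, best_line


gap, line = largest_gap(LOG)
print(f"{gap:.0f} s of silence after: {line}")
```

On the excerpt above, the longest silence comes right after the `Sending Message: 442 bytes` line, which matches the observation that the result is sent but the acknowledgement step never gets logged.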
This links to the full failed log
Closed by #211
It seems like sometimes millipede tests can hang, or at least run indefinitely. For an example see here
I think either the server crashed or the client is not returning any results. The former seems more likely, as there isn't a consistent progress update. What's strange is that the same test can be rerun and will then complete as expected.