tlbtlbtlb closed 6 years ago
Running
grep 'environments have crashed' /mnt/kube-efs/universe-perfmon/*/*.out
suggests this happens to about 10% of worker runs. Currently I see 31 instances out of 296 runs.
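For anyone who wants to re-check that ratio, here is a rough sketch that counts both numbers in one pass. It assumes each `*.out` file corresponds to exactly one worker run, which may not match how the 296 figure was counted; `crash_rate` is a hypothetical helper name, not part of universe.

```shell
# crash_rate DIR: prints "<crashed> <total>" for the *.out logs one level below DIR.
# Assumption: one .out file == one worker run.
crash_rate() {
  dir=$1
  total=0
  crashed=0
  for f in "$dir"/*/*.out; do
    # Skip the unexpanded glob pattern when nothing matches.
    [ -f "$f" ] || continue
    total=$((total + 1))
    # grep -q exits 0 on the first match, so each file is counted at most once.
    if grep -q 'environments have crashed' "$f"; then
      crashed=$((crashed + 1))
    fi
  done
  echo "$crashed $total"
}

# Example call (path taken from the grep command above):
#   crash_rate /mnt/kube-efs/universe-perfmon
```

Counting files rather than matching lines avoids inflating the number if a single log prints the message more than once.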
I saw the same behavior here!
Expected behavior
Started with
on an 8-core devbox.
Actual behavior
One of the 8 workers dies after 2 hours with the backtrace below. AFAICT, it was still receiving reward messages seconds before the rewarder connection timed out.
Versions