Since we started doing parallel repairs, the "last event" portion of repair runs has become much less informative. While most threads may be idle waiting for a repair segment to finish, one or more threads are usually trying to start a repair and postponing it for various reasons. As a result, "last event" usually says "Postponed due to ...". This gives the impression that the run isn't moving forward when it actually is.
I propose that we store one message per thread. That will help us get an overview of current activity, and see when everything is truly stuck.
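A minimal sketch of the idea, assuming each repair thread can be identified by a string key such as its thread name. The class `PerThreadLastEvents`, its methods, and the `allPostponed()` heuristic for "truly stuck" are all hypothetical and not part of the existing codebase. Each thread overwrites only its own entry, so one postponing thread no longer hides what the others are doing.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch: keeps the latest event message per repair thread
 * instead of a single "last event" string for the whole repair run.
 */
public class PerThreadLastEvents {
    // Latest message per thread; the key is assumed to be a thread identifier.
    private final Map<String, String> lastEventByThread = new ConcurrentHashMap<>();

    /** Records a message as the latest event for the calling thread. */
    public void record(String message) {
        record(Thread.currentThread().getName(), message);
    }

    /** Records a message for an explicit thread id, replacing its previous message. */
    public void record(String threadId, String message) {
        lastEventByThread.put(threadId, message);
    }

    /** Returns a sorted copy, giving the overview a stable order. */
    public Map<String, String> snapshot() {
        return new TreeMap<>(lastEventByThread);
    }

    /** Assumed heuristic: the run looks stuck only if every thread's latest message is a postponement. */
    public boolean allPostponed() {
        return !lastEventByThread.isEmpty()
                && lastEventByThread.values().stream().allMatch(m -> m.startsWith("Postponed"));
    }

    public static void main(String[] args) {
        PerThreadLastEvents events = new PerThreadLastEvents();
        events.record("repair-1", "Triggered repair of segment 12"); // illustrative message
        events.record("repair-2", "Postponed due to ...");
        events.snapshot().forEach((thread, msg) -> System.out.println(thread + ": " + msg));
        System.out.println("all postponed: " + events.allPostponed());
    }
}
```

With this shape, the UI could show the full snapshot instead of a single line, and a run would only look stalled when every thread reports a postponement.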