spotify / cassandra-reaper

Software to run automated repairs of Cassandra
235 stars 60 forks

Store one "last event" per repair runner currentlyRunningSegments slot #115

Open Bj0rnen opened 9 years ago

Bj0rnen commented 9 years ago

Since we started doing parallel repairs, the "last event" portion of repair runs has become much less informative. Most threads may be idle, waiting for a repair segment to finish, but one or more threads are usually trying to repair and postponing for various reasons. As a result, "last event" usually says "Postponed due to ...". This gives the impression that things aren't moving forward when they actually are.

I propose that we store one "last event" message per thread, that is, one per repair runner currentlyRunningSegments slot. That will give us an overview of current activity and let us see when everything is truly stuck.
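To make the proposal concrete, here is a rough sketch of a per-slot event store. It is only an illustration: the class name SlotEvents and the methods record, snapshot, and allPostponed are hypothetical and are not taken from Reaper's code. It also assumes that slots are addressed by index and that postponement messages start with "Postponed".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReferenceArray;

/** Hypothetical holder: one "last event" message per parallel repair slot. */
final class SlotEvents {
    // One entry per slot; safe to update from several runner threads.
    private final AtomicReferenceArray<String> lastEvents;

    SlotEvents(int slotCount) {
        lastEvents = new AtomicReferenceArray<>(slotCount);
        for (int i = 0; i < slotCount; i++) {
            lastEvents.set(i, "idle");
        }
    }

    /** Overwrites only the message of the given slot. */
    void record(int slot, String message) {
        lastEvents.set(slot, message);
    }

    /** Returns a copy of the current message of every slot, in slot order. */
    List<String> snapshot() {
        List<String> copy = new ArrayList<>(lastEvents.length());
        for (int i = 0; i < lastEvents.length(); i++) {
            copy.add(lastEvents.get(i));
        }
        return copy;
    }

    /** Assumed "truly stuck" heuristic: every slot currently reports a postponement. */
    boolean allPostponed() {
        for (int i = 0; i < lastEvents.length(); i++) {
            if (!lastEvents.get(i).startsWith("Postponed")) {
                return false;
            }
        }
        return true;
    }
}
```

With something like this, a slot that keeps postponing no longer hides the others: the run would only look stuck when every slot reports a postponement at the same time.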