therb1 opened this issue 3 years ago.
Yes, we'd need to catch this type of exception and stop the repairs that are scheduled for a keyspace that doesn't exist anymore.
Our stack trace looks like:
i.c.s.SchedulingManager - catch exception com.google.common.util.concurrent.ExecutionError: java.lang.AssertionError: Unknown keyspace <missing ks>
	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2216)
	at com.google.common.cache.LocalCache.get(LocalCache.java:4147)
	at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:5053)
	at io.cassandrareaper.jmx.ClusterFacade.getRangeToEndpointMap(ClusterFacade.java:278)
	at io.cassandrareaper.service.RepairRunService.generateSegments(RepairRunService.java:170)
	at io.cassandrareaper.service.RepairRunService.registerRepairRun(RepairRunService.java:108)
	at io.cassandrareaper.service.SchedulingManager.createNewRunForUnit(SchedulingManager.java:301)
	at io.cassandrareaper.service.SchedulingManager.manageSchedule(SchedulingManager.java:151)
	at io.cassandrareaper.service.SchedulingManager.run(SchedulingManager.java:101)
	at java.util.TimerThread.mainLoop(Timer.java:555)
	at java.util.TimerThread.run(Timer.java:505)
Caused by: java.lang.AssertionError: Unknown keyspace <missing ks>
	at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:316)
	at org.apache.cassandra.db.Keyspace.open(Keyspace.java:129)
	at org.apache.cassandra.db.Keyspace.open(Keyspace.java:106)
	at org.apache.cassandra.service.StorageService.constructRangeToEndpointMap(StorageService.java:2023)
	at org.apache.cassandra.service.StorageService.getRangeToAddressMap(StorageService.java:1868)
	at org.apache.cassandra.service.StorageService.getRangeToAddressMap(StorageService.java:1817)
	at org.apache.cassandra.service.StorageService.getRangeToEndpointMap(StorageService.java:1756)
	at sun.reflect.GeneratedMethodAccessor89.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:72)
	at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:276)
	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
	at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
	at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
	at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
	at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
	at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
	at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468)
	at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
	at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1408)
	at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829)
	at sun.reflect.GeneratedMethodAccessor69.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
	at sun.rmi.transport.Transport$1.run(Transport.java:200)
	at sun.rmi.transport.Transport$1.run(Transport.java:197)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
	at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
	at java.security.AccessController.doPrivileged(Native Method)
	at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
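Because the `AssertionError` reaches Reaper wrapped (in the trace it sits inside Guava's `ExecutionError`, thrown from `LocalCache.get`), a plain `catch (AssertionError e)` around the cache lookup would not match it. Below is a minimal JDK-only sketch of classifying the failure by walking the cause chain. `UnknownKeyspaceDetector` and `missingKeyspace` are hypothetical names, not Reaper APIs, and matching on the `Unknown keyspace ` message prefix assumes Cassandra keeps the wording shown in the trace.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper (not part of Reaper): detects Cassandra's
// "Unknown keyspace <name>" AssertionError anywhere in a cause chain.
final class UnknownKeyspaceDetector {

    private static final String PREFIX = "Unknown keyspace ";

    private UnknownKeyspaceDetector() {}

    /** Returns the missing keyspace name, or empty if the error is unrelated. */
    static Optional<String> missingKeyspace(Throwable error) {
        // Identity-based set so a cyclic cause chain cannot loop forever.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable t = error; t != null && seen.add(t); t = t.getCause()) {
            String msg = t.getMessage();
            // Only the AssertionError itself is matched; wrappers such as
            // ExecutionError repeat the text in their message but are skipped.
            if (t instanceof AssertionError && msg != null && msg.startsWith(PREFIX)) {
                return Optional.of(msg.substring(PREFIX.length()).trim());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for Guava's ExecutionError, which is an Error wrapping the cause.
        Throwable wrapped = new Error(new AssertionError("Unknown keyspace test_keyspace"));
        System.out.println(missingKeyspace(wrapped).orElse("<none>"));
    }
}
```

Matching on the message string is fragile across Cassandra versions, so treat it as a stopgap rather than a contract.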
A simple mitigation might be to return a Reaper error instead of propagating the AssertionError (wrapped in Guava's ExecutionError). I would expect the system to clean up the dropped keyspace eventually.
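One way to read the mitigation above: before generating segments, check that the schedule's keyspace still exists, and record an error on the schedule instead of letting the exception escape `SchedulingManager.run`. The sketch assumes the set of live keyspaces has already been fetched from the cluster; `ScheduleGuardSketch`, `RepairSchedule`, `ScheduleState` and `prepareRun` are hypothetical stand-ins, not Reaper's real classes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch (not Reaper code): refuse to create a run for a
// keyspace that no longer exists and pause the schedule with an error.
final class ScheduleGuardSketch {

    enum ScheduleState { ACTIVE, PAUSED }

    /** Minimal stand-in for a repair schedule. */
    static final class RepairSchedule {
        final String keyspace;
        ScheduleState state = ScheduleState.ACTIVE;
        String lastError;

        RepairSchedule(String keyspace) {
            this.keyspace = keyspace;
        }
    }

    /**
     * Returns true if a new run may be created. If the keyspace is gone, the
     * schedule is paused and an error message is recorded instead of throwing.
     */
    static boolean prepareRun(RepairSchedule schedule, Set<String> liveKeyspaces) {
        if (!liveKeyspaces.contains(schedule.keyspace)) {
            schedule.state = ScheduleState.PAUSED;
            schedule.lastError = "Keyspace " + schedule.keyspace
                + " no longer exists; schedule paused";
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> live = new HashSet<>(Arrays.asList("system", "app_data"));
        RepairSchedule dropped = new RepairSchedule("test_keyspace");
        System.out.println(prepareRun(dropped, live) + " " + dropped.state);
        System.out.println(dropped.lastError);
    }
}
```

Pausing rather than deleting keeps the schedule visible to the operator, which seems safer if the keyspace is later recreated. A pre-check cannot rule out the keyspace being dropped between the check and the JMX call, so catching the wrapped error is still worth keeping as a backstop.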
BUG?
To reproduce the bug:
1) Start a repair of keyspace test_keyspace in the web UI.
2) Stop the cassandra-reaper service.
3) Drop the keyspace that is being repaired (DROP KEYSPACE test_keyspace).
4) Start cassandra-reaper again.
Issue is synchronized with this Jira Story by Unito. Issue Number: REAP-113