thelastpickle / cassandra-reaper

Automated Repair Awesomeness for Apache Cassandra
http://cassandra-reaper.io/
Apache License 2.0

java.lang.AssertionError: Unknown keyspace test_keyspace #1038

Open therb1 opened 3 years ago

therb1 commented 3 years ago

BUG?

To reproduce the bug:

1. Start a repair of keyspace test_keyspace in the web UI.
2. Stop the cassandra-reaper service.
3. DROP the keyspace while its repair is still in progress.
4. Start cassandra-reaper again.

Reaper then exits on startup with:


Mar 17 01:13:17 server.example.com cassandra-reaper[30949]: com.google.common.util.concurrent.ExecutionError: java.lang.AssertionError: Unknown keyspace test_keyspace
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2216)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at com.google.common.cache.LocalCache.get(LocalCache.java:4147)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:5053)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at io.cassandrareaper.jmx.ClusterFacade.getRangeToEndpointMap(ClusterFacade.java:276)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at io.cassandrareaper.service.RepairRunner.<init>(RepairRunner.java:92)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at io.cassandrareaper.service.RepairRunner.create(RepairRunner.java:170)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at io.cassandrareaper.service.RepairManager.startRunner(RepairManager.java:362)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at io.cassandrareaper.service.RepairManager.startRepairRun(RepairManager.java:328)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at io.cassandrareaper.service.RepairManager.resumeUnkownRunningRepairRuns(RepairManager.java:167)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at io.cassandrareaper.service.RepairManager.resumeRunningRepairRuns(RepairManager.java:138)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at io.cassandrareaper.ReaperApplication.run(ReaperApplication.java:293)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at io.cassandrareaper.ReaperApplication.run(ReaperApplication.java:104)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at io.dropwizard.cli.EnvironmentCommand.run(EnvironmentCommand.java:43)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at io.dropwizard.cli.ConfiguredCommand.run(ConfiguredCommand.java:87)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at io.dropwizard.cli.Cli.run(Cli.java:78)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at io.dropwizard.Application.run(Application.java:93)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at io.cassandrareaper.ReaperApplication.main(ReaperApplication.java:123)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]: Caused by: java.lang.AssertionError: Unknown keyspace test_keyspace
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:316)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at org.apache.cassandra.db.Keyspace.open(Keyspace.java:129)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at org.apache.cassandra.db.Keyspace.open(Keyspace.java:106)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at org.apache.cassandra.service.StorageService.constructRangeToEndpointMap(StorageService.java:1933)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at org.apache.cassandra.service.StorageService.getRangeToAddressMap(StorageService.java:1778)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at org.apache.cassandra.service.StorageService.getRangeToAddressMap(StorageService.java:1727)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at org.apache.cassandra.service.StorageService.getRangeToEndpointMap(StorageService.java:1666)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at sun.reflect.GeneratedMethodAccessor174.invoke(Unknown Source)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at java.lang.reflect.Method.invoke(Method.java:498)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at java.lang.reflect.Method.invoke(Method.java:498)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at sun.reflect.GeneratedMethodAccessor168.invoke(Unknown Source)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at java.lang.reflect.Method.invoke(Method.java:498)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:346)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at sun.rmi.transport.Transport$1.run(Transport.java:200)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at sun.rmi.transport.Transport$1.run(Transport.java:197)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at java.security.AccessController.doPrivileged(Native Method)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at java.security.AccessController.doPrivileged(Native Method)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
Mar 17 01:13:17 server.example.com cassandra-reaper[30949]:         at java.lang.Thread.run(Thread.java:748)
Mar 17 01:13:17 server.example.com systemd[1]: cassandra-reaper.service: Main process exited, code=exited, status=1/FAILURE
Mar 17 01:13:17 server.example.com systemd[1]: cassandra-reaper.service: Unit entered failed state.
Mar 17 01:13:17 server.example.com systemd[1]: cassandra-reaper.service: Failed with result 'exit-code'.

┆Issue is synchronized with this Jira Story by Unito ┆Issue Number: REAP-113

adejanovski commented 3 years ago

Yes, we'd need to catch this type of exception and stop the repairs that are scheduled for a keyspace that doesn't exist anymore.
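
A minimal sketch of that idea, not Reaper's actual code: on resume, each run's range lookup is wrapped in a try/catch, and a run whose keyspace is gone is marked aborted instead of crashing startup. `ResumeSketch`, `Run`, the `"ABORTED"` state string, and the `rangeLookup` function are all hypothetical stand-ins; a plain `java.lang.Error` wrapping the `AssertionError` simulates Guava's `ExecutionError`, which also extends `Error`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch: resume repair runs, aborting those whose keyspace no longer exists. */
final class ResumeSketch {

  /** Stand-in for a persisted repair run; not Reaper's real model class. */
  static final class Run {
    final String keyspace;
    String state = "RUNNING";

    Run(String keyspace) {
      this.keyspace = keyspace;
    }
  }

  /** Walks the cause chain looking for Cassandra's "Unknown keyspace" assertion. */
  static boolean isUnknownKeyspace(Throwable t) {
    for (Throwable c = t; c != null; c = c.getCause()) {
      if (c instanceof AssertionError
          && c.getMessage() != null
          && c.getMessage().startsWith("Unknown keyspace")) {
        return true;
      }
    }
    return false;
  }

  /** Returns the runs that can be resumed; runs on a dropped keyspace are marked ABORTED. */
  static List<Run> resume(List<Run> runs, Function<String, Object> rangeLookup) {
    List<Run> resumed = new ArrayList<>();
    for (Run run : runs) {
      try {
        rangeLookup.apply(run.keyspace); // assumed to fail the way the JMX call does
        resumed.add(run);
      } catch (Error | RuntimeException e) {
        if (!isUnknownKeyspace(e)) {
          throw e; // unrelated failures still propagate
        }
        run.state = "ABORTED";
      }
    }
    return resumed;
  }

  public static void main(String[] args) {
    // Simulated lookup: only live_keyspace exists.
    Function<String, Object> lookup = ks -> {
      if (!ks.equals("live_keyspace")) {
        throw new Error(new AssertionError("Unknown keyspace " + ks));
      }
      return Collections.emptyMap();
    };
    List<Run> runs = Arrays.asList(new Run("live_keyspace"), new Run("test_keyspace"));
    List<Run> resumed = resume(runs, lookup);
    System.out.println("resumed=" + resumed.size());
    for (Run run : runs) {
      System.out.println(run.keyspace + " -> " + run.state);
    }
  }
}
```

The catch has to cover `Error`, since a handler for `Exception` alone would not see an `AssertionError` or an `ExecutionError`.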

xgerman commented 1 year ago

Our stack trace looks like:

i.c.s.SchedulingManager - catch exception com.google.common.util.concurrent.ExecutionError: java.lang.AssertionError: Unknown keyspace <missing ks>
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2216)
    at com.google.common.cache.LocalCache.get(LocalCache.java:4147)
    at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:5053)
    at io.cassandrareaper.jmx.ClusterFacade.getRangeToEndpointMap(ClusterFacade.java:278)
    at io.cassandrareaper.service.RepairRunService.generateSegments(RepairRunService.java:170)
    at io.cassandrareaper.service.RepairRunService.registerRepairRun(RepairRunService.java:108)
    at io.cassandrareaper.service.SchedulingManager.createNewRunForUnit(SchedulingManager.java:301)
    at io.cassandrareaper.service.SchedulingManager.manageSchedule(SchedulingManager.java:151)
    at io.cassandrareaper.service.SchedulingManager.run(SchedulingManager.java:101)
    at java.util.TimerThread.mainLoop(Timer.java:555)
    at java.util.TimerThread.run(Timer.java:505)
Caused by: java.lang.AssertionError: Unknown keyspace <missing ks>
    at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:316)
    at org.apache.cassandra.db.Keyspace.open(Keyspace.java:129)
    at org.apache.cassandra.db.Keyspace.open(Keyspace.java:106)
    at org.apache.cassandra.service.StorageService.constructRangeToEndpointMap(StorageService.java:2023)
    at org.apache.cassandra.service.StorageService.getRangeToAddressMap(StorageService.java:1868)
    at org.apache.cassandra.service.StorageService.getRangeToAddressMap(StorageService.java:1817)
    at org.apache.cassandra.service.StorageService.getRangeToEndpointMap(StorageService.java:1756)
    at sun.reflect.GeneratedMethodAccessor89.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:72)
    at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:276)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
    at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
    at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
    at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
    at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468)
    at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
    at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1408)
    at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829)
    at sun.reflect.GeneratedMethodAccessor69.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:357)
    at sun.rmi.transport.Transport$1.run(Transport.java:200)
    at sun.rmi.transport.Transport$1.run(Transport.java:197)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:573)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:834)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:688)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:687)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

A simple mitigation might be to return a Reaper error instead of letting the AssertionError propagate. I would think the system will eventually clean up the keyspace.
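
One way this mitigation could look, as a sketch under assumptions rather than a patch: the range lookup is wrapped so that the `Error` raised for a missing keyspace is translated into a checked exception the scheduler can handle. `LookupErrorSketch`, `MissingKeyspaceException`, and the `jmxCall` supplier are hypothetical names introduced for illustration; a plain `java.lang.Error` stands in for Guava's `ExecutionError`.

```java
import java.util.function.Supplier;

/** Hypothetical sketch: translate the "Unknown keyspace" Error into a checked exception. */
final class LookupErrorSketch {

  /** Stand-in for a Reaper-level error type; not an existing Reaper class. */
  static final class MissingKeyspaceException extends Exception {
    MissingKeyspaceException(String keyspace, Throwable cause) {
      super("Keyspace " + keyspace + " no longer exists", cause);
    }
  }

  /** Runs the lookup and converts only the missing-keyspace failure; other Errors propagate. */
  static <T> T lookup(String keyspace, Supplier<T> jmxCall) throws MissingKeyspaceException {
    try {
      return jmxCall.get();
    } catch (Error e) {
      // Find the innermost cause, which is where Cassandra's assertion ends up.
      Throwable root = e;
      while (root.getCause() != null) {
        root = root.getCause();
      }
      if (root instanceof AssertionError
          && String.valueOf(root.getMessage()).startsWith("Unknown keyspace")) {
        throw new MissingKeyspaceException(keyspace, e);
      }
      throw e;
    }
  }

  public static void main(String[] args) {
    try {
      lookup("test_keyspace", () -> {
        throw new Error(new AssertionError("Unknown keyspace test_keyspace"));
      });
    } catch (MissingKeyspaceException e) {
      // A scheduler could log this and skip the schedule instead of failing.
      System.out.println("Skipping schedule: " + e.getMessage());
    }
  }
}
```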