thelastpickle / cassandra-reaper

Automated Repair Awesomeness for Apache Cassandra
http://cassandra-reaper.io/
Apache License 2.0
487 stars 217 forks source link

Repair is falling with error "SERIAL is not supported as conditional update commit consistency. Use ANY" #1021

Open lavaraja opened 3 years ago

lavaraja commented 3 years ago

Project board link

We are using reaper 2.1.3 version in SIDECAR mode with Cassandra 3.10 version. We see the repair on the cluster failing with below error.

RROR [XXXXXXXXXXXXXXX:8f9ac3a0-765e-11eb-8d98-6feda3eea4ff:8f9b86f3-765e-11eb-8d98-6feda3eea4ff] i.c.s.RepairRunner - Executing SegmentRunner failed java.lang.AssertionError: Could not release lead on segment 8f9b86f3-765e-11eb-8d98-6feda3eea4ff at io.cassandrareaper.storage.CassandraStorage.releaseLead(CassandraStorage.java:1564) at io.cassandrareaper.service.SegmentRunner.releaseLead(SegmentRunner.java:1233) at io.cassandrareaper.service.SegmentRunner.run(SegmentRunner.java:183) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:117) at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:38) at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77) at com.codahale.metrics.InstrumentedScheduledExecutorService$InstrumentedRunnable.run(InstrumentedScheduledExecutorService.java:241) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) ERROR [XXXXXXXXXXXXXXXX:8f9ac3a0-765e-11eb-8d98-6feda3eea4ff:8f9d34cb-765e-11eb-8d98-6feda3eea4ff] i.c.s.RepairRunner - Executing SegmentRunner failed com.datastax.driver.core.exceptions.InvalidQueryException: SERIAL is not supported as conditional update commit consistency. Use ANY if you mean "make sure it is accepted but I don't care how many replicas commit it for non-SERIAL reads" at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:50) at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:35) at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:293) at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:58) at io.cassandrareaper.storage.CassandraStorage.releaseLead(CassandraStorage.java:1559) at io.cassandrareaper.service.SegmentRunner.releaseLead(SegmentRunner.java:1233) at io.cassandrareaper.service.SegmentRunner.run(SegmentRunner.java:183) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:117) at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:38) at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77) at com.codahale.metrics.InstrumentedScheduledExecutorService$InstrumentedRunnable.run(InstrumentedScheduledExecutorService.java:241) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: SERIAL is not supported as conditional update commit consistency. Use ANY if you mean "make sure it is accepted but I don't care how many replicas commit it for non-SERIAL reads"

┆Issue is synchronized with this Jira Story by Unito ┆Issue Number: REAP-118

adejanovski commented 3 years ago

Hi,

that issue was fixed in the latest 2.2.0. Setting the LWTs as idempotent would allow speculative executions/retries at some funny consistency levels.

Cheers

lavaraja commented 3 years ago

Thank you. Will update the reaper version and validate the same.

lavaraja commented 3 years ago

@adejanovski we have updated to latest reaper version. Now we see below errors in reaper logs. We are running reaper in SIDECAR mode.

WARN [main] i.c.ReaperApplication - Reaper is ready to get things done! ERROR [ashburnprodrealtimepricing:1104e680-8633-11eb-a53e-4745e6e41c1b] i.c.s.RepairRunner - Executing SegmentRunner failed com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during CAS write query at consistency SERIAL (5 replica were required but only 0 acknowledged the write) at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:85) at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:23) at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:35) at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:293) at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:58) at io.cassandrareaper.storage.CassandraStorage.lockRunningRepairsForNodes(CassandraStorage.java:1988) at io.cassandrareaper.service.SegmentRunner.takeLead(SegmentRunner.java:868) at io.cassandrareaper.service.SegmentRunner.run(SegmentRunner.java:159) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:117) at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:38) at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77) at com.codahale.metrics.InstrumentedScheduledExecutorService$InstrumentedRunnable.run(InstrumentedScheduledExecutorService.java:241) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during CAS write query at consistency SERIAL (5 replica were required but only 0 acknowledged the write) at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:113) at com.datastax.driver.core.Responses$Error.asException(Responses.java:167) at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:651) at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1287) at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1205) at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335) at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335) at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:312) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:286) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335) at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) at com.datastax.driver.core.InboundTrafficMeter.channelRead(InboundTrafficMeter.java:38) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1304) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:921) at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:135) at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:646) at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:581) at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:460) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ... 1 common frames omitted Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during CAS write query at consistency SERIAL (5 replica were required but only 0 acknowledged the write) at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:87) at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:65) at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:297) at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:268) at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88) ... 25 common frames omitted

Also we see below message in current repair progress tab.

##########

ID 1104e680-8633-11eb-a53e-4745e6e41c1b
Owner lavaraja
Cause test reapir
Last event All nodes are busy or have too many pending compactions for the remaining candidate segments.
Start time March 16, 2021 2:09 PM
End time  
Pause time  
Duration 51 minutes 50 seconds
Segment count 667
Segment repaired 15
Intensity 0.8999999761581421
Repair parallelism PARALLEL
Incremental repair false
Repair threads 3
Nodes  

#######################

Regards, Lavaraja.

Roguelazer commented 2 years ago

I'm also seeing this sometimes in reaper 3.1.1 against Cassandra 4.0.3 (also in sidecar mode).

ERROR  [2022-03-22 22:45:12,170] [RepairRunner] i.c.s.RepairRunner - Executing SegmentRunner failed
com.datastax.driver.core.exceptions.InvalidQueryException: SERIAL is not supported as conditional update commit consistency. Use ANY if you mean "make sure it is accepted but I don't care how many replicas commit it for non-SERIAL reads"
    at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:50)
    at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:35)
    at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:293)
    at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:58)
    at io.cassandrareaper.storage.CassandraStorage.takeLead(CassandraStorage.java:1578)
    at io.cassandrareaper.storage.CassandraStorage.takeLead(CassandraStorage.java:1572)
    at io.cassandrareaper.service.SegmentRunner.takeLead(SegmentRunner.java:861)
    at io.cassandrareaper.service.SegmentRunner.run(SegmentRunner.java:157)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at com.codahale.metrics.InstrumentedScheduledExecutorService$InstrumentedRunnable.run(InstrumentedScheduledExecutorService.java:241)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: SERIAL is not supported as conditional update commit consistency. Use ANY if you mean "make sure it is accepted but I don't care how many replicas commit it for non-SERIAL reads"
    at com.datastax.driver.core.Responses$Error.asException(Responses.java:181)
    at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:215)
    at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:235)
    at com.datastax.driver.core.RequestHandler.access$2600(RequestHandler.java:61)
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:1011)
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:814)
    at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1290)
    at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1208)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:93)
    at com.datastax.driver.core.InboundTrafficMeter.channelRead(InboundTrafficMeter.java:38)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:719)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    ... 1 common frames omitted
pabloclaudino commented 2 years ago

@adejanovski I am also getting this error with reaper 3.0.0 with a cass 3.11.11 as storage

Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during CAS write query at consistency SERIAL (5 replica were required but only 0 acknowledged the write)

What is weirder is that we set the consistency to LOCAL_SERIAL in the configuration yaml

  queryOptions:
    consistencyLevel: LOCAL_QUORUM
    serialConsistencyLevel: LOCAL_SERIAL

Any idea on why it is not being taken in consideration or maybe this is not the right parameter?

jbarotin commented 2 years ago

Hi, I've got this error too, my config is following :

I can try some tests in order to provide more information to help fix this issue.