thelastpickle / cassandra-reaper

Automated Repair Awesomeness for Apache Cassandra
http://cassandra-reaper.io/
Apache License 2.0

Errors with cassandra storage after upgrading to 2.0.0 #807

Closed nisc-acooper closed 4 years ago

nisc-acooper commented 4 years ago

We just attempted an upgrade to 2.0.0 using the Docker container running in Kubernetes. We use the Cassandra storage backend, which worked fine up until this upgrade. The pod dumps the following every 10 seconds, and this happens across multiple restarts of the pod:

WARN [2019-12-06 20:15:38,520] [main] i.c.s.CassandraStorage - Customization of cassandra's retry policy is not supported and will be overridden
INFO [2019-12-06 20:15:38,520] [main] c.d.d.c.ClockFactory - Using native clock to generate timestamps.
INFO [2019-12-06 20:15:38,563] [main] c.d.d.c.p.DCAwareRoundRobinPolicy - Using data-center name 'datacenter1' for DCAwareRoundRobinPolicy (if this is incorrect, please provide the correct datacenter name with DCAwareRoundRobinPolicy constructor)
INFO [2019-12-06 20:15:38,564] [main] c.d.d.c.Cluster - New Cassandra host XXXXXXX
INFO [2019-12-06 20:15:38,564] [main] c.d.d.c.Cluster - New Cassandra host XXXXXXX
INFO [2019-12-06 20:15:38,564] [main] c.d.d.c.Cluster - New Cassandra host XXXXXXX
WARN [2019-12-06 20:15:38,567] [main] c.d.d.c.CodecRegistry - Ignoring codec DateTimeCodec [timestamp <-> org.joda.time.DateTime] because it collides with previously registered codec DateTimeCodec [timestamp <-> org.joda.time.DateTime]
INFO [2019-12-06 20:15:38,592] [main] o.c.c.m.MigrationRepository - Found 8 migration scripts
INFO [2019-12-06 20:15:38,592] [main] i.c.s.CassandraStorage - Keyspace reaper_db already at schema version 23
ERROR [2019-12-06 20:15:38,615] [main] i.c.ReaperApplication - Storage is not ready yet, trying again to connect shortly...
com.datastax.driver.core.exceptions.InvalidQueryException: Undefined column name nodes
    at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:49)
    at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:35)
    at com.datastax.driver.core.AbstractSession.prepare(AbstractSession.java:86)
    at io.cassandrareaper.storage.CassandraStorage.prepareStatements(CassandraStorage.java:434)
    at io.cassandrareaper.storage.CassandraStorage.<init>(CassandraStorage.java:228)
    at io.cassandrareaper.ReaperApplication.initializeStorage(ReaperApplication.java:439)
    at io.cassandrareaper.ReaperApplication.tryInitializeStorage(ReaperApplication.java:310)
    at io.cassandrareaper.ReaperApplication.run(ReaperApplication.java:172)
    at io.cassandrareaper.ReaperApplication.run(ReaperApplication.java:91)
    at io.dropwizard.cli.EnvironmentCommand.run(EnvironmentCommand.java:43)
    at io.dropwizard.cli.ConfiguredCommand.run(ConfiguredCommand.java:87)
    at io.dropwizard.cli.Cli.run(Cli.java:78)
    at io.dropwizard.Application.run(Application.java:93)
    at io.cassandrareaper.ReaperApplication.main(ReaperApplication.java:110)
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Undefined column name nodes
    at com.datastax.driver.core.Responses$Error.asException(Responses.java:181)

Is there a missing schema migration that we need to apply?

adejanovski commented 4 years ago

Hi, the stack trace seems to be pointing at the diagnostic_event_subscription table, which is created in migration 23. A few questions (a cqlsh sketch of these checks follows below):

- Which version of Reaper were you upgrading from?
- Can you list the versions in the reaper_db.schema_migration table?
- Can you check that all Cassandra nodes agree on the schema version by looking at the output of nodetool describecluster?
- Can you verify the schema of the diagnostic_event_subscription table? Does it match this?

Ultimately, you can drop the table and recreate it according to the above link.
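
For reference, a minimal cqlsh sketch of those checks, assuming the keyspace is reaper_db (as in the log above) and Cassandra 3.0+ so that the system_schema keyspace is available:

    -- list the migration versions Reaper has recorded so far
    SELECT * FROM reaper_db.schema_migration;

    -- inspect the columns that actually exist on the table the driver complains about
    SELECT column_name, type
      FROM system_schema.columns
     WHERE keyspace_name = 'reaper_db'
       AND table_name = 'diagnostic_event_subscription';

    -- schema agreement across nodes can be checked outside cqlsh with: nodetool describecluster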

adejanovski commented 4 years ago

It just struck me that you may be upgrading from a 1.5 beta version of Reaper, which shipped an older version of migration 23 with a different schema for the diagnostic_event_subscription table. As you can see here, the include_nodes column was renamed to nodes at some point. Your solution is to drop and recreate the table as suggested above (see the sketch below). Let me know if that's the case.
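
A minimal sketch of that fix from cqlsh; the CREATE TABLE statement itself should be copied from Reaper's current migration 023 CQL script rather than typed from memory:

    -- drop the table created by the old 1.5-beta version of migration 23
    DROP TABLE IF EXISTS reaper_db.diagnostic_event_subscription;

    -- then recreate it by pasting the CREATE TABLE statement from Reaper's
    -- current migration 023 CQL script, whose definition uses the 'nodes'
    -- column rather than the old 'include_nodes' column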

berkaybuharali commented 4 years ago

I am also trying to upgrade Reaper, from version 1.3.0, with the Cassandra storage backend.

When I try either 2.0.0 or 1.4.8, I get:

org.cognitor.cassandra.migration.MigrationException: Error during migration of script 017_add_custom_jmx_port.cql while executing 'ALTER TABLE cluster ADD properties text;'
    at org.cognitor.cassandra.migration.Database.execute(Database.java:269)
    at java.util.Collections$SingletonList.forEach(Collections.java:4822)
    at org.cognitor.cassandra.migration.MigrationTask.migrate(MigrationTask.java:68)
    at io.cassandrareaper.storage.CassandraStorage.migrate(CassandraStorage.java:303)
    at io.cassandrareaper.storage.CassandraStorage.initializeAndUpgradeSchema(CassandraStorage.java:268)
    at io.cassandrareaper.storage.CassandraStorage.<init>(CassandraStorage.java:227)
    at io.cassandrareaper.ReaperApplication.initializeStorage(ReaperApplication.java:439)
    at io.cassandrareaper.ReaperApplication.tryInitializeStorage(ReaperApplication.java:310)
    at io.cassandrareaper.ReaperApplication.run(ReaperApplication.java:172)
    at io.cassandrareaper.ReaperApplication.run(ReaperApplication.java:91)
    at io.dropwizard.cli.EnvironmentCommand.run(EnvironmentCommand.java:43)
    at io.dropwizard.cli.ConfiguredCommand.run(ConfiguredCommand.java:87)
    at io.dropwizard.cli.Cli.run(Cli.java:78)
    at io.dropwizard.Application.run(Application.java:93)
    at io.cassandrareaper.ReaperApplication.main(ReaperApplication.java:110)
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Invalid column name properties because it conflicts with an existing column
    at com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:49)
    at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:35)
    at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:293)
    at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:58)
    at org.cognitor.cassandra.migration.Database.executeStatement(Database.java:277)
    at org.cognitor.cassandra.migration.Database.execute(Database.java:261)
    ... 14 common frames omitted
Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Invalid column name properties because it conflicts with an existing column
    at com.datastax.driver.core.Responses$Error.asException(Responses.java:181)
    at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:215)
    at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:228)
    at com.datastax.driver.core.RequestHandler.access$2600(RequestHandler.java:62)
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:1005)
    at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:808)
    at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1233)
    at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1151)
    at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
    at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
    at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:312)
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:286)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1304)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:921)
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:135)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:646)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:581)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:460)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)

adejanovski commented 4 years ago

That's interesting. I guess you can work this out:

Not sure how or where we messed up in the migration path, but it seems like we did at some point 😅 Let us know how that works out.
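
For the migration 017 failure above: the error says the properties column already exists on the cluster table, so the ALTER TABLE in the script conflicts with the live schema. A minimal diagnostic sketch, assuming cqlsh access and Cassandra 3.0+ for the system_schema keyspace:

    -- confirm whether the 'properties' column is already present on reaper_db.cluster
    SELECT column_name, type
      FROM system_schema.columns
     WHERE keyspace_name = 'reaper_db'
       AND table_name = 'cluster';

    -- and check which migration versions have been recorded
    SELECT * FROM reaper_db.schema_migration;

If the column exists but migration 017 is not recorded as applied, the live schema and the migration history have drifted apart, which would match the symptom above.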

adejanovski commented 4 years ago

@nisc-acooper, any update on your problem?

nisc-acooper commented 4 years ago

I just got back to this. Yes, we were upgrading from a 1.5.0-beta version. I was able to drop the diagnostic_event_subscription table and recreate it using the linked schema file. Everything is up at version 2.0. Thanks for the assistance, great project!

adejanovski commented 4 years ago

Thanks

Jayesh-Popat commented 1 year ago

Hi @adejanovski, I am also encountering this problem. However, I am not upgrading; this is a fresh deployment, and only one of the nodes is showing the same set of error messages, with the Reaper container constantly restarting. The other two nodes are completely stable. Any suggestions?

Edit: The workaround mentioned, dropping the reaper_db keyspace and recreating it, does work (a sketch follows below). I am trying to fix this permanently.
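
A minimal sketch of that workaround, assuming the keyspace is named reaper_db and the datacenter is datacenter1 as in the log above; the replication settings are illustrative and should match your own topology:

    -- drop the whole Reaper keyspace (this discards all repair history and schedules)
    DROP KEYSPACE IF EXISTS reaper_db;

    -- recreate it before restarting Reaper so the migrations run against a clean schema
    CREATE KEYSPACE reaper_db
      WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 3};

Reaper re-applies its schema migrations on startup (the log above shows the migration scripts being discovered), so the schema is rebuilt from scratch after the restart.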