spring-projects / spring-data-cassandra

Provides support to increase developer productivity in Java when using Apache Cassandra. Uses familiar Spring concepts such as template classes for core API usage and lightweight repository-style data access.
https://spring.io/projects/spring-data-cassandra/
Apache License 2.0
373 stars, 307 forks

Crac Support - Lifecycle Implementation #1486

Open michaelmcfadyensky opened 3 months ago

michaelmcfadyensky commented 3 months ago

To use CRaC (Coordinated Restore at Checkpoint) with Spring Boot applications, all open connections must be closed before a checkpoint is taken and re-established after the checkpoint is restored.

Spring Framework has added support for automatically stopping any Lifecycle implementations before a checkpoint and starting them again after the checkpoint is restored (https://github.com/spring-projects/spring-framework/issues/29921).
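To illustrate what a Lifecycle-based approach for Cassandra might look like, here is a minimal sketch: a component that owns the connection closes it in stop() and re-creates it in start(). To keep the sketch runnable without Spring or the DataStax driver on the classpath, `Lifecycle` below is a local stand-in with the same three methods as Spring's `org.springframework.context.Lifecycle`, and `FakeSession` and `SessionLifecycle` are hypothetical names standing in for a `CqlSession` and the bean that would manage it. This is not an existing Spring Data Cassandra class.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Local stand-in mirroring org.springframework.context.Lifecycle (start/stop/isRunning).
interface Lifecycle {
    void start();
    void stop();
    boolean isRunning();
}

// Hypothetical stand-in for a driver session that holds an open socket.
final class FakeSession implements AutoCloseable {
    static final AtomicInteger OPEN = new AtomicInteger();

    FakeSession() {
        OPEN.incrementAndGet(); // simulate opening a connection
    }

    @Override
    public void close() {
        OPEN.decrementAndGet(); // simulate closing the connection
    }
}

// Hypothetical bean: closes the session before a checkpoint (stop)
// and re-creates it after restore (start).
final class SessionLifecycle implements Lifecycle {
    private final Supplier<FakeSession> factory;
    private FakeSession session;

    SessionLifecycle(Supplier<FakeSession> factory) {
        this.factory = factory;
    }

    @Override
    public synchronized void start() {
        if (session == null) {
            session = factory.get(); // (re)connect
        }
    }

    @Override
    public synchronized void stop() {
        if (session != null) {
            session.close(); // release the socket so no connection is open at checkpoint time
            session = null;
        }
    }

    @Override
    public synchronized boolean isRunning() {
        return session != null;
    }

    // Callers obtain the current session; fails while stopped.
    synchronized FakeSession session() {
        if (session == null) {
            throw new IllegalStateException("session is stopped");
        }
        return session;
    }
}
```

With the framework support described above, stop() would be invoked before the checkpoint (leaving no open socket for CRaC to reject) and start() after restore. A real implementation would also have to ensure that application code does not keep references to the old session across the restart.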

Several issues/PRs have been raised to provide this support for Kafka, web servers, Redis, JDBC, etc. (https://github.com/spring-projects/spring-kafka/issues/2760); however, there is currently no support for Cassandra.

Currently, attempting to take a checkpoint of a Spring Data Cassandra application results in the following error:

jdk.internal.crac.CheckpointException
        at java.base/jdk.internal.crac.Core.checkpointRestore1(Core.java:122)
        at java.base/jdk.internal.crac.Core.checkpointRestore(Core.java:246)
        at java.base/jdk.internal.crac.Core.checkpointRestoreInternal(Core.java:262)
        Suppressed: jdk.internal.crac.impl.CheckpointOpenSocketException: tcp localAddr 172.17.0.3 localPort 52866 remoteAddr 172.17.0.5 remotePort 9042
                at java.base/jdk.internal.crac.Core.translateJVMExceptions(Core.java:91)
                at java.base/jdk.internal.crac.Core.checkpointRestore1(Core.java:145)
                ... 2 more

(I've condensed the full error log, as it's very long.)

Requirements

christophstrobl commented 3 months ago

Thank you for reaching out. The Cassandra driver tries to connect to the database early, which opens the socket. The changes in #1485, together with those proposed in spring-projects/spring-boot#39948, allow deferring the actual connect, which in turn may make it possible to create a checkpoint as long as no interaction with the database has happened. At the moment, there's no out-of-the-box solution for capturing an in-flight snapshot once the CqlSession has been initialized. You may want to give Spring Cloud's @RefreshScope a try.
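The "defer the actual connect" idea can be sketched in plain Java as a holder that creates the session only on first use. This is a generic lazy-initialization pattern (double-checked locking), not the code from #1485 or spring-boot#39948; `LazySessionHolder` is a hypothetical name, and the supplier stands in for whatever would build the `CqlSession`. If a checkpoint is taken before get() is first called, no connection has been opened yet.

```java
import java.util.function.Supplier;

// Hypothetical holder that defers creating a connection-backed object
// (e.g. a CqlSession) until the first call to get().
final class LazySessionHolder<T> {
    private final Supplier<T> factory;
    private volatile T instance; // volatile makes double-checked locking safe

    LazySessionHolder(Supplier<T> factory) {
        this.factory = factory;
    }

    // Connects at most once, on first use, even under concurrent access.
    T get() {
        T result = instance;
        if (result == null) {
            synchronized (this) {
                result = instance;
                if (result == null) {
                    result = factory.get();
                    instance = result;
                }
            }
        }
        return result;
    }

    // True once the underlying object (and its socket) has been created.
    boolean isInitialized() {
        return instance != null;
    }
}
```

As the comment above notes, this only helps while nothing has touched the database yet; once get() has run, the socket is open and the checkpoint would fail as before. Spring Cloud's @RefreshScope follows a related idea: on refresh it discards the cached bean, and the next access creates a new instance.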