Open emerkle826 opened 1 year ago
The line numbers in the stack trace will be a little off because of local changes made to get some debugging info, but after adding a dump of the jobId, this is what I see:
jobID: 10
WARN [2023-09-07 16:11:27,005] io.cassandrareaper.service.SegmentRunner: Failed to connect to a coordinator node for segment 3032ba71-4d99-11ee-afee-a75cf7d71d60
! java.lang.StringIndexOutOfBoundsException: String index out of range: -5
! at java.base/java.lang.String.substring(String.java:1841)
! at io.cassandrareaper.management.http.HttpCassandraManagementProxy.triggerRepair(HttpCassandraManagementProxy.java:329)
! at io.cassandrareaper.service.SegmentRunner.runRepair(SegmentRunner.java:320)
! at io.cassandrareaper.service.SegmentRunner.run(SegmentRunner.java:243)
! at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
! at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
! at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
! at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
! at com.codahale.metrics.InstrumentedScheduledExecutorService$InstrumentedRunnable.run(InstrumentedScheduledExecutorService.java:241)
! at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
! at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
! at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
! at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
! at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
! at com.codahale.metrics.InstrumentedThreadFactory$InstrumentedRunnable.run(InstrumentedThreadFactory.java:66)
! at java.base/java.lang.Thread.run(Thread.java:829)
Here the jobId is simply "10", so substring(7) asks for a start index past the end of the two-character string.
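A minimal sketch of the failure and one defensive way to parse the ID, in case it helps the discussion. It is not the Reaper implementation: the class name JobIdParser, the method parseJobId, and the "repair-" prefix are all hypothetical. The prefix is assumed only because the failing call skips 7 characters.

```java
final class JobIdParser {
    // Hypothetical 7-character prefix; an assumption, not confirmed by the API.
    private static final String ASSUMED_PREFIX = "repair-";

    private JobIdParser() {}

    /** Parses a job ID that may or may not carry the assumed prefix. */
    static int parseJobId(String raw) {
        // Strip the prefix only when it is actually present, so a bare
        // numeric ID such as "10" is parsed as-is instead of being cut.
        String digits = raw.startsWith(ASSUMED_PREFIX)
            ? raw.substring(ASSUMED_PREFIX.length())
            : raw;
        return Integer.parseInt(digits);
    }

    public static void main(String[] args) {
        try {
            // Reproduces the reported failure: start index 7 on a 2-char string.
            "10".substring(7);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("substring(7) on \"10\" threw "
                + e.getClass().getSimpleName());
        }
        System.out.println(parseJobId("10"));
        System.out.println(parseJobId("repair-10"));
    }
}
```

Checking for the prefix before cutting keeps the parse working whichever form the endpoint returns. A non-numeric ID would still fail with a NumberFormatException, which is easier to diagnose than an index error.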
Running scenario Scenario Outline: Create a cluster and a repair run and delete them will fail on triggerRepair once the getRangeToEndpointMap method is implemented. The issue appears to be that the jobId returned from the v2 repair endpoint is a String containing a simple 1-2 digit number, which causes this substring call to throw an out of bounds exception.
Chatting with burmanm, it seems the repair call to Management API needs to specify that notifications should be sent/true.
I realize testing this without the implementation for getRangeToEndpointMap might be a challenge, so I will make sure the PRs for that work are up ASAP.
Issue is synchronized with this Jira Story by Unito. Issue Number: REAP-29