k8ssandra / management-api-for-apache-cassandra

RESTful / Secure Management Sidecar for Apache Cassandra
Apache License 2.0

SAI indexes not working #274

Closed: adejanovski closed this issue 1 year ago

adejanovski commented 1 year ago

It seems the DSE images we build have a problem with SAI (Storage-Attached Indexing).

With the following manifest:


apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: test
  namespace: k8ssandra-operator
spec:
  cassandra:
    serverVersion: 6.8.29
    serverType: dse
    storageConfig:
      cassandraDataVolumeClaimSpec:
        storageClassName: standard
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
    config:
      jvmOptions:
        heapSize: 1G
    networking:
      hostNetwork: true
    datacenters:
      - metadata:
          name: dc1
        size: 3
    mgmtAPIHeap: 64Mi

Creating a table and then an SAI index on it results in the following:

test-superuser@cqlsh> create keyspace test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
test-superuser@cqlsh> create table test.test (id int primary key, years list<int>);
test-superuser@cqlsh> create custom index on test.test(years) using 'StorageAttachedIndex';
NoHostAvailable: 

The underlying exception in the system.log file is:

ERROR [mainIOThread-112] 2023-02-27 14:48:54,614  Message.java:890 - Unexpected exception during request; channel = [id: 0x712d84ae, L:/127.0.0.1:9042 - R:/127.0.0.1:41882]
java.lang.NullPointerException: null
    at org.apache.cassandra.index.sai.StorageAttachedIndexGroup.diskUsage(StorageAttachedIndexGroup.java:390)
    at org.apache.cassandra.index.sai.StorageAttachedIndexGroup.totalDiskUsage(StorageAttachedIndexGroup.java:430)
    at io.k8ssandra.metrics.builder.CassandraMetricRegistryListener.onGaugeAdded(CassandraMetricRegistryListener.java:123)
    at com.codahale.metrics.MetricRegistry.notifyListenerOfAddedMetric(MetricRegistry.java:454)
    at com.codahale.metrics.MetricRegistry.onMetricAdded(MetricRegistry.java:448)
    at com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:89)
    at org.apache.cassandra.metrics.CassandraMetricsRegistry.register(CassandraMetricsRegistry.java:129)
    at org.apache.cassandra.index.sai.metrics.TableStateMetrics.<init>(TableStateMetrics.java:32)
    at org.apache.cassandra.index.sai.StorageAttachedIndexGroup.<init>(StorageAttachedIndexGroup.java:82)
    at org.apache.cassandra.index.sai.StorageAttachedIndex.lambda$register$5(StorageAttachedIndex.java:311)
    at org.apache.cassandra.index.SecondaryIndexManager.lambda$registerIndex$28(SecondaryIndexManager.java:1286)
    at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
    at org.apache.cassandra.index.SecondaryIndexManager.registerIndex(SecondaryIndexManager.java:1286)
    at org.apache.cassandra.index.sai.StorageAttachedIndex.register(StorageAttachedIndex.java:311)
    at org.apache.cassandra.index.SecondaryIndexManager.createIndex(SecondaryIndexManager.java:222)
    at org.apache.cassandra.index.SecondaryIndexManager.addIndex(SecondaryIndexManager.java:291)
    at org.apache.cassandra.index.SecondaryIndexManager.reload(SecondaryIndexManager.java:206)
    at org.apache.cassandra.db.ColumnFamilyStore.reload(ColumnFamilyStore.java:419)
    at org.apache.cassandra.schema.SchemaManager.alterTable(SchemaManager.java:1022)
    at org.apache.cassandra.schema.SchemaManager.lambda$alterKeyspace$23(SchemaManager.java:919)
    at java.lang.Iterable.forEach(Iterable.java:75)
    at org.apache.cassandra.schema.SchemaManager.alterKeyspace(SchemaManager.java:919)
    at java.lang.Iterable.forEach(Iterable.java:75)
    at org.apache.cassandra.schema.SchemaManager.merge(SchemaManager.java:903)
    at org.apache.cassandra.schema.SchemaManager.apply(SchemaManager.java:832)
    at org.apache.cassandra.schema.MigrationManager.lambda$applyAndAnnounce$8(MigrationManager.java:385)
    at io.reactivex.internal.operators.single.SingleFromCallable.subscribeActual(SingleFromCallable.java:44)
    at io.reactivex.Single.subscribe(Single.java:3603)
    at io.reactivex.internal.operators.single.SingleMap.subscribeActual(SingleMap.java:34)
    at io.reactivex.Single.subscribe(Single.java:3603)
    at io.reactivex.internal.operators.single.SingleResumeNext.subscribeActual(SingleResumeNext.java:39)
    at io.reactivex.Single.subscribe(Single.java:3603)
    at io.reactivex.internal.operators.single.SingleDelayWithCompletable$OtherObserver.onComplete(SingleDelayWithCompletable.java:69)
    at io.reactivex.internal.disposables.EmptyDisposable.complete(EmptyDisposable.java:68)
    at io.reactivex.internal.operators.completable.CompletableEmpty.subscribeActual(CompletableEmpty.java:27)
    at io.reactivex.Completable.subscribe(Completable.java:2302)
    at io.reactivex.internal.operators.single.SingleDelayWithCompletable.subscribeActual(SingleDelayWithCompletable.java:36)
    at io.reactivex.Single.subscribe(Single.java:3603)
    at io.reactivex.internal.operators.single.SingleDefer.subscribeActual(SingleDefer.java:43)
    at io.reactivex.Single.subscribe(Single.java:3603)
    at org.apache.cassandra.utils.flow.RxThreads$1SubscribeOn.lambda$subscribeActual$0(RxThreads.java:51)
    at org.apache.cassandra.concurrent.TPCRunnable.run(TPCRunnable.java:101)
    at org.apache.cassandra.concurrent.IOScheduler$PooledTaskWorker.run(IOScheduler.java:319)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:750)
    at org.apache.cassandra.utils.concurrent.InlinedThreadLocalThread.run(InlinedThreadLocalThread.java:251)
    at org.apache.cassandra.concurrent.IOThread.run(IOThread.java:46)
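
Reading the relevant part of the trace bottom-up: while `StorageAttachedIndexGroup` is still being constructed, `TableStateMetrics` registers a gauge, the registry notifies `CassandraMetricRegistryListener.onGaugeAdded`, and that listener immediately evaluates `totalDiskUsage()`, which throws the NPE in `diskUsage()`. The sketch below models that pattern, assuming (the trace does not confirm this) that `diskUsage()` dereferences state the constructor has not assigned yet. All class names here are hypothetical stand-ins, not the real DSE or mgmt-api types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical minimal model of the call chain in the trace; not real DSE/mgmt-api code.
public class EagerGaugeDemo {

    // Stand-in for a metric registry that notifies a listener when a gauge is added.
    static final class Registry {
        private final List<Supplier<Long>> gauges = new ArrayList<>();
        private final boolean eagerListener;

        Registry(boolean eagerListener) {
            this.eagerListener = eagerListener;
        }

        void register(Supplier<Long> gauge) {
            gauges.add(gauge);
            if (eagerListener) {
                // Like onGaugeAdded() in the trace: read the value right at registration.
                gauge.get();
            }
        }
    }

    // Stand-in for StorageAttachedIndexGroup: registers its gauge from the constructor,
    // before the state that the gauge reads has been assigned.
    static final class IndexGroup {
        private final List<Long> perIndexDiskUsage;

        IndexGroup(Registry registry) {
            registry.register(this::totalDiskUsage); // "this" escapes before construction ends
            this.perIndexDiskUsage = List.of(10L, 20L);
        }

        long totalDiskUsage() {
            long total = 0;
            for (long bytes : perIndexDiskUsage) { // NPE if read before the field is set
                total += bytes;
            }
            return total;
        }
    }

    public static void main(String[] args) {
        new IndexGroup(new Registry(false)); // no eager read: construction succeeds
        try {
            new IndexGroup(new Registry(true)); // eager read: fails during construction
        } catch (NullPointerException e) {
            System.out.println("NPE during construction");
        }
    }
}
```

In the model, the failure only happens because the listener reads the gauge at registration time; per the follow-up comments, though, the real gauge may crash whenever its value is read.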

burmanm commented 1 year ago

Note: these lines are probably from an older mgmt-api version (the line numbers in the trace do not match the function names in the current source code).

burmanm commented 1 year ago

Adding another note here: we need a workaround that prevents these broken metrics from being registered (a blacklist of some sort), since calling their getValue() causes the crash deeper in the DB, which we don't want.
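
A minimal sketch of such a blacklist, assuming the check can sit in the listener right before it would call `getValue()`: denied gauges are never read, so the DB code behind them never runs. `GaugeDenyList`, `readIfAllowed`, and the metric-name pattern are hypothetical (the issue does not give the exact SAI metric names), and a `Supplier` stands in for a Codahale `Gauge`.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;
import java.util.regex.Pattern;

// Hypothetical sketch: skip known-broken gauges before their value is ever read.
public final class GaugeDenyList {

    private final List<Pattern> deniedNames;

    public GaugeDenyList(List<Pattern> deniedNames) {
        this.deniedNames = List.copyOf(deniedNames);
    }

    // True if the metric name fully matches any denied pattern.
    public boolean isDenied(String metricName) {
        for (Pattern p : deniedNames) {
            if (p.matcher(metricName).matches()) {
                return true;
            }
        }
        return false;
    }

    // Used where the listener would otherwise call getValue(): denied gauges are never invoked.
    public <T> Optional<T> readIfAllowed(String metricName, Supplier<T> gauge) {
        if (isDenied(metricName)) {
            return Optional.empty();
        }
        return Optional.ofNullable(gauge.get());
    }

    public static void main(String[] args) {
        // Hypothetical pattern and metric names, for illustration only.
        GaugeDenyList denyList = new GaugeDenyList(
                List.of(Pattern.compile(".*StorageAttachedIndex.*DiskUsage.*")));

        Supplier<Long> broken = () -> {
            throw new NullPointerException("would crash in the DB");
        };
        System.out.println(denyList.readIfAllowed("StorageAttachedIndex.TotalDiskUsage", broken)); // Optional.empty
        System.out.println(denyList.readIfAllowed("Table.LiveSSTableCount", () -> 4L)); // Optional[4]
    }
}
```

Catching the exception after the fact would not be enough here, since the point is to avoid triggering the crash inside the DB at all; the deny-list check has to happen before the gauge is invoked.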

That list must also be tied to DSE versions, so that later versions with a working implementation can still export these metrics.
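
Tying the list to DSE versions could look like the sketch below: each rule carries the first version assumed to have a working implementation and applies only to older servers. `VersionedDenyList`, `Rule`, and the `fixedIn` value are hypothetical; the issue does not say which DSE version fixes the SAI gauges, and the comparison assumes purely numeric dotted versions such as `6.8.29`.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: deny-list rules that only apply to server versions
// older than the version in which the metric was fixed.
public final class VersionedDenyList {

    // A metric-name pattern plus the first version assumed to have a working implementation.
    record Rule(Pattern metricName, String fixedIn) {}

    private final List<Rule> rules;

    VersionedDenyList(List<Rule> rules) {
        this.rules = List.copyOf(rules);
    }

    // Compares numeric dotted versions such as "6.8.29"; missing components count as 0.
    static int compareVersions(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // Denied only if a rule matches the name and the server predates that rule's fix.
    boolean isDenied(String metricName, String serverVersion) {
        for (Rule r : rules) {
            if (r.metricName().matcher(metricName).matches()
                    && compareVersions(serverVersion, r.fixedIn()) < 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "6.8.99" is a placeholder fix version for illustration, not a known DSE fix.
        VersionedDenyList denyList = new VersionedDenyList(List.of(
                new Rule(Pattern.compile(".*StorageAttachedIndex.*DiskUsage.*"), "6.8.99")));

        String metric = "StorageAttachedIndex.TotalDiskUsage"; // hypothetical metric name
        System.out.println(denyList.isDenied(metric, "6.8.29")); // true: older than the placeholder fix
        System.out.println(denyList.isDenied(metric, "6.8.99")); // false: fixed version exports it
        System.out.println(denyList.isDenied(metric, "6.9.0"));  // false
    }
}
```

Keeping the fix version on each rule means a newer image only needs to report its server version to start exporting the metric again, without changing the listener itself.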