stackabletech / druid-operator

An Operator for Apache Druid for Stackable Data Platform

Druid 30.0.0 ingestion task failure research #595

Closed maltesander closed 1 month ago

maltesander commented 2 months ago

We planned to ship Druid 30.0.0 in the 24.7 release, but have now dropped that plan and marked 30.0.0 as experimental.

The reason was problems with the integration tests and demos, especially around data ingestion.

Definition of Done

Ideally the problem is identified and fixed. However, one possible outcome of this issue is that we are not able to identify and fix the problem within 2-3 days. In that case we at least want to create an upstream issue at Druid with instructions to reproduce the problem.

Problem description

In the HDFS deep storage integration test, for example, the ingestion task fails with a null pointer exception:

2024-07-18T07:46:00,275 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Starting lifecycle [module] stage [ANNOUNCEMENTS]
2024-07-18T07:46:00,275 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Successfully started lifecycle [module]
2024-07-18T07:46:01,280 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.task.IndexTask - Found intervals and shardSpecs in 2,510ms
2024-07-18T07:46:01,281 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.task.AbstractBatchIndexTask - Using time chunk lock since forceTimeChunkLock is [true] and mode is [REPLACE_LEGACY].
2024-07-18T07:46:04,183 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver - Pushing [1] segments in background
2024-07-18T07:46:04,184 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver - Pushing segments: [wikipedia-2015-09-12_2015-09-12T00:00:00.000Z_2015-09-13T00:00:00.000Z_2024-07-18T07:46:01.359Z]
2024-07-18T07:46:04,186 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.appenderator.AppenderatorImpl - Persisted rows[39,244] and (estimated) bytes[27,119,548]
2024-07-18T07:46:04,644 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.common.task.IndexTask - Encountered exception in BUILD_SEGMENTS.
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.NullPointerException
    at org.apache.druid.indexing.common.task.IndexTask.generateAndPublishSegments(IndexTask.java:1028) ~[druid-indexing-service-30.0.0.jar:30.0.0]
    at org.apache.druid.indexing.common.task.IndexTask.runTask(IndexTask.java:519) [druid-indexing-service-30.0.0.jar:30.0.0]
    at org.apache.druid.indexing.common.task.AbstractTask.run(AbstractTask.java:179) [druid-indexing-service-30.0.0.jar:30.0.0]
    at org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask.runSequential(ParallelIndexSupervisorTask.java:1241) [druid-indexing-service-30.0.0.jar:30.0.0]
    at org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask.runTask(ParallelIndexSupervisorTask.java:573) [druid-indexing-service-30.0.0.jar:30.0.0]
    at org.apache.druid.indexing.common.task.AbstractTask.run(AbstractTask.java:179) [druid-indexing-service-30.0.0.jar:30.0.0]
    at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:478) [druid-indexing-service-30.0.0.jar:30.0.0]
    at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:450) [druid-indexing-service-30.0.0.jar:30.0.0]
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131) [guava-32.0.1-jre.jar:?]
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:75) [guava-32.0.1-jre.jar:?]
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82) [guava-32.0.1-jre.jar:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: java.util.concurrent.ExecutionException: java.lang.NullPointerException
    at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:592) ~[guava-32.0.1-jre.jar:?]
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:571) ~[guava-32.0.1-jre.jar:?]
    at com.google.common.util.concurrent.FluentFuture$TrustedFuture.get(FluentFuture.java:91) ~[guava-32.0.1-jre.jar:?]
    at org.apache.druid.segment.realtime.appenderator.BatchAppenderatorDriver.pushAndClear(BatchAppenderatorDriver.java:152) ~[druid-server-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.realtime.appenderator.BatchAppenderatorDriver.pushAllAndClear(BatchAppenderatorDriver.java:135) ~[druid-server-30.0.0.jar:30.0.0]
    at org.apache.druid.indexing.common.task.InputSourceProcessor.process(InputSourceProcessor.java:136) ~[druid-indexing-service-30.0.0.jar:30.0.0]
    at org.apache.druid.indexing.common.task.IndexTask.generateAndPublishSegments(IndexTask.java:910) ~[druid-indexing-service-30.0.0.jar:30.0.0]
    ... 13 more
Caused by: java.lang.NullPointerException
    at java.base/java.util.Objects.requireNonNull(Objects.java:209) ~[?:?]
    at it.unimi.dsi.fastutil.objects.Object2IntRBTreeMap.add(Object2IntRBTreeMap.java:273) ~[fastutil-8.5.13.jar:?]
    at it.unimi.dsi.fastutil.objects.Object2IntRBTreeMap.put(Object2IntRBTreeMap.java:257) ~[fastutil-8.5.13.jar:?]
    at org.apache.druid.segment.SortedDimensionDictionary.<init>(SortedDimensionDictionary.java:45) ~[druid-processing-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.DimensionDictionary.sort(DimensionDictionary.java:265) ~[druid-processing-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.DictionaryEncodedColumnIndexer.sortedLookup(DictionaryEncodedColumnIndexer.java:243) ~[druid-processing-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.DictionaryEncodedColumnIndexer.getActualValue(DictionaryEncodedColumnIndexer.java:250) ~[druid-processing-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.DictionaryEncodedColumnIndexer$1.get(DictionaryEncodedColumnIndexer.java:89) ~[druid-processing-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.DictionaryEncodedColumnIndexer$1.get(DictionaryEncodedColumnIndexer.java:79) ~[druid-processing-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.DictionaryEncodedColumnMerger.allNull(DictionaryEncodedColumnMerger.java:533) ~[druid-processing-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.DictionaryEncodedColumnMerger.writeMergedValueDictionary(DictionaryEncodedColumnMerger.java:149) ~[druid-processing-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.IndexMergerV9.writeDimValuesAndSetupDimConversion(IndexMergerV9.java:816) ~[druid-processing-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.IndexMergerV9.makeIndexFiles(IndexMergerV9.java:216) ~[druid-processing-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.IndexMergerV9.merge(IndexMergerV9.java:1155) ~[druid-processing-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.IndexMergerV9.multiphaseMerge(IndexMergerV9.java:972) ~[druid-processing-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.IndexMergerV9.persist(IndexMergerV9.java:876) ~[druid-processing-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.IndexMerger.persist(IndexMerger.java:225) ~[druid-processing-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.realtime.appenderator.AppenderatorImpl.persistHydrant(AppenderatorImpl.java:1559) ~[druid-server-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.realtime.appenderator.AppenderatorImpl.access$100(AppenderatorImpl.java:120) ~[druid-server-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.realtime.appenderator.AppenderatorImpl$2.call(AppenderatorImpl.java:667) ~[druid-server-30.0.0.jar:30.0.0]
    ... 6 more
2024-07-18T07:46:04,661 WARN [task-runner-0-priority-0] org.apache.druid.segment.realtime.firehose.ServiceAnnouncingChatHandlerProvider - handler[index_parallel_wikipedia-2015-09-12_aalngccn_2024-07-18T07:45:51.005Z] not currently registered, ignoring.
2024-07-18T07:46:04,664 INFO [task-runner-0-priority-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_parallel_wikipedia-2015-09-12_aalngccn_2024-07-18T07:45:51.005Z",
  "status" : "FAILED",
  "duration" : 5905,
  "errorMsg" : "java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.NullPointerException\n...",
  "location" : {
    "host" : null,
    "port" : -1,
    "tlsPort" : -1
  }
}
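The innermost frames show `Object2IntRBTreeMap.put` hitting `Objects.requireNonNull` inside `SortedDimensionDictionary`, i.e. a null dimension value reaching a sorted dictionary. Sorted maps with natural ordering reject null keys in general; a minimal stdlib sketch of the same failure mode, using `java.util.TreeMap` as a stand-in for fastutil's red-black tree map (the class and value names here are illustrative, not Druid code):

```java
import java.util.TreeMap;

public class NullKeySketch {
    public static void main(String[] args) {
        // Stand-in for SortedDimensionDictionary building its sorted value -> id map.
        TreeMap<String, Integer> sortedDictionary = new TreeMap<>();
        sortedDictionary.put("wikipedia", 0);
        try {
            // A null key cannot be compared against existing keys, so the put
            // throws NullPointerException -- the same class of failure seen in
            // the Object2IntRBTreeMap frames of the trace above.
            sortedDictionary.put(null, 1);
        } catch (NullPointerException e) {
            System.out.println("null dimension value rejected by sorted map");
        }
    }
}
```

If this is indeed what happens, the interesting question is why a null value reaches the sorted dictionary in 30.0.0 at all (possibly related to the removal of legacy null-handling modes), rather than being filtered or mapped beforehand.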

@xeniape reported another null pointer exception during the nifi-kafka-druid-earthquake demo test:

2024-07-17T12:03:35,185 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Encountered exception while running task.
java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:619) ~[druid-indexing-service-30.0.0.jar:30.0.0]
    at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:277) [druid-indexing-service-30.0.0.jar:30.0.0]
    at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.runTask(SeekableStreamIndexTask.java:153) [druid-indexing-service-30.0.0.jar:30.0.0]
    at org.apache.druid.indexing.common.task.AbstractTask.run(AbstractTask.java:179) [druid-indexing-service-30.0.0.jar:30.0.0]
    at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:478) [druid-indexing-service-30.0.0.jar:30.0.0]
    at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:450) [druid-indexing-service-30.0.0.jar:30.0.0]
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131) [guava-32.0.1-jre.jar:?]
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:75) [guava-32.0.1-jre.jar:?]
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82) [guava-32.0.1-jre.jar:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
    Suppressed: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.NullPointerException
        at org.apache.druid.segment.realtime.appenderator.StreamAppenderatorDriver.persist(StreamAppenderatorDriver.java:248) ~[druid-server-30.0.0.jar:30.0.0]
        at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:790) ~[druid-indexing-service-30.0.0.jar:30.0.0]
        at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:277) [druid-indexing-service-30.0.0.jar:30.0.0]
        at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.runTask(SeekableStreamIndexTask.java:153) [druid-indexing-service-30.0.0.jar:30.0.0]
        at org.apache.druid.indexing.common.task.AbstractTask.run(AbstractTask.java:179) [druid-indexing-service-30.0.0.jar:30.0.0]
        at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:478) [druid-indexing-service-30.0.0.jar:30.0.0]
        at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:450) [druid-indexing-service-30.0.0.jar:30.0.0]
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131) [guava-32.0.1-jre.jar:?]
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:75) [guava-32.0.1-jre.jar:?]
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82) [guava-32.0.1-jre.jar:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
        at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
    Caused by: java.util.concurrent.ExecutionException: java.lang.NullPointerException
        at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:592) ~[guava-32.0.1-jre.jar:?]
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:571) ~[guava-32.0.1-jre.jar:?]
        at com.google.common.util.concurrent.FluentFuture$TrustedFuture.get(FluentFuture.java:91) ~[guava-32.0.1-jre.jar:?]
        at org.apache.druid.segment.realtime.appenderator.StreamAppenderatorDriver.persist(StreamAppenderatorDriver.java:240) ~[druid-server-30.0.0.jar:30.0.0]
        ... 12 more
    Caused by: java.lang.NullPointerException
        at java.base/java.util.Objects.requireNonNull(Objects.java:209) ~[?:?]
        at it.unimi.dsi.fastutil.objects.Object2IntRBTreeMap.add(Object2IntRBTreeMap.java:273) ~[fastutil-core-8.5.13.jar:?]
        at it.unimi.dsi.fastutil.objects.Object2IntRBTreeMap.put(Object2IntRBTreeMap.java:257) ~[fastutil-core-8.5.13.jar:?]
        at org.apache.druid.segment.SortedDimensionDictionary.<init>(SortedDimensionDictionary.java:45) ~[druid-processing-30.0.0.jar:30.0.0]
        at org.apache.druid.segment.DimensionDictionary.sort(DimensionDictionary.java:265) ~[druid-processing-30.0.0.jar:30.0.0]
        at org.apache.druid.segment.DictionaryEncodedColumnIndexer.sortedLookup(DictionaryEncodedColumnIndexer.java:243) ~[druid-processing-30.0.0.jar:30.0.0]
        at org.apache.druid.segment.DictionaryEncodedColumnIndexer.getActualValue(DictionaryEncodedColumnIndexer.java:250) ~[druid-processing-30.0.0.jar:30.0.0]
        at org.apache.druid.segment.DictionaryEncodedColumnIndexer$1.get(DictionaryEncodedColumnIndexer.java:89) ~[druid-processing-30.0.0.jar:30.0.0]
        at org.apache.druid.segment.DictionaryEncodedColumnIndexer$1.get(DictionaryEncodedColumnIndexer.java:79) ~[druid-processing-30.0.0.jar:30.0.0]
        at org.apache.druid.segment.DictionaryEncodedColumnMerger.allNull(DictionaryEncodedColumnMerger.java:533) ~[druid-processing-30.0.0.jar:30.0.0]
        at org.apache.druid.segment.DictionaryEncodedColumnMerger.writeMergedValueDictionary(DictionaryEncodedColumnMerger.java:149) ~[druid-processing-30.0.0.jar:30.0.0]
        at org.apache.druid.segment.IndexMergerV9.writeDimValuesAndSetupDimConversion(IndexMergerV9.java:816) ~[druid-processing-30.0.0.jar:30.0.0]
        at org.apache.druid.segment.IndexMergerV9.makeIndexFiles(IndexMergerV9.java:216) ~[druid-processing-30.0.0.jar:30.0.0]
        at org.apache.druid.segment.IndexMergerV9.merge(IndexMergerV9.java:1155) ~[druid-processing-30.0.0.jar:30.0.0]
        at org.apache.druid.segment.IndexMergerV9.multiphaseMerge(IndexMergerV9.java:972) ~[druid-processing-30.0.0.jar:30.0.0]
        at org.apache.druid.segment.IndexMergerV9.persist(IndexMergerV9.java:876) ~[druid-processing-30.0.0.jar:30.0.0]
        at org.apache.druid.segment.IndexMerger.persist(IndexMerger.java:225) ~[druid-processing-30.0.0.jar:30.0.0]
        at org.apache.druid.segment.realtime.appenderator.StreamAppenderator.persistHydrant(StreamAppenderator.java:1655) ~[druid-server-30.0.0.jar:30.0.0]
        at org.apache.druid.segment.realtime.appenderator.StreamAppenderator.access$200(StreamAppenderator.java:119) ~[druid-server-30.0.0.jar:30.0.0]
        at org.apache.druid.segment.realtime.appenderator.StreamAppenderator$2.call(StreamAppenderator.java:680) ~[druid-server-30.0.0.jar:30.0.0]
        ... 6 more
Caused by: java.lang.NullPointerException
    at java.base/java.util.Objects.requireNonNull(Objects.java:209) ~[?:?]
    at it.unimi.dsi.fastutil.objects.Object2IntRBTreeMap.add(Object2IntRBTreeMap.java:273) ~[fastutil-core-8.5.13.jar:?]
    at it.unimi.dsi.fastutil.objects.Object2IntRBTreeMap.put(Object2IntRBTreeMap.java:257) ~[fastutil-core-8.5.13.jar:?]
    at org.apache.druid.segment.SortedDimensionDictionary.<init>(SortedDimensionDictionary.java:45) ~[druid-processing-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.DimensionDictionary.sort(DimensionDictionary.java:265) ~[druid-processing-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.DictionaryEncodedColumnIndexer.sortedLookup(DictionaryEncodedColumnIndexer.java:243) ~[druid-processing-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.DictionaryEncodedColumnIndexer.getActualValue(DictionaryEncodedColumnIndexer.java:250) ~[druid-processing-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.DictionaryEncodedColumnIndexer$1.get(DictionaryEncodedColumnIndexer.java:89) ~[druid-processing-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.DictionaryEncodedColumnIndexer$1.get(DictionaryEncodedColumnIndexer.java:79) ~[druid-processing-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.DictionaryEncodedColumnMerger.allNull(DictionaryEncodedColumnMerger.java:533) ~[druid-processing-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.DictionaryEncodedColumnMerger.writeMergedValueDictionary(DictionaryEncodedColumnMerger.java:149) ~[druid-processing-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.IndexMergerV9.writeDimValuesAndSetupDimConversion(IndexMergerV9.java:816) ~[druid-processing-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.IndexMergerV9.makeIndexFiles(IndexMergerV9.java:216) ~[druid-processing-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.IndexMergerV9.merge(IndexMergerV9.java:1155) ~[druid-processing-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.IndexMergerV9.multiphaseMerge(IndexMergerV9.java:972) ~[druid-processing-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.IndexMergerV9.persist(IndexMergerV9.java:876) ~[druid-processing-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.IndexMerger.persist(IndexMerger.java:225) ~[druid-processing-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.realtime.appenderator.StreamAppenderator.persistHydrant(StreamAppenderator.java:1655) ~[druid-server-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.realtime.appenderator.StreamAppenderator.access$200(StreamAppenderator.java:119) ~[druid-server-30.0.0.jar:30.0.0]
    at org.apache.druid.segment.realtime.appenderator.StreamAppenderator$2.call(StreamAppenderator.java:680) ~[druid-server-30.0.0.jar:30.0.0]
    ... 6 more

along with these failure logs:

2024-07-17T14:43:01,124 ERROR [[index_kafka_earthquakes_c6d22f2137390e3_kmennhje]-appenderator-persist] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Persist failed, dying
2024-07-17T14:43:01,131 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Encountered exception in run() before persisting.

whereas a working task persists successfully:

2024-07-18T06:04:36,902 INFO [[index_kafka_earthquakes_6c6ac108e1afd93_oeciigko]-appenderator-persist] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Flushed in-memory data for segment[earthquakes_1960-01-01T00:00:00.000Z_1961-01-01T00:00:00.000Z_2024-07-17T10:00:10.461Z] spill[0] to disk in [137] ms (1,513 rows).
2024-07-18T06:04:37,038 INFO [[index_kafka_earthquakes_6c6ac108e1afd93_oeciigko]-appenderator-persist] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Flushed in-memory data for segment[earthquakes_1980-01-01T00:00:00.000Z_1981-01-01T00:00:00.000Z_2024-07-17T10:00:24.050Z] spill[0] to disk in [66] ms (2,297 rows).
2024-07-18T06:04:37,378 INFO [[index_kafka_earthquakes_6c6ac108e1afd93_oeciigko]-appenderator-persist] org.apache.druid.segment.realtime.appenderator.StreamAppenderator - Flushed in-memory data for segment[earthquakes_1979-01-01T00:00:00.000Z_1980-01-01T00:00:00.000Z_2024-07-17T10:00:23.160Z] spill[0] to disk in [337] ms (18,953 rows).

@xeniape also found that unit tests fail when building 30.0.0 from source:

3638.1 [ERROR] org.apache.druid.segment.virtual.VectorizedVirtualColumnTest.testTimeseriesForceVirtualContextCannotVectorize
3638.1 [ERROR]   Run 1: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 2: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 3: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 4: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [INFO] 
3638.1 [ERROR] org.apache.druid.segment.virtual.VectorizedVirtualColumnTest.testTimeseriesLong
3638.1 [ERROR]   Run 1: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 2: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 3: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 4: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [INFO] 
3638.1 [ERROR] org.apache.druid.segment.virtual.VectorizedVirtualColumnTest.testTimeseriesMultiValueString
3638.1 [ERROR]   Run 1: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 2: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 3: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 4: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [INFO] 
3638.1 [ERROR] org.apache.druid.segment.virtual.VectorizedVirtualColumnTest.testTimeseriesMultiValueStringNotDictionaryEncoded
3638.1 [ERROR]   Run 1: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 2: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 3: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 4: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [INFO] 
3638.1 [ERROR] org.apache.druid.segment.virtual.VectorizedVirtualColumnTest.testTimeseriesMultiValueStringUnknown
3638.1 [ERROR]   Run 1: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 2: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 3: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 4: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [INFO] 
3638.1 [ERROR] org.apache.druid.segment.virtual.VectorizedVirtualColumnTest.testTimeseriesSingleValueString
3638.1 [ERROR]   Run 1: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 2: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 3: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 4: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [INFO] 
3638.1 [ERROR] org.apache.druid.segment.virtual.VectorizedVirtualColumnTest.testTimeseriesSingleValueStringNotDictionaryEncoded
3638.1 [ERROR]   Run 1: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 2: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 3: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 4: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [INFO] 
3638.1 [ERROR] org.apache.druid.segment.virtual.VectorizedVirtualColumnTest.testTimeseriesTrueVirtualContextDefault
3638.1 [ERROR]   Run 1: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 2: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 3: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [ERROR]   Run 4: VectorizedVirtualColumnTest.setup:112 ? NullPointer
3638.1 [INFO] 
3638.1 [ERROR] org.junit.runner.manipulation.Filter.null
3638.1 [ERROR]   Run 1: Filter ?  No tests found matching Unique ID [engine:junit-vintage]/[runner:org.apache.druid.query.SchemaEvolutionTest]/[test:testNumericEvolutionTimeseriesAggregation]/[test:%5B1%5D false (testNumericEvolutionTimeseriesAggregation)] OR Unique ID [engine:junit-vintage]/[runner:org.apache.druid.query.SchemaEvolutionTest]/[test:testHyperUniqueEvolutionTimeseries]/[test:%5B1%5D false (testHyperUniqueEvolutionTimeseries)] OR Unique ID [engine:junit-vintage]/[runner:org.apache.druid.query.SchemaEvolutionTest]/[test:testHyperUniqueEvolutionTimeseries]/[test:%5B0%5D true (testHyperUniqueEvolutionTimeseries)] OR Unique ID [engine:junit-vintage]/[runner:org.apache.druid.query.SchemaEvolutionTest]/[test:testNumericEvolutionFiltering]/[test:%5B0%5D true (testNumericEvolutionFiltering)] OR Unique ID [engine:junit-vintage]/[runner:org.apache.druid.query.SchemaEvolutionTest]/[test:testNumericEvolutionFiltering]/[test:%5B1%5D false (testNumericEvolutionFiltering)] OR Unique ID [engine:junit-vintage]/[runner:org.apache.druid.query.SchemaEvolutionTest]/[test:testNumericEvolutionTimeseriesAggregation]/[test:%5B0%5D true (testNumericEvolutionTimeseriesAggregation)] from org.junit.vintage.engine.descriptor.RunnerRequest@230b4acb
3638.1 [ERROR]   Run 2: Filter ?  No tests found matching Unique ID [engine:junit-vintage]/[runner:org.apache.druid.query.SchemaEvolutionTest]/[test:initializationError(org.junit.runner.manipulation.Filter)] from org.junit.vintage.engine.descriptor.RunnerRequest@37fd1b30
3638.1 [ERROR]   Run 3: Filter ?  No tests found matching Unique ID [engine:junit-vintage]/[runner:org.apache.druid.query.SchemaEvolutionTest]/[test:initializationError(org.junit.runner.manipulation.Filter)] from org.junit.vintage.engine.descriptor.RunnerRequest@6b1c6d39
3638.1 [INFO] 
3638.1 [INFO] 
3638.1 [ERROR] Tests run: 252314, Failures: 83, Errors: 82807, Skipped: 17828
3638.1 [INFO] 
3638.1 [INFO] ------------------------------------------------------------------------
3638.1 [INFO] Reactor Summary for Druid 30.0.0:
3638.1 [INFO] 
3638.1 [INFO] Druid .............................................. SUCCESS [01:55 min]
3638.1 [INFO] druid-processing ................................... FAILURE [58:09 min]
3638.1 [INFO] druid-aws-common ................................... SKIPPED
3638.1 [INFO] druid-gcp-common ................................... SKIPPED

Test results when running from source with our mvn profile / command:

[INFO] Running org.apache.druid.query.timeboundary.TimeBoundaryQueryRunnerTest
[INFO] Tests run: 72, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.099 s -- in org.apache.druid.query.timeboundary.TimeBoundaryQueryRunnerTest

and the failure for our patched builds:

[ERROR] org.apache.druid.segment.ConciseBitmapIndexMergerV9Test.testPersistNullColumnSkipping[100: metric compression=none, dimension compression=lzf, long encoding=longs, segment write-out medium=org.apache.druid.segment.writeout.OffHeapMemorySegmentWriteOutMediumFactory@62c229bb] -- Time elapsed: 0.001 s <<< ERROR!
#43 653.2 java.lang.NullPointerException
#43 653.2   at java.base/java.util.Objects.requireNonNull(Objects.java:209)
#43 653.2   at it.unimi.dsi.fastutil.objects.Object2IntRBTreeMap.add(Object2IntRBTreeMap.java:273)
#43 653.2   at it.unimi.dsi.fastutil.objects.Object2IntRBTreeMap.put(Object2IntRBTreeMap.java:257)
#43 653.2   at org.apache.druid.segment.SortedDimensionDictionary.<init>(SortedDimensionDictionary.java:45)
#43 653.2   at org.apache.druid.segment.DimensionDictionary.sort(DimensionDictionary.java:265)
#43 653.2   at org.apache.druid.segment.DictionaryEncodedColumnIndexer.sortedLookup(DictionaryEncodedColumnIndexer.java:243)
#43 653.2   at org.apache.druid.segment.DictionaryEncodedColumnIndexer.getActualValue(DictionaryEncodedColumnIndexer.java:250)
#43 653.2   at org.apache.druid.segment.DictionaryEncodedColumnIndexer$1.get(DictionaryEncodedColumnIndexer.java:89)
#43 653.2   at org.apache.druid.segment.DictionaryEncodedColumnIndexer$1.get(DictionaryEncodedColumnIndexer.java:79)
#43 653.2   at org.apache.druid.segment.DictionaryEncodedColumnMerger.allNull(DictionaryEncodedColumnMerger.java:533)
#43 653.2   at org.apache.druid.segment.DictionaryEncodedColumnMerger.writeMergedValueDictionary(DictionaryEncodedColumnMerger.java:149)
#43 653.2   at org.apache.druid.segment.IndexMergerV9.writeDimValuesAndSetupDimConversion(IndexMergerV9.java:816)
#43 653.2   at org.apache.druid.segment.IndexMergerV9.makeIndexFiles(IndexMergerV9.java:216)
#43 653.2   at org.apache.druid.segment.IndexMergerV9.merge(IndexMergerV9.java:1155)
#43 653.2   at org.apache.druid.segment.IndexMergerV9.multiphaseMerge(IndexMergerV9.java:972)
#43 653.2   at org.apache.druid.segment.IndexMergerV9.persist(IndexMergerV9.java:876)
#43 653.2   at org.apache.druid.segment.IndexMerger.persist(IndexMerger.java:203)
#43 653.2   at org.apache.druid.segment.IndexMergerTestBase.testPersistNullColumnSkipping(IndexMergerTestBase.java:1736)
#43 653.2   at java.base/java.lang.reflect.Method.invoke(Method.java:569)
#43 653.2   at org.apache.druid.segment.CloserRule$1.evaluate(CloserRule.java:52)

The ingestion tasks seem very flaky, although they fail more often than they succeed. Ingesting a simple CSV file sometimes works and sometimes doesn't.

The problem is around segments and persisting data, and should be researched further in order to support Druid 30.0.0.

I tried remote debugging but did not spend too much time on it. Just add the line

        -agentlib:jdwp=transport=dt_socket,address=*:5005,server=y,suspend=y

to the JVM config and connect a debugger to localhost on port 5005 to debug the process.
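For a cluster deployment, a rough sketch of that workflow (the config file path and pod name here are placeholders, not the actual locations in the Stackable images):

```shell
# Append the JDWP agent flag to a JVM config file (path is an example).
echo '-agentlib:jdwp=transport=dt_socket,address=*:5005,server=y,suspend=y' >> jvm.config

# Verify the flag is present before restarting the process.
grep 'agentlib:jdwp' jvm.config

# In Kubernetes, forward the debug port from the pod running the task, e.g.:
#   kubectl port-forward <middlemanager-pod> 5005:5005
# then attach your IDE's remote debugger to localhost:5005.
```

Note that `suspend=y` makes the JVM wait for a debugger to attach before starting, which is useful here since the tasks fail within seconds.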

@labrenbe found an unrelated error in OIDC callbacks:

2024-08-14T15:01:43,208 WARN [qtp1268743900-74] org.eclipse.jetty.server.HttpChannel - /druid-ext/druid-pac4j/callback
java.lang.NoSuchMethodError: 'net.minidev.json.JSONObject com.nimbusds.jwt.JWTClaimsSet.toJSONObject()'
    at com.nimbusds.openid.connect.sdk.claims.IDTokenClaimsSet.<init>(IDTokenClaimsSet.java:238) ~[?:?]
    at com.nimbusds.openid.connect.sdk.validators.IDTokenValidator.toIDTokenClaimsSet(IDTokenValidator.java:339) ~[?:?]
    at com.nimbusds.openid.connect.sdk.validators.IDTokenValidator.validate(IDTokenValidator.java:289) ~[?:?]
    at com.nimbusds.openid.connect.sdk.validators.IDTokenValidator.validate(IDTokenValidator.java:224) ~[?:?]
    at org.pac4j.oidc.profile.creator.TokenValidator.validate(TokenValidator.java:103) ~[?:?]
    at org.pac4j.oidc.profile.creator.OidcProfileCreator.create(OidcProfileCreator.java:93) ~[?:?]
    at org.pac4j.oidc.profile.creator.OidcProfileCreator.create(OidcProfileCreator.java:45) ~[?:?]
    at org.pac4j.core.client.BaseClient.retrieveUserProfile(BaseClient.java:119) ~[?:?]
    at org.pac4j.core.client.BaseClient.getUserProfile(BaseClient.java:99) ~[?:?]
    at org.pac4j.core.engine.DefaultCallbackLogic.perform(DefaultCallbackLogic.java:88) ~[?:?]
    at org.apache.druid.security.pac4j.Pac4jFilter.doFilter(Pac4jFilter.java:92) ~[?:?]
    at org.apache.druid.server.security.AuthenticationWrappingFilter.doFilter(AuthenticationWrappingFilter.java:59) ~[druid-server-30.0.0.jar:30.0.0]
    at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) ~[jetty-servlet-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626) ~[jetty-servlet-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.apache.druid.security.basic.authentication.BasicHTTPAuthenticator$BasicHTTPAuthenticationFilter.doFilter(BasicHTTPAuthenticator.java:172) ~[?:?]
    at org.apache.druid.server.security.AuthenticationWrappingFilter.doFilter(AuthenticationWrappingFilter.java:59) ~[druid-server-30.0.0.jar:30.0.0]
    at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) ~[jetty-servlet-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626) ~[jetty-servlet-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.apache.druid.server.security.SecuritySanityCheckFilter.doFilter(SecuritySanityCheckFilter.java:77) ~[druid-server-30.0.0.jar:30.0.0]
    at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) ~[jetty-servlet-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626) ~[jetty-servlet-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552) ~[jetty-servlet-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505) ~[jetty-servlet-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:772) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:59) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.Server.handle(Server.java:516) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732) ~[jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479) [jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) [jetty-server-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) [jetty-io-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) [jetty-io-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:555) [jetty-io-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:410) [jetty-io-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:164) [jetty-io-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) [jetty-io-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) [jetty-io-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338) [jetty-util-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315) [jetty-util-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173) [jetty-util-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131) [jetty-util-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409) [jetty-util-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883) [jetty-util-9.4.54.v20240208.jar:9.4.54.v20240208]
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034) [jetty-util-9.4.54.v20240208.jar:9.4.54.v20240208]
    at java.base/java.lang.Thread.run(Thread.java:840) [?:?]

Related (very old) upstream issues: https://github.com/apache/druid/issues/7662 and https://github.com/apache/druid/issues/8835

sbernauer commented 1 month ago

Moving back to in progress, as we want to create an upstream Druid issue for the nimbus-jose-jwt problem.

labrenbe commented 1 month ago

Created an upstream issue: https://github.com/apache/druid/issues/16920

lfrancke commented 2 weeks ago

Thanks. This means we can move to Druid 30 in 24.11?

sbernauer commented 2 weeks ago

I'd say yes :)