shotover / shotover-proxy

L7 data-layer proxy
https://docs.shotover.io
Apache License 2.0

Attempt to fix kafka cluster test intermittent failure #1827

Closed rukai closed 1 week ago

rukai commented 1 week ago

This PR attempts to fix an intermittent failure:

 called `Result::unwrap()` on an `Err` value: org.apache.kafka.common.errors.UnknownTopicOrPartitionException: 

Stack backtrace:
   0: anyhow::error::<impl core::convert::From<E> for anyhow::Error>::from
             at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.93/src/backtrace.rs:27:14
   1: <core::result::Result<T,F> as core::ops::try_trait::FromResidual<core::result::Result<core::convert::Infallible,E>>>::from_residual
             at /rustc/f6e511eec7342f59a25f7c0534f1dbea00d01b14/library/core/src/result.rs:1987:27
   2: test_helpers::connection::java::Value::call_async_fallible::{{closure}}
             at /home/runner/work/shotover-proxy/shotover-proxy/test-helpers/src/connection/java.rs:332:24
   3: test_helpers::connection::java::Value::call_async::{{closure}}
             at /home/runner/work/shotover-proxy/shotover-proxy/test-helpers/src/connection/java.rs:326:46
   4: test_helpers::connection::kafka::java::KafkaAdminJava::describe_configs::{{closure}}
             at /home/runner/work/shotover-proxy/shotover-proxy/test-helpers/src/connection/kafka/java.rs:639:14
   5: test_helpers::connection::kafka::KafkaAdmin::describe_configs::{{closure}}
             at /home/runner/work/shotover-proxy/shotover-proxy/test-helpers/src/connection/kafka/mod.rs:602:72
   6: lib::kafka_int_tests::test_cases::admin_setup::{{closure}}
             at ./tests/kafka_int_tests/test_cases.rs:119:10
   7: lib::kafka_int_tests::test_cases::standard_test_suite_base::{{closure}}
             at ./tests/kafka_int_tests/test_cases.rs:1556:37
   8: lib::kafka_int_tests::test_cases::cluster_test_suite::{{closure}}
             at ./tests/kafka_int_tests/test_cases.rs:1941:50
   9: lib::kafka_int_tests::cluster_2_racks_multi_shotover::{{closure}}
             at ./tests/kafka_int_tests/mod.rs:587:57
  10: lib::kafka_int_tests::cluster_2_racks_multi_shotover::case_2_java::{{closure}}
             at ./tests/kafka_int_tests/mod.rs:562:1

The issue only occurs in CI and is fairly rare; I cannot reproduce it locally.

My thinking is that altering the number of partitions and then immediately fetching the config might cause the config fetch to fail, because the altered topic is still in some kind of in-between state.

Maybe altering a different topic will resolve the issue. If it doesn't, it at least rules out one possibility. So let's see how it goes.
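
To make the hypothesis concrete, here is a minimal, self-contained sketch. It is a toy model, not the shotover test helpers and not real Kafka behaviour: `FakeAdmin`, `AdminError`, the topic names, and the rule that a topic fails exactly one `describe_configs` call after its partitions change are all assumptions made up for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical error type mirroring the Kafka error seen in the backtrace.
#[derive(Debug, PartialEq)]
enum AdminError {
    UnknownTopicOrPartition,
}

/// Hypothetical stand-in for a Kafka admin client (assumption, not a real API).
#[derive(Default)]
struct FakeAdmin {
    /// Topic name -> partition count.
    topics: HashMap<String, u32>,
    /// Topics modelled as being mid-update after a partition change.
    in_transition: Vec<String>,
}

impl FakeAdmin {
    fn create_topic(&mut self, name: &str, partitions: u32) {
        self.topics.insert(name.to_string(), partitions);
    }

    /// Changing the partition count puts the topic into the assumed in-between state.
    fn create_partitions(&mut self, name: &str, new_total: u32) -> Result<(), AdminError> {
        let count = self
            .topics
            .get_mut(name)
            .ok_or(AdminError::UnknownTopicOrPartition)?;
        *count = new_total;
        self.in_transition.push(name.to_string());
        Ok(())
    }

    /// Fails for a topic in the in-between state; the state clears after one failed call.
    fn describe_configs(&mut self, name: &str) -> Result<(), AdminError> {
        if let Some(pos) = self.in_transition.iter().position(|t| t == name) {
            self.in_transition.remove(pos);
            return Err(AdminError::UnknownTopicOrPartition);
        }
        if self.topics.contains_key(name) {
            Ok(())
        } else {
            Err(AdminError::UnknownTopicOrPartition)
        }
    }
}

/// Suspected failing order: alter a topic, then immediately describe the same topic.
fn alter_then_describe_same_topic() -> Result<(), AdminError> {
    let mut admin = FakeAdmin::default();
    admin.create_topic("topic_a", 1);
    admin.create_partitions("topic_a", 2)?;
    admin.describe_configs("topic_a")
}

/// Attempted fix: alter one topic, describe a different, untouched topic.
fn alter_then_describe_different_topic() -> Result<(), AdminError> {
    let mut admin = FakeAdmin::default();
    admin.create_topic("topic_a", 1);
    admin.create_topic("topic_b", 1);
    admin.create_partitions("topic_b", 2)?;
    admin.describe_configs("topic_a")
}

fn main() {
    println!("{:?}", alter_then_describe_same_topic());
    println!("{:?}", alter_then_describe_different_topic());
}
```

In this model the first function returns the error and the second succeeds. That only illustrates the reasoning; whether the real cluster behaves this way is exactly what the change is meant to find out.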

codspeed-hq[bot] commented 1 week ago

CodSpeed Performance Report

Merging #1827 will not alter performance

Comparing rukai:intermittent_failure_fix_attempt (265400f) with main (2bb4ed3)

Summary

✅ 38 untouched benchmarks