quickwit-oss / quickwit

Cloud-native search engine for observability. An open-source alternative to Datadog, Elasticsearch, Loki, and Tempo.
https://quickwit.io

Fix flaky test test_ingester_closes_idle_shards #5129

Closed fulmicoton closed 4 weeks ago

fulmicoton commented 4 weeks ago

https://github.com/quickwit-oss/quickwit/actions/runs/9512825063/job/26221639753

        PASS [   0.138s] quickwit-ingest ingest_api_service::tests::test_ingest_request_cost
        PASS [   0.166s] quickwit-ingest ingest_v2::broadcast::tests::test_local_shards_snapshot_diff
        PASS [   0.099s] quickwit-ingest ingest_v2::broadcast::tests::test_make_key
        PASS [   0.052s] quickwit-ingest ingest_v2::broadcast::tests::test_shard_info_serde
        PASS [   0.201s] quickwit-ingest ingest_v2::broadcast::tests::test_parse_key
        PASS [   0.151s] quickwit-ingest ingest_v2::debouncing::tests::test_debounced_get_or_create_open_shards_request
        PASS [   0.095s] quickwit-ingest ingest_v2::debouncing::tests::test_get_or_create_open_shards_request_debouncer
        PASS [   0.303s] quickwit-ingest ingest_v2::debouncing::tests::test_debouncer
        PASS [   0.683s] quickwit-ingest ingest_v2::broadcast::tests::test_local_shards_update_listener
        PASS [   0.205s] quickwit-ingest ingest_v2::fetch::tests::test_fault_tolerant_fetch_stream_error_failover
        PASS [   0.143s] quickwit-ingest ingest_v2::fetch::tests::test_fault_tolerant_fetch_stream_ingester_unavailable_failover
        PASS [   0.828s] quickwit-ingest ingest_v2::broadcast::tests::test_broadcast_local_shards_task
        PASS [   0.139s] quickwit-ingest ingest_v2::fetch::tests::test_fault_tolerant_fetch_stream_shard_not_found
        PASS [   0.142s] quickwit-ingest ingest_v2::fetch::tests::test_fault_tolerant_fetch_stream_open_fetch_stream_error_failover
        PASS [   0.201s] quickwit-ingest ingest_v2::fetch::tests::test_fetch_task_error
        PASS [   0.360s] quickwit-ingest ingest_v2::fetch::tests::test_fetch_task_batch_num_bytes
        PASS [   0.206s] quickwit-ingest ingest_v2::fetch::tests::test_fetch_task_signals_eof
        PASS [   1.942s] quickwit-indexing::failpoints test_failpoint_uploader_panics_right_away
        PASS [   0.139s] quickwit-ingest ingest_v2::fetch::tests::test_multi_fetch_stream
        PASS [   0.129s] quickwit-ingest ingest_v2::fetch::tests::test_retrying_fetch_stream
        PASS [   0.553s] quickwit-ingest ingest_v2::fetch::tests::test_fetch_task_from_position_exclusive
        PASS [   0.552s] quickwit-ingest ingest_v2::fetch::tests::test_fetch_task_happy_path
        PASS [   0.211s] quickwit-ingest ingest_v2::fetch::tests::test_select_preferred_and_failover_ingesters
        PASS [   0.439s] quickwit-ingest ingest_v2::fetch::tests::test_fetch_task_signals_eof_at_beginning
        PASS [   0.254s] quickwit-ingest ingest_v2::ingester::tests::test_check_decommissioning_status
        PASS [   8.523s] quickwit-indexing source::kafka_source::kafka_broker_tests::test_kafka_source
        PASS [   0.448s] quickwit-ingest ingest_v2::idle::tests::test_close_idle_shards_run
        PASS [   0.553s] quickwit-ingest ingest_v2::ingester::tests::test_ingester_broadcasts_local_shards
        PASS [   0.405s] quickwit-ingest ingest_v2::ingester::tests::test_ingester_close_shards
        PASS [   0.411s] quickwit-ingest ingest_v2::ingester::tests::test_ingester_debug_info
        PASS [   0.308s] quickwit-ingest ingest_v2::ingester::tests::test_ingester_init_shards
        PASS [   0.239s] quickwit-ingest ingest_v2::ingester::tests::test_ingester_open_observation_stream
        PASS [   0.542s] quickwit-ingest ingest_v2::ingester::tests::test_ingester_init
        PASS [   0.295s] quickwit-ingest ingest_v2::ingester::tests::test_ingester_open_replication_stream
  TRY 1 FAIL [   0.757s] quickwit-ingest ingest_v2::ingester::tests::test_ingester_closes_idle_shards

--- TRY 1 STDOUT:        quickwit-ingest ingest_v2::ingester::tests::test_ingester_closes_idle_shards ---

running 1 test
test ingest_v2::ingester::tests::test_ingester_closes_idle_shards ... FAILED

failures:

failures:
    ingest_v2::ingester::tests::test_ingester_closes_idle_shards

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 121 filtered out; finished in 0.59s

--- TRY 1 STDERR:        quickwit-ingest ingest_v2::ingester::tests::test_ingester_closes_idle_shards ---
thread 'ingest_v2::ingester::tests::test_ingester_closes_idle_shards' panicked at quickwit-ingest/src/ingest_v2/ingester.rs:3237:14:
assertion failed: self.shard_state.is_open()
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

   RETRY 2/3 [         ] quickwit-ingest ingest_v2::ingester::tests::test_ingester_closes_idle_shards
        PASS [   3.118s] quickwit-indexing::failpoints test_failpoint_uploader_panics_after_one_success
        PASS [   0.157s] quickwit-ingest ingest_v2::ingester::tests::test_ingester_persist_deletes_dangling_shard
        PASS [   0.410s] quickwit-ingest ingest_v2::ingester::tests::test_ingester_open_fetch_stream
        PASS [   0.398s] quickwit-ingest ingest_v2::ingester::tests::test_ingester_persist_closes_shard_on_io_error
        PASS [   0.535s] quickwit-ingest ingest_v2::ingester::tests::test_ingester_persist
        PASS [   0.159s] quickwit-ingest ingest_v2::ingester::tests::test_ingester_persist_shard_closed
        PASS [   0.500s] quickwit-ingest ingest_v2::ingester::tests::test_ingester_persist_empty
        PASS [   0.503s] quickwit-ingest ingest_v2::ingester::tests::test_ingester_persist_rate_limited
        PASS [   0.262s] quickwit-ingest ingest_v2::ingester::tests::test_ingester_persist_resource_exhausted
        PASS [   0.160s] quickwit-ingest ingest_v2::ingester::tests::test_ingester_reset_shards
        PASS [   0.601s] quickwit-ingest ingest_v2::ingester::tests::test_ingester_persist_replicate
        PASS [   0.601s] quickwit-ingest ingest_v2::ingester::tests::test_ingester_persist_replicate_grpc
  TRY 2 FAIL [   0.605s] quickwit-ingest ingest_v2::ingester::tests::test_ingester_closes_idle_shards

--- TRY 2 STDOUT:        quickwit-ingest ingest_v2::ingester::tests::test_ingester_closes_idle_shards ---

running 1 test
test ingest_v2::ingester::tests::test_ingester_closes_idle_shards ... FAILED

failures:

failures:
    ingest_v2::ingester::tests::test_ingester_closes_idle_shards

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 121 filtered out; finished in 0.54s

--- TRY 2 STDERR:        quickwit-ingest ingest_v2::ingester::tests::test_ingester_closes_idle_shards ---
thread 'ingest_v2::ingester::tests::test_ingester_closes_idle_shards' panicked at quickwit-ingest/src/ingest_v2/ingester.rs:3237:14:
assertion failed: self.shard_state.is_open()
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

   RETRY 3/3 [         ] quickwit-ingest ingest_v2::ingester::tests::test_ingester_closes_idle_shards
        PASS [   0.185s] quickwit-ingest ingest_v2::models::tests::test_new_replica_shard
        PASS [   0.186s] quickwit-ingest ingest_v2::models::tests::test_new_primary_shard
        PASS [   0.238s] quickwit-ingest ingest_v2::ingester::tests::test_ingester_truncate_shards_deletes_dangling_shards
        PASS [   0.204s] quickwit-ingest ingest_v2::mrecord::tests::test_mrecord_commit_roundtrip
        PASS [   0.210s] quickwit-ingest ingest_v2::models::tests::test_new_solo_shard
        PASS [   0.205s] quickwit-ingest ingest_v2::mrecord::tests::test_mrecord_doc_roundtrip
        PASS [   0.094s] quickwit-ingest ingest_v2::mrecord::tests::test_parse_invalid_mrecord
        PASS [   0.054s] quickwit-ingest ingest_v2::mrecordlog_utils::tests::test_check_enough_capacity
        PASS [   0.746s] quickwit-ingest ingest_v2::ingester::tests::test_ingester_truncate_on_shard_positions_update
        PASS [   0.206s] quickwit-ingest ingest_v2::rate_meter::tests::test_rate_meter
        PASS [   0.845s] quickwit-ingest ingest_v2::ingester::tests::test_ingester_truncate_shards
        PASS [   0.851s] quickwit-ingest ingest_v2::ingester::tests::test_ingester_retain_shards
  TRY 3 FAIL [   0.945s] quickwit-ingest ingest_v2::ingester::tests::test_ingester_closes_idle_shards

--- TRY 3 STDOUT:        quickwit-ingest ingest_v2::ingester::tests::test_ingester_closes_idle_shards ---

running 1 test
test ingest_v2::ingester::tests::test_ingester_closes_idle_shards ... FAILED

failures:

failures:
    ingest_v2::ingester::tests::test_ingester_closes_idle_shards

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 121 filtered out; finished in 0.68s

--- TRY 3 STDERR:        quickwit-ingest ingest_v2::ingester::tests::test_ingester_closes_idle_shards ---
thread 'ingest_v2::ingester::tests::test_ingester_closes_idle_shards' panicked at quickwit-ingest/src/ingest_v2/ingester.rs:3237:14:
assertion failed: self.shard_state.is_open()
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

   Canceling due to test failure: 7 tests still running
        PASS [   0.203s] quickwit-ingest ingest_v2::replication::tests::test_replication_stream_task_replicate
        PASS [   0.205s] quickwit-ingest ingest_v2::replication::tests::test_replication_stream_task_init
        PASS [   0.649s] quickwit-ingest ingest_v2::mrecordlog_utils::tests::test_append_non_empty_doc_batch_io_error
        PASS [   0.398s] quickwit-ingest ingest_v2::replication::tests::test_replication_stream_replicate_errors
        PASS [   0.737s] quickwit-ingest ingest_v2::mrecordlog_utils::tests::test_append_queue_position_range
        PASS [   0.550s] quickwit-ingest ingest_v2::replication::tests::test_replication_task_closes_shard_on_io_error
  TRY 1 SLOW [> 10.000s] quickwit-indexing source::pulsar_source::pulsar_broker_tests::test_partitioned_topic_multi_consumer_ingestion_with_failover
        PASS [   0.538s] quickwit-ingest ingest_v2::replication::tests::test_replication_task_happy_path
        PASS [  13.265s] quickwit-indexing source::pulsar_source::pulsar_broker_tests::test_partitioned_topic_multi_consumer_ingestion_with_failover
------------
     Summary [  68.522s] 910/1629 tests run: 909 passed (2 slow, 2 flaky), 1 failed, 14 skipped
   FLAKY 2/3 [   0.606s] quickwit-control-plane control_plane::tests::test_delete_shard_on_eof
   FLAKY 2/3 [   0.413s] quickwit-control-plane tests::test_scheduler_scheduling_multiple_indexers
  TRY 3 FAIL [   0.945s] quickwit-ingest ingest_v2::ingester::tests::test_ingester_closes_idle_shards
error: test run failed
error: process didn't exit successfully: `/home/runner/.rustup/toolchains/1.78-aarch64-unknown-linux-gnu/bin/cargo nextest run --manifest-path /home/runner/work/quickwit/quickwit/quickwit/Cargo.toml --target-dir /home/runner/work/quickwit/quickwit/quickwit/target/llvm-cov-target --all-features --retries 2` (exit status: 100)
Error: Process completed with exit code 1.

guilload commented 4 weeks ago

This is a duplicate of #5126. I've pushed a tentative fix in 1c2ec26, and I'll watch whether the test remains flaky in CI over the next few days before closing the other issue.
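
For anyone reading the log above: the run was started with `--retries 2`, so each test gets up to three tries. The summary then reports a test that failed and later passed as FLAKY (for example `FLAKY 2/3`), and one that failed every try as FAIL (`TRY 3 FAIL`). A minimal sketch of that classification follows. It is an illustration only, not nextest's actual implementation, and the `Outcome` and `classify` names are hypothetical.

```rust
/// Final status of a test after all of its tries.
/// Hypothetical type for illustration; not part of nextest or quickwit.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Passed on the first try.
    Pass,
    /// Failed at least once, then passed on try `passed_on` (1-based).
    Flaky { passed_on: usize },
    /// Never passed (also returned for an empty slice).
    Fail,
}

/// Classify a test from the result of each try, in order.
/// `tries[i]` is `true` when try `i + 1` passed.
fn classify(tries: &[bool]) -> Outcome {
    match tries.iter().position(|&passed| passed) {
        Some(0) => Outcome::Pass,
        Some(idx) => Outcome::Flaky { passed_on: idx + 1 },
        None => Outcome::Fail,
    }
}

fn main() {
    // test_ingester_closes_idle_shards failed on tries 1, 2 and 3.
    println!("{:?}", classify(&[false, false, false]));
    // test_delete_shard_on_eof is listed as FLAKY 2/3 in the summary.
    println!("{:?}", classify(&[false, true]));
}
```

Under this reading, `test_ingester_closes_idle_shards` is a hard failure for this run rather than a flaky pass, which is why the job exits with an error even though retries were enabled.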