risingwavelabs / risingwave

create table failed after network chaos #17809

Open huangjw806 opened 1 month ago

huangjw806 commented 1 month ago
================================================================================
chaos-mesh Result
================================================================================
Result               FAIL                
Pipeline Message     Nightly ch-benchmark-pg-cdc
Namespace            longcmkf-20240724-190253
TestBed              medium-arm-all-affinity
RW Version           nightly-20240724    
Test Start time      2024-07-24 19:09:00 
Test End time        2024-07-24 22:35:29 
Test Queries         q1,q2,q4,q5,q6,q7,q8,q9,q10,q11,q12,q13,q14,q15,q17,q18,q19,q20,q21,q22
Grafana Metric       https://grafana.test.risingwave-cloud.xyz/d/EpkBw5W4k/risingwave-dev-dashboard?orgId=1&var-datasource=Prometheus:%20test-useast1-eks-a&var-namespace=longcmkf-20240724-190253&from=1721848140000&to=1721860529000
Grafana Logs         https://grafana.test.risingwave-cloud.xyz/d/liz0yRCZz1/log-search-dashboard?orgId=1&var-data_source=Logging:%20test-useast1-eks-a&var-namespace=longcmkf-20240724-190253&from=1721848140000&to=1721860529000
Memory Dumps         https://s3.console.aws.amazon.com/s3/buckets/test-useast1-mgmt-bucket-archiver?region=us-east-1&bucketType=general&prefix=k8s/longcmkf-20240724-190253/&showversions=false
Buildkite Job        https://buildkite.com/risingwave-test/chaos-mesh/builds/957

SQL failure message:

Running command create table t1 (x int, y int);
--
  | Failed to run the query
  |  
  | Caused by these errors (recent errors listed first):
  | 1: gRPC request to meta service failed: Internal error
  | 2: get error from control stream, in worker node 3
  | 3: gRPC request to stream service failed: Internal error
  | 4: failed to complete epoch
  | 5: Storage error
  | 6: Hummock error
  | 7: Other error: failed to sync: ObjectStore failed with IO error: Timeout error: Retry attempts exhausted for streaming_upload. Please modify streaming_upload_attempt_timeout_ms (current=5000) and streaming_upload_retry_attempts (current=3) under [storage.object_store.retry] in the config accordingly if needed.
  |  
  | Backtrace:
  | 0: std::backtrace_rs::backtrace::libunwind::trace
  | at ./rustc/4a0cc881dcc4d800f10672747f61a94377ff6662/library/std/src/../../backtrace/src/backtrace/libunwind.rs:105:5
  | 1: std::backtrace_rs::backtrace::trace_unsynchronized
  | at ./rustc/4a0cc881dcc4d800f10672747f61a94377ff6662/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
  | 2: std::backtrace::Backtrace::create
  | at ./rustc/4a0cc881dcc4d800f10672747f61a94377ff6662/library/std/src/backtrace.rs:331:13
  | 3: <thiserror_ext::backtrace::MaybeBacktrace as thiserror_ext::backtrace::WithBacktrace>::capture
  | at ./root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/thiserror-ext-0.1.2/src/backtrace.rs:30:18
  | 4: thiserror_ext::ptr::ErrorBox<T,B>::new
  | at ./root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/thiserror-ext-0.1.2/src/ptr.rs:40:33
  | 5: <risingwave_object_store::object::error::ObjectError as core::convert::From<E>>::from
  | at ./risingwave/src/object_store/src/object/error.rs:26:45
  | 6: <T as core::convert::Into<U>>::into
  | at ./rustc/4a0cc881dcc4d800f10672747f61a94377ff6662/library/core/src/convert/mod.rs:759:9
  | 7: risingwave_object_store::object::error::ObjectError::timeout
  | at ./risingwave/src/object_store/src/object/error.rs:26:65
  | 8: risingwave_object_store::object::retry_request::{{closure}}::{{closure}}::{{closure}}::{{closure}}
  | at ./risingwave/src/object_store/src/object/mod.rs:1163:25
  | 9: core::result::Result<T,E>::unwrap_or_else
  | at ./rustc/4a0cc881dcc4d800f10672747f61a94377ff6662/library/core/src/result.rs:1431:23
  | 10: risingwave_object_store::object::retry_request::{{closure}}::{{closure}}::{{closure}}
  | at ./risingwave/src/object_store/src/object/mod.rs:1160:13
  | 11: tokio_retry::future::RetryState<A>::poll
  | at ./root/.cargo/git/checkouts/rust-tokio-retry-38aa0853644639a4-shallow/95e2fd3/src/future.rs:27:73
  | 12: <tokio_retry::future::RetryIf<I,A,C> as core::future::future::Future>::poll
  | at ./root/.cargo/git/checkouts/rust-tokio-retry-38aa0853644639a4-shallow/95e2fd3/src/future.rs:156:15
  | 13: risingwave_object_store::object::retry_request::{{closure}}
  | at ./risingwave/src/object_store/src/object/mod.rs:1171:62
  | 14: risingwave_object_store::object::s3::S3StreamingUploader::upload_next_part::{{closure}}::{{closure}}
  | at ./risingwave/src/object_store/src/object/s3.rs:227:88
  | 15: <tracing::instrument::Instrumented<T> as core::future::future::Future>::poll
  | at ./root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tracing-0.1.40/src/instrument.rs:321:9
  | 16: tokio::runtime::task::core::Core<T,S>::poll::{{closure}}
  | at ./root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/task/core.rs:328:17
  | 17: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
  | at ./root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/loom/std/unsafe_cell.rs:16:9
  | 18: tokio::runtime::task::core::Core<T,S>::poll
  | at ./root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/task/core.rs:317:30
  | 19: tokio::runtime::task::harness::poll_future::{{closure}}
  | at ./root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.37.0/src/runtime/task/harness.rs:485:19
  | 20: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
  | at ./rustc/4a0cc881dcc4d800f10672747f61a94377ff6662/library/core/src/panic/unwind_safe.rs:272:9
  | 21: std::panicking::try::do_call
  | at ./rustc/4a0cc881dcc4d800f10672747f61a94377ff6662/library/std/src/panicking.rs:552:40
  | 22: std::panicking::try
  | at ./rustc/4a0cc881dcc4d800f10672747f61a94377ff6662/library/std/src/panicking.rs:516:19

fuyufjh commented 3 weeks ago

cc @Li0k for awareness.

fuyufjh commented 3 weeks ago

The problem happened after the network chaos.
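
For reference, the error in the report names two settings under `[storage.object_store.retry]`, with the values in effect during the failed run (`streaming_upload_attempt_timeout_ms = 5000`, `streaming_upload_retry_attempts = 3`). A minimal sketch of overriding them in the config file could look like the fragment below. The section and key names are copied from the error message; the raised values are illustrative assumptions, not recommendations, and whether raising them is the right fix when the network is deliberately disrupted is an open question for this issue.

```toml
# Section and key names are taken verbatim from the error message in the report.
# The values below are hypothetical examples, not tested recommendations.
[storage.object_store.retry]
# The failed run reported current=5000 for this setting.
streaming_upload_attempt_timeout_ms = 10000
# The failed run reported current=3 for this setting.
streaming_upload_retry_attempts = 5
```

Larger values would presumably give each upload more time to ride out a network fault before the sync fails, at the cost of a longer stall while the fault lasts.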