valeriansaliou / vigil

🚦 Microservices Status Page. Monitors a distributed infrastructure and sends alerts (Slack, SMS, etc.).
https://crates.io/crates/vigil-server
Mozilla Public License 2.0

Matrix notifier causes panic 'not currently running on a Tokio 0.2.x runtime' #79

Closed ttymothy closed 3 years ago

ttymothy commented 3 years ago

I've been testing Vigil (version 1.21.0), and I found it really good. However, when I enable the Matrix notifier, the process crashes almost instantly, with the error:

thread 'vigil-aggregator' panicked at 'not currently running on a Tokio 0.2.x runtime.', /home/tty/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.25/src/runtime/handle.rs:118:28

The configuration for the Matrix server may be wrong, so it may be a problem with the authentication, but I'd expect a nicer error message in that case. The only value that I have doubts about is the device_id, the rest are all correct. Can anyone help me troubleshoot this error? Searching online, the closest I could find was paritytech/polkadot#685, but I'm not sure if it's applicable here.
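
For context, frames 15 and 16 of the backtrace in this report show the panic coming from an `expect` inside `tokio::runtime::handle::Handle::current`. Tokio keeps the current runtime handle in thread-local storage, so a future that needs it panics when polled on a thread that never entered a Tokio 0.2 runtime, which is what the `vigil-aggregator` thread appears to be. The sketch below is a simplified, std-only model of that mechanism and not Tokio's actual code; `Handle`, `CONTEXT`, `enter`, `try_current` and `current` are hypothetical names used only for illustration.

```rust
use std::cell::RefCell;
use std::thread;

// Hypothetical stand-in for a runtime handle; this is not Tokio's real type.
#[derive(Clone, Debug, PartialEq)]
struct Handle(&'static str);

thread_local! {
    // Each thread has its own slot. It stays empty unless a runtime
    // context was entered on that particular thread.
    static CONTEXT: RefCell<Option<Handle>> = RefCell::new(None);
}

// Runs `f` with `handle` installed as the current thread's context,
// then restores whatever was there before.
fn enter<R>(handle: Handle, f: impl FnOnce() -> R) -> R {
    let previous = CONTEXT.with(|c| c.borrow_mut().replace(handle));
    let result = f();
    CONTEXT.with(|c| *c.borrow_mut() = previous);
    result
}

// Non-panicking lookup of the current thread's context.
fn try_current() -> Option<Handle> {
    CONTEXT.with(|c| c.borrow().clone())
}

// Panicking lookup, modelled on the `expect` visible in the backtrace.
fn current() -> Handle {
    try_current().expect("not currently running on a runtime")
}

fn main() {
    // A plain OS thread has no context, so the lookup finds nothing.
    let outside = thread::spawn(try_current).join().unwrap();
    println!("plain thread: {:?}", outside);

    // Calling the panicking variant there kills that thread;
    // join() reports the panic as an Err.
    let panicked = thread::spawn(current).join().is_err();
    println!("current() panicked on plain thread: {}", panicked);

    // Inside an entered context, the same call succeeds.
    let inside = thread::spawn(|| enter(Handle("rt"), current)).join().unwrap();
    println!("inside context: {:?}", inside);
}
```

If this model matches what happens in Vigil, the Matrix configuration values are probably not the cause: the panic would occur on the first network call regardless of whether the credentials are valid.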

Repro steps

The program must be compiled from source, as the prebuilt binaries were compiled without the notifier-matrix feature.

  1. Download version 1.21.0, for example with wget https://github.com/valeriansaliou/vigil/archive/v1.21.0.tar.gz
  2. Extract the contents of the tarball.
  3. Build the binary with cargo build --release --locked --all-features --target-dir=target
  4. Run target/release/vigil -c issue.cfg.

Log and stack trace

(INFO) - starting up
(DEBUG) - prober store: got service main-services
(DEBUG) - prober store: got node main-services:example
(DEBUG) - prober store: got replica main-services:example:https://example.com
(INFO) - initialized prober store
(DEBUG) - spawn managed thread: prober-poll
(DEBUG) - spawn managed thread: responder
(DEBUG) - spawn managed thread: prober-script
(DEBUG) - spawn managed thread: aggregator
(DEBUG) - running a poll probe operation...
(DEBUG) - will probe replica: HTTPS("https://example.com/") with retry count: 1
(DEBUG) - running a script probe operation...
(INFO) - ran script probe operation
(DEBUG) - sending aggregate startup notification...
(DEBUG) - did not dispatch notification to provider: email
(DEBUG) - did not dispatch notification to provider: twilio
(DEBUG) - did not dispatch notification to provider: slack
(DEBUG) - did not dispatch notification to provider: telegram
(DEBUG) - did not dispatch notification to provider: pushover
(DEBUG) - did not dispatch notification to provider: gotify
(DEBUG) - did not dispatch notification to provider: xmpp
(INFO) - dispatch matrix notification for status: Healthy and replicas: []
(DEBUG) - dispatch matrix notification attempt: #1
(DEBUG) - glob converted to regex: Glob { glob: "*", re: "(?-u)^[^/]*$", opts: GlobOptions { case_insensitive: false, literal_separator: true, backslash_escape: true }, tokens: Tokens([ZeroOrMore]) }
(DEBUG) - built glob set; 0 literals, 0 basenames, 0 extensions, 0 prefixes, 0 suffixes, 0 required extensions, 1 regexes
(INFO) - Starting 8 workers
(INFO) - Starting "actix-web-service-[::1]:8080" service on [::1]:8080
(INFO) - login; self=Client { homeserver: https://matrix.example.com/ } user="statusbot" device_id=Some("WIAEZIABAK") initial_device_display_name=None
(INFO) - Logging in to https://matrix.example.com/ as "statusbot" 
(DEBUG) - starting new connection: https://matrix.example.com/
thread 'vigil-aggregator' panicked at 'not currently running on a Tokio 0.2.x runtime.', /home/tty/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.25/src/runtime/handle.rs:118:28
stack backtrace:
   0:     0x555830a20193 - std::backtrace_rs::backtrace::libunwind::trace::h25e12e0d899beba0
                               at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/std/src/../../backtrace/src/backtrace/libunwind.rs:90:5
   1:     0x555830a20193 - std::backtrace_rs::backtrace::trace_unsynchronized::h70e61195d6ae3df6
                               at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x555830a20193 - std::sys_common::backtrace::_print_fmt::hba93ab80d779695a
                               at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/std/src/sys_common/backtrace.rs:67:5
   3:     0x555830a20193 - ::fmt::hf092b5883b4b2e50
                               at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/std/src/sys_common/backtrace.rs:46:22
   4:     0x555830738a3c - core::fmt::write::hf68bc350a8f2f0dc
                               at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/core/src/fmt/mod.rs:1078:17
   5:     0x555830a1fae1 - std::io::Write::write_fmt::hf66811b1bc767436
                               at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/std/src/io/mod.rs:1517:15
   6:     0x555830a1f530 - std::sys_common::backtrace::_print::hd425a11bfe1f20f8
                               at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/std/src/sys_common/backtrace.rs:49:5
   7:     0x555830a1f530 - std::sys_common::backtrace::print::h6d678795c1e61e13
                               at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/std/src/sys_common/backtrace.rs:36:9
   8:     0x555830a1f530 - std::panicking::default_hook::{{closure}}::h78a02a4a0dee5e7e
                               at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/std/src/panicking.rs:208:50
   9:     0x555830a1ee5d - std::panicking::default_hook::h56eb7eda02f355a7
                               at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/std/src/panicking.rs:225:9
  10:     0x555830a1ee5d - std::panicking::rust_panic_with_hook::hb27ea14285131c61
                               at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/std/src/panicking.rs:591:17
  11:     0x555830a3c163 - std::panicking::begin_panic_handler::{{closure}}::hc552fcee62aad17f
  12:     0x555830a3c0dc - std::sys_common::backtrace::__rust_end_short_backtrace::hb9f0aa9a78e885a0
                               at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/std/src/sys_common/backtrace.rs:141:18
  13:     0x555830a3c08d - rust_begin_unwind
                               at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/std/src/panicking.rs:493:5
  14:     0x5558307361a0 - core::panicking::panic_fmt::h12ac4570ea43d06f
                               at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/core/src/panicking.rs:92:14
  15:     0x555830739ce2 - core::option::expect_failed::h7e0f81ae38d4dc42
                               at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/core/src/option.rs:1260:5
  16:     0x555830adc156 - tokio::runtime::handle::Handle::current::hddf996acc450db3d
  17:     0x55583091bc64 -  as core::future::future::Future>::poll::h5a3e1319f1539acc
  18:     0x55583092141c -  as core::future::future::Future>::poll::hfdf269e303acb29f
  19:     0x55583091ac30 -  as core::future::future::Future>::poll::h58750fcc2ce0efcb
  20:     0x555830921338 -  as core::future::future::Future>::poll::he139991093c48a4d
  21:     0x55583096a4d9 -  as core::future::future::Future>::poll::h45f517409fe43afb
  22:     0x55583090cf14 -  as core::future::future::Future>::poll::h255660b9a81fcec3
  23:     0x55583093d736 - ::Ok> as core::future::future::Future>::poll::h125d860f64527120
  24:     0x55583092c8a3 -  as core::future::future::Future>::poll::h3f9837140ab2312d
  25:     0x5558309621cb - ::poll::h9c7254f937ead879
  26:     0x5558307be9ca -  as core::future::future::Future>::poll::h256b3259a564348f
  27:     0x5558306a05b8 -  as core::future::future::Future>::poll::hc1d9f00c6807b8d5
  28:     0x555830b80ec6 - vigil::aggregator::manager::notify::h90bdb9657dd1b5e3
  29:     0x555830b77a8e - vigil::aggregator::manager::run::he5caecf527803e10
  30:     0x555830b42561 - std::sys_common::backtrace::__rust_begin_short_backtrace::hba1b1c36e2953b49
  31:     0x555830b6d30d - core::ops::function::FnOnce::call_once{{vtable.shim}}::h44a7b9a16640e7d3
  32:     0x555830a4b995 -  as core::ops::function::FnOnce>::call_once::h9ed215ba67984d70
                               at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/alloc/src/boxed.rs:1328:9
  33:     0x555830a4b995 -  as core::ops::function::FnOnce>::call_once::hcece06e1fe04906f
                               at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/alloc/src/boxed.rs:1328:9
  34:     0x555830a4b995 - std::sys::unix::thread::Thread::new::thread_start::h6e82a4b7be15319a
                               at /rustc/cb75ad5db02783e8b0222fee363c5f63f7e2cf5b/library/std/src/sys/unix/thread.rs:71:17
  35:     0x7f3732661299 - start_thread
  36:     0x7f373243c053 - clone
  37:                0x0 - 

Configuration file

# Vigil
# Microservices Status Page
# Configuration file
# Example: https://github.com/valeriansaliou/vigil/blob/master/config.cfg

[server]

log_level = "debug"
inet = "[::1]:8080"
workers = 4
reporter_token = "REPLACE_THIS_WITH_A_SECRET_KEY"

[assets]

path = "./res/assets/"

[branding]

page_title = "Crisp Status"
page_url = "https://status.crisp.chat/"
company_name = "Crisp IM SARL"
icon_color = "#1972F5"
icon_url = "https://valeriansaliou.github.io/vigil/images/crisp-icon.png"
logo_color = "#1972F5"
logo_url = "https://valeriansaliou.github.io/vigil/images/crisp-logo.svg"
website_url = "https://crisp.chat/"
support_url = "mailto:support@crisp.chat"
custom_html = ""

[metrics]

poll_interval = 120
poll_retry = 2

poll_http_status_healthy_above = 200
poll_http_status_healthy_below = 400

poll_delay_dead = 30
poll_delay_sick = 10

push_delay_dead = 20

push_system_cpu_sick_above = 0.90
push_system_ram_sick_above = 0.90

script_interval = 300

local_delay_dead = 40

[plugins]

[plugins.rabbitmq]

api_url = "http://127.0.0.1:15672"
auth_username = "rabbitmq-administrator"
auth_password = "RABBITMQ_ADMIN_PASSWORD"
virtualhost = "crisp"

queue_ready_healthy_below = 500
queue_nack_healthy_below = 100
queue_ready_dead_above = 20000
queue_nack_dead_above = 5000
queue_loaded_retry_delay = 500

[notify]

startup_notification = true
reminder_interval = 300

[notify.matrix]

homeserver_url = "https://matrix.example.com"
username = "statusbot"
password = "PASSWD"

room_id = "!ROOMID:example.com"
device_id = "DEVICEID"

[probe]

[[probe.service]]

id = "main-services"
label = "Main services"

[[probe.service.node]]

id = "example"
label = "example"
mode = "poll"
replicas = ["https://example.com"]

valeriansaliou commented 3 years ago

Hey @ttymothy

Thanks for this report. The Matrix notifier is a recent contribution to Vigil that I didn't build myself, nor did I test it, to be honest, as I didn't have a Matrix account at the time.

I've created one in order to fix it, and decided to rework the notifier so that it is no longer an optional feature, as it is now basically a zero-dependency notifier (apart from the shared HTTP client).

It works like a charm with the fixes.

I'm issuing a Vigil release right now; a binary will be available within 30 minutes.

Commit ref: f0df4e05d23772e5e96e38b9cbadae31a31433ee
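
To illustrate what a notifier with no Matrix SDK dependency could look like, here is a minimal std-only sketch that builds the request URL and JSON body for sending a text message through the Matrix client-server API (`PUT /_matrix/client/r0/rooms/{roomId}/send/{eventType}/{txnId}`). It has not been checked against the commit above and is not Vigil's actual code; the function names `percent_encode`, `send_message_url` and `text_message_body` and the transaction ID `vigil-1` are assumptions, and authentication plus the HTTP call itself are left out.

```rust
// Percent-encodes every byte except RFC 3986 unreserved characters,
// so that room IDs such as "!ROOMID:example.com" are safe in a URL path.
fn percent_encode(input: &str) -> String {
    let mut out = String::new();
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

// Builds the "send message event" URL for a room and transaction ID.
fn send_message_url(homeserver: &str, room_id: &str, txn_id: &str) -> String {
    format!(
        "{}/_matrix/client/r0/rooms/{}/send/m.room.message/{}",
        homeserver.trim_end_matches('/'),
        percent_encode(room_id),
        percent_encode(txn_id)
    )
}

// Builds an `m.text` message body, escaping the characters JSON requires.
fn text_message_body(text: &str) -> String {
    let mut escaped = String::new();
    for ch in text.chars() {
        match ch {
            '"' => escaped.push_str("\\\""),
            '\\' => escaped.push_str("\\\\"),
            '\n' => escaped.push_str("\\n"),
            c if (c as u32) < 0x20 => escaped.push_str(&format!("\\u{:04x}", c as u32)),
            c => escaped.push(c),
        }
    }
    format!("{{\"msgtype\":\"m.text\",\"body\":\"{}\"}}", escaped)
}

fn main() {
    // Placeholder values taken from the configuration file in this report.
    let url = send_message_url(
        "https://matrix.example.com/",
        "!ROOMID:example.com",
        "vigil-1",
    );
    println!("PUT {}", url);
    println!("{}", text_message_body("Status: healthy"));
}
```

A plain HTTP request like this sends the message body unencrypted at the Matrix layer, since end-to-end encryption is handled by client libraries rather than by the endpoint itself.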

valeriansaliou commented 3 years ago

Update: an x64 binary is now available at https://github.com/valeriansaliou/vigil/releases/tag/v1.21.1

ttymothy commented 3 years ago

Thanks for the quick fix and release! The only drawback of not using the library is that messages are not e2e encrypted, but that's not a problem for me.