kaspanet / rusty-kaspa

Kaspa full-node and related libraries in the Rust programming language. This is a Beta version at the final testing phases.
ISC License
350 stars 105 forks source link

Write panic errors to logs (#445) #446

Closed gvbgduh closed 2 weeks ago

gvbgduh commented 1 month ago

Loggin from panic::set_hook.

It uses error macro from the same package module log

It works, but only after the logger is initialized by the kaspad daemon, before that it does nothing, so there is a gap.

>:~/.rusty-kaspa/kaspa-testnet-11/logs$ cat rusty-kaspa_err.log
2024-04-05 01:17:20.187+11:00 [ERROR] Panic at kaspad/src/daemon.rs:239: Test logger panic
>:~/.rusty-kaspa/kaspa-testnet-11/logs$ cat rusty-kaspa.log
2024-04-05 01:17:20.187+11:00 [INFO ] Logger is initiated with the log level: INFO and the log directory: "/home/george/.rusty-kaspa/kaspa-testnet-11/logs"
2024-04-05 01:17:20.187+11:00 [INFO ] kaspad v0.13.5
2024-04-05 01:17:20.187+11:00 [INFO ] Application directory: /home/george/.rusty-kaspa
2024-04-05 01:17:20.187+11:00 [INFO ] Data directory: /home/george/.rusty-kaspa/kaspa-testnet-11/datadir
2024-04-05 01:17:20.187+11:00 [INFO ] Logs directory: /home/george/.rusty-kaspa/kaspa-testnet-11/logs
2024-04-05 01:17:20.187+11:00 [ERROR] Panic at kaspad/src/daemon.rs:239: Test logger panic
gvbgduh commented 1 month ago

I'm still thinking about unit tests

aspect commented 1 month ago

I am concerned that calling error!() before the logger is initialized could cause a panic by itself. If this is the case perhaps a separate function that sets a static atomic bool after the logger init indicating that it is safe to use error!() would make sense. If this is not the case I guess this comment can be ignored.

gvbgduh commented 1 month ago

I am concerned that calling error!() before the logger is initialized could cause a panic by itself. If this is the case perhaps a separate function that sets a static atomic bool after the logger init indicating that it is safe to use error!() would make sense. If this is not the case I guess this comment can be ignored.

It's quite the opposite, the default behavior is to silently swallow all log failures, I have corresponding links in another comment. But a static atomic bool might still be of the help.

biryukovmaxim commented 3 weeks ago

@gvbgduh hi! Thanks for your efforts, it's nice to see our gang of contributors getting bigger!

1) there are some locations where panic handler is settled before logger initialization. like here

impl Runtime {
    pub fn from_args(args: &Args) -> Self {
        // Configure the panic behavior
        kaspa_core::panic::configure_panic();

        let log_dir = get_log_dir(args);

        // Initialize the logger
        kaspa_core::log::init_logger(log_dir.as_deref(), &args.log_level);

        Self { log_dir: log_dir.map(|log_dir| log_dir.to_owned()) }
    }
}

could you check them?

2) log::logger() function already has a fallback we can rely on it. so if we set hook using the function. when we panic before initialization - it won't produce logs after - it will work.

3) Imo no need to use atomic bool, moreover the current approach blocks the thread(since it's sync function) waiting for logger initialization. it means the only way to set the logger - is from another thread, unlikely it will ever be done

4) to handle the case where we panic before the initialization. (if we really want to handle it now and here) I would suggest to use OnceBool or lazy_static unit () to make it impossible to set false in purposes to determine if logger is initialized. the state machine would look like:

   None before logger initialized 
   Some(()) after initialization
   No way to make it None again
Only one possible transition

and direct writing to file when it's not initialized

gvbgduh commented 3 weeks ago

thanks @biryukovmaxim, very interesting.

thb, i haven't exercised the thought about 3, but it makes perfect sense as i call sleep in the sync function.

i'd agree we don't really need atomic bool as i see it just recreates the fallback logic from the link in 2, it's just that the initialization state is private to the log crate. i think it should probably be possible to hack around some logic about the logger itself, all in all, it'll be NoOp logger otherwise, but, i think it's a bit too hacky.

if we really want to handle it now and here

yes, from the practical point of view, i'm really on the fence it's worth the trouble, as i'm not sure there is a lot of business value in covering the gaps for panic before the logging system is up as in most cases it would be within the interaction with CLI.

i'd personally propose to revert to the first revision: we do our best effort to log, but it'll be up to the logging system; and have a separate ticket for the panic report: on every panic write a dedicate report directly to a file. that file should be modelled and have additional context like host details, host health, etc. that would cover cases when logging wasn't up, but there was still unattended panic during the execution.

but, i'm happy to follow the OnceBool path as well.

gvbgduh commented 2 weeks ago

As waiting there is pointless, reverting to the initial commit, where logging is done relying on the logging fallback itself. Errors caused by panic before the logging system is ready would be ignored (follow up issue https://github.com/kaspanet/rusty-kaspa/issues/465).

gvbgduh commented 2 weeks ago

Testing Setup:

cargo run --release --bin kaspad -- --testnet --netsuffix=11

Manually introduced panic after login init:

>~/rusty-kaspa$ git diff
diff --git a/kaspad/src/daemon.rs b/kaspad/src/daemon.rs
index dd8a215..d60c3b0 100644
--- a/kaspad/src/daemon.rs
+++ b/kaspad/src/daemon.rs
@@ -240,6 +240,9 @@ pub fn create_core_with_runtime(runtime: &Runtime, args: &Args, fd_total_budget:
         }
     }

+    // Point of failure after the logs are initialized
+    panic!("This is a test panic with logging initiated");
+
     let consensus_db_dir = db_dir.join(CONSENSUS_DB);

Stderr:

2024-05-01 14:53:43.032+10:00 [INFO ] kaspad v0.13.6
2024-05-01 14:53:43.033+10:00 [INFO ] Application directory: /home/george/.rusty-kaspa
2024-05-01 14:53:43.033+10:00 [INFO ] Data directory: /home/george/.rusty-kaspa/kaspa-testnet-11/datadir
2024-05-01 14:53:43.033+10:00 [INFO ] Logs directory: /home/george/.rusty-kaspa/kaspa-testnet-11/logs
thread 'main' panicked at kaspad/src/daemon.rs:244:5:
This is a test panic with logging initiated
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Exiting...
2024-05-01 14:53:43.033+10:00 [ERROR] Panic at kaspad/src/daemon.rs:244: This is a test panic with logging initiated

Logs:

george@Darksteel:~/rusty-kaspa$ ls /home/george/.rusty-kaspa/kaspa-testnet-11/logs
rusty-kaspa.log  rusty-kaspa_err.log
george@Darksteel:~/rusty-kaspa$ cat /home/george/.rusty-kaspa/kaspa-testnet-11/logs/rusty-kaspa.log
2024-05-01 14:57:01.610+10:00 [INFO ] kaspad v0.13.6
2024-05-01 14:57:01.610+10:00 [INFO ] Application directory: /home/george/.rusty-kaspa
2024-05-01 14:57:01.610+10:00 [INFO ] Data directory: /home/george/.rusty-kaspa/kaspa-testnet-11/datadir
2024-05-01 14:57:01.611+10:00 [INFO ] Logs directory: /home/george/.rusty-kaspa/kaspa-testnet-11/logs
2024-05-01 14:57:01.611+10:00 [ERROR] Panic at kaspad/src/daemon.rs:244: This is a test panic with logging initiated
$ cat /home/george/.rusty-kaspa/kaspa-testnet-11/logs/rusty-kaspa_err.log
2024-05-01 14:57:01.611+10:00 [ERROR] Panic at kaspad/src/daemon.rs:244: This is a test panic with logging initiated
biryukovmaxim commented 2 weeks ago

@gvbgduh could you go through the code and reorder all places where panic handler is settled before logger initialization?

gvbgduh commented 2 weeks ago

@biryukovmaxim yep, can do @coderofstuff indeed, thanks, let me fix it

gvbgduh commented 2 weeks ago

Updated test:

$ cat /home/george/.rusty-kaspa/kaspa-testnet-11/logs/rusty-kaspa.log | tail -n 2
2024-05-01 15:25:36.190+10:00 [INFO ] Logs directory: /home/george/.rusty-kaspa/kaspa-testnet-11/logs
2024-05-01 15:25:36.190+10:00 [ERROR] Panic at kaspad/src/daemon.rs:244:5: This is a test panic with logging initiated
$ cat /home/george/.rusty-kaspa/kaspa-testnet-11/logs/rusty-kaspa_err.log | tail -n 2
2024-05-01 15:05:35.678+10:00 [ERROR] Panic at kaspad/src/daemon.rs:202: This is a test panic without logging initiated
2024-05-01 15:25:36.190+10:00 [ERROR] Panic at kaspad/src/daemon.rs:244:5: This is a test panic with logging initiated
gvbgduh commented 2 weeks ago

@biryukovmaxim just for a thought exercise, is it worth considering cases when there is panic before kaspa_core::panic::configure_panic();?

biryukovmaxim commented 2 weeks ago

@biryukovmaxim just for a thought exercise, is it worth considering cases when there is panic before kaspa_core::panic::configure_panic();?

I think it's a rare case, that mostly can appear during development phase, where regular panic should be okay. What can we do with that? Setting another panic(call it prehook) handler before current? Should we considering panics before prehook again? =D

gvbgduh commented 2 weeks ago

@biryukovmaxim thanks, the last commit should cover all places where the panic hook is set

michaelsutton commented 2 weeks ago

The name of the panicking thread is important too

gvbgduh commented 2 weeks ago

good point, update the error message

    Finished release [optimized] target(s) in 1m 06s
     Running `target/release/kaspad --testnet --netsuffix=11`
thread 'main' panicked at kaspad/src/daemon.rs:225:5:
Test panic with logging
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Exiting...
2024-05-01 20:46:43.740+10:00 [ERROR] Panic at the thread main at kaspad/src/daemon.rs:225:5: Test panic with logging

~/rusty-kaspa$ cat /home/george/.rusty-kaspa/kaspa-testnet-11/logs/rusty-kaspa.log | tail -n 2
2024-05-01 20:15:01.498+10:00 [ERROR] Panic at kaspad/src/daemon.rs:225:5: Test panic with logging
2024-05-01 20:46:43.740+10:00 [ERROR] Panic at the thread main at kaspad/src/daemon.rs:225:5: Test panic with logging

~/rusty-kaspa$ cat /home/george/.rusty-kaspa/kaspa-testnet-11/logs/rusty-kaspa_err.log | tail -n 2
2024-05-01 20:15:01.498+10:00 [ERROR] Panic at kaspad/src/daemon.rs:225:5: Test panic with logging
2024-05-01 20:46:43.740+10:00 [ERROR] Panic at the thread main at kaspad/src/daemon.rs:225:5: Test panic with logging
michaelsutton commented 2 weeks ago

Merged. Congratulations @gvbgduh for the first contribution 💯