sigp / lighthouse

Ethereum consensus client in Rust
https://lighthouse.sigmaprime.io/
Apache License 2.0

Windows + Nssm + Lighthouse + AppRotateOnline 1 #3735

Open xanatos opened 1 year ago

xanatos commented 1 year ago

NSSM is a tool commonly used on Windows to run an application as a service. The version I'm using is nssm 2.24-101-g897c7ad, the pre-release build from https://nssm.cc/download.

I've tested on two installations of Windows 11 Pro ITA, across various sub-releases. I've had this problem from the beginning, 6 months ago.

Everything works correctly unless I enable the options Rotate files + Rotate while service is running (AppRotateFiles 1 and AppRotateOnline 1).

To reproduce, run the following in an elevated (administrator) Windows console:

nssm.exe install lhv C:\lh\lighthouse.exe
nssm.exe set lhv AppParameters "validator --network goerli --datadir C:\lhdata"
nssm.exe set lhv AppDirectory C:\lh
nssm.exe set lhv AppExit Default Restart
nssm.exe set lhv AppStdout c:\lhdata\validator.log
nssm.exe set lhv AppStderr c:\lhdata\validator.log
nssm.exe set lhv AppRotateFiles 1
nssm.exe set lhv AppRotateOnline 1
nssm.exe set lhv AppRotateBytes 1000000
nssm.exe set lhv DisplayName lhv
nssm.exe set lhv ObjectName LocalSystem
nssm.exe set lhv Start SERVICE_DELAYED_AUTO_START
nssm.exe set lhv Type SERVICE_WIN32_OWN_PROCESS

then

sc start lhv

sc stop lhv

sc query lhv

The state should be STOPPED, but with AppRotateOnline 1 it stays at STOP_PENDING and nssm hangs.

I don't know whether the bug is in nssm or in how Lighthouse handles the console, and sadly nssm is "abandonware".

xanatos commented 1 year ago

I've run some tests. The problem seems to be in nssm: it hangs waiting for the pipes to close. I was able to reproduce the problem with a very small Rust program that does nothing but log:

use slog::info;
use sloggers::Build;
use sloggers::terminal::TerminalLoggerBuilder;

fn main() {
    // Build a terminal logger with sloggers' default settings.
    let builder = TerminalLoggerBuilder::new();
    let logger = builder.build().unwrap();

    info!(logger, "Starting!");

    // Block until a line arrives on stdin, so the process stays alive
    // like a long-running service would.
    let mut line = String::new();
    let _bytes_read = std::io::stdin().read_line(&mut line).unwrap();

    info!(logger, "Ending!");
}

Luckily there seems to be a workaround. sloggers logs only to Stderr, and Lighthouse, when launched as a beacon node or as a validator, writes only to Stderr or logs through sloggers, so Stdout is never used. If you redirect only Stderr and leave Stdout unredirected, the problem doesn't appear. You can even redirect Stdout to a different file and everything works; the problem only appears when Stdout AND Stderr are both redirected to the same file.
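
As a rough sketch of that workaround, applied to the lhv service from the first comment: the idea is to drop the Stdout redirection and keep only Stderr. This assumes that nssm's reset command returns AppStdout to its default (no redirection); I have not verified it beyond the behaviour described above.

nssm.exe reset lhv AppStdout
nssm.exe set lhv AppStderr c:\lhdata\validator.log

Or, if you would rather keep Stdout, point it at a separate file. The name validator-stdout.log is only a placeholder, not something Lighthouse or nssm requires:

nssm.exe set lhv AppStdout c:\lhdata\validator-stdout.log
nssm.exe set lhv AppStderr c:\lhdata\validator.log

In both cases AppRotateFiles 1 and AppRotateOnline 1 can stay as they are; after changing the settings, sc stop lhv followed by sc query lhv should report STOPPED.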