Giszmo opened this issue 1 year ago
So if there is a backlog, it should probably show an estimate of how long that backlog is getting. What are the consequences of hard-killing the process now?
Ctrl-C will send a SIGINT. I'm not sure if nostr-rs-relay respects that. You could try getting the process ID and sending a SIGTERM instead.
Although... the code does seem to handle SIGINT as well: https://github.com/scsibug/nostr-rs-relay/blob/5f9fe1ce59a20c1818c3d9afbc60313ac9cb51cb/src/server.rs#L196-L199
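For example, sending either signal from the shell (a sketch, assuming a single relay process whose binary is named nostr-rs-relay and that pgrep is available on the host):

kill -TERM "$(pgrep -f nostr-rs-relay)"   # ask for a graceful shutdown via SIGTERM
kill -INT "$(pgrep -f nostr-rs-relay)"    # or deliver the same signal Ctrl-C would send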
It does do something: the log above shows that the database connection was closed. It also shut down all the websockets, and I'm pretty sure I have closed it this way before.
I see the ^C right where the SIGINT is logged out:
^CDec 26 23:45:42.765 INFO nostr_rs_relay::server: shutting down due to SIGINT (main)
Dec 26 23:45:42.765 INFO nostr_rs_relay::server: Shutting down webserver due to SIGINT
Dec 26 23:45:42.765 INFO nostr_rs_relay::server: Close connection down due to shutdown, client: f931c73e, ip: "172.17.0.1", connected: 67565.52392292s
Dec 26 23:45:42.765 INFO nostr_rs_relay::server: Close connection down due to shutdown, client: ee4c940e, ip: "172.17.0.1", connected: 2559.958913493s
Dec 26 23:45:42.765 INFO nostr_rs_relay::server: stopping client connection (cid: ee4c940e, ip: "172.17.0.1", sent: 0 events, recv: 0 events, connected: 2559.958949945s)
Dec 26 23:45:42.765 INFO nostr_rs_relay::server: stopping client connection (cid: f931c73e, ip: "172.17.0.1", sent: 21 events, recv: 355119 events, connected: 67565.523955464s)
Dec 26 23:45:42.766 INFO nostr_rs_relay::server: Close connection down due to shutdown, client: 41970558, ip: "172.17.0.1", connected: 61581.323059555s
Dec 26 23:45:42.766 INFO nostr_rs_relay::server: stopping client connection (cid: 41970558, ip: "172.17.0.1", sent: 0 events, recv: 450 events, connected: 61581.323095729s)
Dec 26 23:45:42.766 INFO nostr_rs_relay::server: Close connection down due to shutdown, client: ce259bc1, ip: "172.17.0.1", connected: 4705.007252513s
I wonder if some threads aren't respecting shutdown events properly..
I'm running nostr-rs-relay in docker and haven't had any issues force stopping the container.
So what's concerning me is that the load went up after the SIGINT.
SIGTERM doesn't impress the process. docker kill nostr-relay did.
yeah, now the DB is locked :(
docker run -it --rm -p 7000:8080 --user=100:100 -v $(pwd)/data:/usr/src/app/db:Z -v $(pwd)/config.toml:/usr/src/app/config.toml:ro,Z --name nostr-relay nostr-rs-relay:latest
Dec 27 00:51:33.028 INFO nostr_rs_relay: Starting up from main
Dec 27 00:51:33.046 INFO nostr_rs_relay::server: listening on: 0.0.0.0:8080
Dec 27 00:51:33.049 INFO nostr_rs_relay::server: db writer created
Dec 27 00:51:33.050 INFO nostr_rs_relay::server: control message listener started
Dec 27 00:51:38.060 ERROR r2d2: database is locked
Dec 27 00:51:38.060 ERROR r2d2: database is locked
Dec 27 00:51:38.060 ERROR r2d2: database is locked
Dec 27 00:51:43.067 ERROR r2d2: database is locked
Dec 27 00:51:43.467 ERROR r2d2: database is locked
:( how to recover?
ls -lh data/
total 22G
-rw-r--r-- 1 systemd-timesync users 3.9G Dec 26 23:45 nostr.db
-rw-r--r-- 1 systemd-timesync users 23M Dec 27 00:52 nostr.db-shm
-rw-r--r-- 1 systemd-timesync users 18G Dec 26 23:45 nostr.db-wal
Recover: I locked my DB as well when I ran out of disk space, and I had to dump the data from the locked DB into a new SQLite db file to get it to work again.
WAL file: You can run "PRAGMA wal_checkpoint" (or "PRAGMA wal_checkpoint(TRUNCATE)") to reduce the size of the WAL file. nostr-rs-relay should issue the truncating checkpoint automatically once every 60 seconds, I believe.
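For instance, from the sqlite3 CLI (a sketch: it assumes the database lives at data/nostr.db as above and that no other process is holding it open):

sqlite3 data/nostr.db "PRAGMA wal_checkpoint(TRUNCATE);"   # flush the WAL into the main db and truncate it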
So DB lock recovery was to run: sqlite3 data/nostr.db ".backup backup.db"; mv backup.db data/nostr.db
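A slightly more cautious variant of that recovery (a sketch, assuming the relay is stopped and using the file names from this thread) keeps the originals around and moves the stale -wal/-shm files aside so the restored copy starts clean:

sqlite3 data/nostr.db ".backup backup.db"                # dump the locked db into a fresh file
mv data/nostr.db data/nostr.db.old                       # keep the original just in case
mv data/nostr.db-wal data/nostr.db-wal.old 2>/dev/null   # stale WAL/SHM must not be reused with the new file
mv data/nostr.db-shm data/nostr.db-shm.old 2>/dev/null
mv backup.db data/nostr.db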
With version 0.7.15 the -wal file doesn't grow big anymore, but I still see the weird behaviour that I can't stop the service without corrupting the DB. The DB doesn't always end up locked, but it does happen.
This is strange. Can you provide details about your OS version? I want to spin up a VM and try to duplicate it.
I've Ctrl-C'd the relay on my OSX and Fedora machines hundreds of times and I've never seen a database lock error even once.
# uname -a
Linux lw10 5.10.0-9-amd64 #1 SMP Debian 5.10.70-1 (2021-09-30) x86_64 GNU/Linux
DigitalOcean
> SIGTERM doesn't impress the process. docker kill nostr-relay did.
I believe this is a Docker-specific issue. Namely, the relay runs as PID 1 in the container's namespace, and PID 1 gets no default signal handlers, so SIGTERM (and SIGINT) are ignored unless the program installs handlers itself; SIGKILL sent from the host, e.g. via docker kill, still gets through.
Yes, this probably is Docker-related. It depends on how you invoke the command within a Dockerfile. It seems Umbrel added a workaround for this: https://github.com/getumbrel/docker-nostr-rs-relay/blob/master/Dockerfile#L27 (ENTRYPOINT is handled differently than CMD, and using the exec/array form instead of the shell/string form also has an impact: https://testdriven.io/tips/9c233ead-5f34-4b95-b416-84f80f3ef5e7/).
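A couple of generic docker-side workarounds (a sketch, not necessarily what the Umbrel Dockerfile does): run the container with --init so a minimal init becomes PID 1 and forwards signals to the relay, tell Docker to use SIGINT as the stop signal since the relay is known to handle it, and prefer docker stop with a generous timeout over docker kill so the database can be closed cleanly:

docker run --init --stop-signal SIGINT -d --name nostr-relay [other flags as above] nostr-rs-relay:latest
docker stop -t 60 nostr-relay   # sends the stop signal, escalates to SIGKILL only after 60s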
I'm running the relay in a tmux and noticed it was using up all my 8 vCPUs and the nostr.db-wal was 18GB compared to 4GB for the nostr.db. I wanted to see what happens if I stop the relay, so I hit Ctrl-C. It closed all connections and supposedly the DB, but the process never finished and all the vCPUs are still busy running ./nostr-rs-relay --db /usr/src/app/db.
Now, ~6~ ~11~ 30 min later, it even complains about some events. Does it run on a backlog that is many minutes long? How long is the backlog??
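There is no built-in backlog estimate that I'm aware of; one crude way to see whether that post-Ctrl-C work is actually draining (a sketch, assuming the data directory and process name used above) is to watch the WAL and the per-thread CPU:

watch -n 5 'ls -lh data/nostr.db-wal'    # the WAL should stop growing (and eventually truncate) once the backlog is flushed
top -H -p "$(pgrep -f nostr-rs-relay)"   # per-thread CPU for the relay (assumes a single matching process)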