You could try starting it manually (./bin/signaling -config /path/to/your/server.conf) and check for any output.
Also depending on your previous version, please check the signaling.service file; there have been changes in the past (e.g. https://github.com/strukturag/nextcloud-spreed-signaling/pull/322) that fixed some startup problems.
Thanks again for the quick reply @fancycode.
root@signaling:/opt/nextcloud-spreed-signaling# ./bin/signaling --config /etc/signaling/server.conf
main.go:133: Starting up version 1.2.2/go1.21.5 as pid 1503
main.go:142: Using a maximum of 10 CPUs
natsclient.go:108: Connection established to nats://localhost:4222 (NBDVP4CI3SVEUCD3WR4WNMUQCQVAAPH625SK7AD434JYUGCQ7TWZ7EJU)
grpc_common.go:167: WARNING: No GRPC server certificate and/or key configured, running unencrypted
grpc_common.go:169: WARNING: No GRPC CA configured, expecting unencrypted connections
hub.go:190: WARNING: No shared secret has been set for internal clients.
backend_storage_static.go:72: Backend backend-1 added for https://client1/
backend_storage_static.go:72: Backend backend-2 added for https://client2/
backend_storage_static.go:72: Backend backend-3 added for https://client3/
backend_storage_static.go:72: Backend backend-4 added for https://client4/
backend_storage_static.go:72: Backend backend-5 added for https://client5/
hub.go:202: Using a maximum of 8 concurrent backend connections per host
hub.go:209: Using a timeout of 10s for backend connections
hub.go:262: Not using GeoIP database
mcu_janus.go:292: Connected to Janus WebRTC Server 1.1.2 by Meetecho s.r.l.
mcu_janus.go:298: Found JANUS VideoRoom plugin 0.0.9 by Meetecho s.r.l.
mcu_janus.go:303: Data channels are supported
mcu_janus.go:307: Full-Trickle is enabled
mcu_janus.go:309: Maximum bandwidth 1048576 bits/sec per publishing stream
mcu_janus.go:310: Maximum bandwidth 2097152 bits/sec per screensharing stream
mcu_janus.go:316: Created Janus session 6387500080897936
mcu_janus.go:323: Created Janus handle 394968486920516
main.go:263: Using janus MCU
hub.go:387: Using a timeout of 10s for MCU requests
backend_server.go:98: Using configured TURN API key
backend_server.go:99: Using configured shared TURN secret
backend_server.go:101: Adding "turn:signaling.hbps.co.za:5349?transport=udp" as TURN server
backend_server.go:101: Adding "turn:signaling.hbps.co.za:5349?transport=tcp" as TURN server
backend_server.go:114: No IPs configured for the stats endpoint, only allowing access from 127.0.0.1
main.go:339: Listening on 127.0.0.1:8080
client.go:283: Client from CLIENT1_IP has RTT of 194 ms (194.062168ms)
client.go:302: Error reading from CLIENT1_IP: read tcp 127.0.0.1:8080->127.0.0.1:45182: use of closed network connection
client.go:283: Client from CLIENT1_IP has RTT of 19 ms (19.460148ms)
client.go:302: Error reading from CLIENT1_IP: read tcp 127.0.0.1:8080->127.0.0.1:53558: use of closed network connection
client.go:283: Client from CLIENT1_IP has RTT of 60 ms (60.369351ms)
client.go:302: Error reading from CLIENT1_IP: read tcp 127.0.0.1:8080->127.0.0.1:53572: use of closed network connection
client.go:283: Client from CLIENT2_IP has RTT of 7 ms (7.669592ms)
client.go:283: Client from CLIENT1_IP has RTT of 568 ms (568.05427ms)
client.go:302: Error reading from CLIENT1_IP: read tcp 127.0.0.1:8080->127.0.0.1:60018: use of closed network connection
client.go:302: Error reading from CLIENT2_IP: read tcp 127.0.0.1:8080->127.0.0.1:60010: use of closed network connection
client.go:283: Client from CLIENT2_IP has RTT of 8 ms (8.494554ms)
client.go:283: Client from CLIENT1_IP has RTT of 127 ms (127.452376ms)
client.go:302: Error reading from CLIENT2_IP: read tcp 127.0.0.1:8080->127.0.0.1:60020: use of closed network connection
client.go:283: Client from CLIENT2_IP has RTT of 7 ms (7.927286ms)
client.go:302: Error reading from CLIENT1_IP: read tcp 127.0.0.1:8080->127.0.0.1:60026: use of closed network connection
client.go:283: Client from CLIENT1_IP has RTT of 35 ms (35.520615ms)
client.go:302: Error reading from CLIENT2_IP: read tcp 127.0.0.1:8080->127.0.0.1:41474: use of closed network connection
client.go:283: Client from CLIENT2_IP has RTT of 7 ms (7.991057ms)
client.go:302: Error reading from CLIENT1_IP: read tcp 127.0.0.1:8080->127.0.0.1:41490: use of closed network connection
client.go:302: Error reading from CLIENT2_IP: read tcp 127.0.0.1:8080->127.0.0.1:41502: use of closed network connection
client.go:283: Client from CLIENT1_IP has RTT of 49 ms (49.926458ms)
client.go:283: Client from CLIENT2_IP has RTT of 8 ms (8.148702ms)
client.go:302: Error reading from CLIENT1_IP: read tcp 127.0.0.1:8080->127.0.0.1:41508: use of closed network connection
client.go:302: Error reading from CLIENT2_IP: read tcp 127.0.0.1:8080->127.0.0.1:41524: use of closed network connection
client.go:283: Client from CLIENT2_IP has RTT of 7 ms (7.989346ms)
hub.go:884: Invalid message {Marshaler:<nil> Unmarshaler:<nil> Id: Type: Hello:<nil> Bye:<nil> Room:<nil> Message:<nil> Control:<nil> Internal:<nil> TransientData:<nil>} from CLIENT2_IP: type missing
client.go:302: Error reading from CLIENT2_IP: read tcp 127.0.0.1:8080->127.0.0.1:36020: use of closed network connection
client.go:283: Client from CLIENT1_IP has RTT of 3 ms (3.772848ms)
client.go:283: Client from CLIENT2_IP has RTT of 7 ms (7.808306ms)
client.go:302: Error reading from CLIENT2_IP: read tcp 127.0.0.1:8080->127.0.0.1:36028: use of closed network connection
client.go:302: Error reading from CLIENT1_IP: read tcp 127.0.0.1:8080->127.0.0.1:36024: use of closed network connection
client.go:283: Client from CLIENT2_IP has RTT of 8 ms (8.436012ms)
hub.go:884: Invalid message {Marshaler:<nil> Unmarshaler:<nil> Id: Type: Hello:<nil> Bye:<nil> Room:<nil> Message:<nil> Control:<nil> Internal:<nil> TransientData:<nil>} from CLIENT2_IP: type missing
client.go:283: Client from CLIENT1_IP has RTT of 7 ms (7.80567ms)
client.go:302: Error reading from CLIENT2_IP: read tcp 127.0.0.1:8080->127.0.0.1:36040: use of closed network connection
client.go:283: Client from CLIENT2_IP has RTT of 8 ms (8.061372ms)
client.go:302: Error reading from CLIENT1_IP: read tcp 127.0.0.1:8080->127.0.0.1:36056: use of closed network connection
client.go:302: Error reading from CLIENT2_IP: read tcp 127.0.0.1:8080->127.0.0.1:36058: use of closed network connection
client.go:283: Client from CLIENT2_IP has RTT of 8 ms (8.769925ms)
client.go:283: Client from CLIENT1_IP has RTT of 104 ms (104.228874ms)
client.go:302: Error reading from CLIENT2_IP: read tcp 127.0.0.1:8080->127.0.0.1:48866: use of closed network connection
client.go:302: Error reading from CLIENT1_IP: read tcp 127.0.0.1:8080->127.0.0.1:48868: use of closed network connection
client.go:283: Client from CLIENT2_IP has RTT of 7 ms (7.212955ms)
^Cmain.go:362: Interrupted
janus_client.go:404: conn.NextReader: read tcp 127.0.0.1:53648->127.0.0.1:8188: use of closed network connection
mcu_janus.go:269: Connection to Janus gateway was interrupted, reconnecting in 1s
This is what I get. I rebooted the server to try to get rid of stale connections, but I still get the above errors. I have not made any config changes anywhere, and this is the first time my upgrade process has broken like this, so I am at a bit of a loss.
hub.go:884: Invalid message {Marshaler:<nil> Unmarshaler:<nil> Id: Type: Hello:<nil> Bye:<nil> Room:<nil> Message:<nil> Control:<nil> Internal:<nil> TransientData:<nil>} from CLIENT2_IP: type missing
This looks like #180, please try a make clean && make build.
Also depending on your previous version, please check the signaling.service file; there have been changes in the past (e.g. #322) that fixed some startup problems.
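For reference, a full clean rebuild from a source checkout would look something like this (the checkout path is illustrative):

cd /opt/nextcloud-spreed-signaling   # wherever the repository was cloned
git pull                             # update to the latest sources
make clean                           # drop stale build artifacts (the suspected cause in #180)
make build                           # produces the new ./bin/signaling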
My previous update was 2 months ago, and the old bin still starts up if I use that instead of the newly compiled one, which makes me think there is something wrong with my new bin file or my setup as a whole.
Yeah, makes sense. Did you try the make clean && make build to check if the Invalid message log entry goes away?
Hi @fancycode. Sorry for the delayed response. My issue seems to persist. Today I did a git pull, make clean && make build, and when I start it up with ./bin/signaling --config /etc/signaling/server.conf it all starts up without any issues and there are no errors. However, when I copy this bin into the /usr/bin/ folder, the service throws the following error:
Dec 31 11:26:23 signaling systemd[1]: signaling.service: Main process exited, code=dumped, status=31/SYS
Dec 31 11:26:23 signaling systemd[1]: signaling.service: Failed with result 'core-dump'.
Dec 31 11:26:24 signaling systemd[1]: signaling.service: Scheduled restart job, restart counter is at 3.
Dec 31 11:26:24 signaling systemd[1]: Stopped Nextcloud Talk signaling server.
Dec 31 11:26:24 signaling systemd[1]: Started Nextcloud Talk signaling server.
Dec 31 11:26:24 signaling kernel: [1726194.342198] audit: type=1326 audit(1704014784.256:48): auid=4294967295 uid=997 gid=997 ses=4294967295 pid=48825 comm="signaling" exe="/usr/bin/signaling" sig=31 arch=c000003e syscall=160 compat=0 ip=0x40802e code=0x80000000
Dec 31 11:26:24 signaling systemd[1]: signaling.service: Main process exited, code=dumped, status=31/SYS
Dec 31 11:26:24 signaling systemd[1]: signaling.service: Failed with result 'core-dump'.
Dec 31 11:26:24 signaling systemd[1]: signaling.service: Scheduled restart job, restart counter is at 4.
Dec 31 11:26:24 signaling systemd[1]: Stopped Nextcloud Talk signaling server.
Dec 31 11:26:24 signaling systemd[1]: Started Nextcloud Talk signaling server.
Dec 31 11:26:24 signaling kernel: [1726194.842208] audit: type=1326 audit(1704014784.756:49): auid=4294967295 uid=997 gid=997 ses=4294967295 pid=48836 comm="signaling" exe="/usr/bin/signaling" sig=31 arch=c000003e syscall=160 compat=0 ip=0x40802e code=0x80000000
Dec 31 11:26:24 signaling systemd[1]: signaling.service: Main process exited, code=dumped, status=31/SYS
Dec 31 11:26:24 signaling systemd[1]: signaling.service: Failed with result 'core-dump'.
Dec 31 11:26:25 signaling systemd[1]: signaling.service: Scheduled restart job, restart counter is at 5.
Dec 31 11:26:25 signaling systemd[1]: Stopped Nextcloud Talk signaling server.
Dec 31 11:26:25 signaling systemd[1]: signaling.service: Start request repeated too quickly.
Dec 31 11:26:25 signaling systemd[1]: signaling.service: Failed with result 'core-dump'.
Dec 31 11:26:25 signaling systemd[1]: Failed to start Nextcloud Talk signaling server.
Dec 31 11:26:40 signaling kernel: [1726211.023187] [UFW BLOCK] IN=eth0 OUT= MAC=33:33:00:00:00:01:a6:b4:70:22:07:5e:86:dd SRC=fe80:0000:0000:0000:a4b4:70ff:fe22:075e DST=ff02:0000:0000:0000:0000:0000:0000:0001 LEN=192 TC=0 HOPLIMIT=1 FLOWLBL=496516 PROTO=UDP SPT=5678 DPT=5678 LEN=152
Any ideas as to why this is still happening? The old bin file works 100% when invoked via the .service file; only the newly compiled one is the issue.
Do you have SELinux or something similar enabled? The "kernel: audit ..." log entries seem to indicate that something was blocked and the signaling server was killed because of that. Do you have other logs with more details?
These rules might only be active for applications run from /usr/bin, which is why it works if you run it from your build directory.
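A quick way to check which mandatory access control system is active (assuming the usual tools are installed; Ubuntu ships AppArmor rather than SELinux by default):

getenforce   # prints the SELinux mode, if SELinux is in use
aa-status    # lists loaded AppArmor profiles (run as root)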
Nope. On Ubuntu 20.04. No SELinux enabled. And nothing fancy done to a minimal server install. No idea which logs to look at honestly, as I have never encountered a "blocking" in Ubuntu before.
Another guess: the syscall from the audit log is 160 (syscall=160), which according to https://filippo.io/linux-syscall-table/ is setrlimit. The service seems to run with uid/gid 997/997, which might not be allowed to execute this call while your user is, which would explain why executing the service manually works for you.
Could you try running as root (or your user) from the systemd service and check if this works?
Also, could you please check /var/log/syslog, /var/log/auth.log and dmesg for any messages when the process crashes?
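The syscall number from such an audit line can also be resolved locally (assuming the auditd userspace tools are installed):

ausyscall 160                        # should print "setrlimit" on x86_64 (arch=c000003e)
journalctl -u signaling.service -e   # jump to the most recent entries for the service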
This looks like a potential cause for the setrlimit call during startup with newer builds: https://github.com/golang/go/issues/46279
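For context: newer Go runtimes (since roughly Go 1.19) raise the soft RLIMIT_NOFILE limit to the hard limit during startup, which issues a setrlimit syscall before main() even runs. A minimal Go sketch of that same call, just to illustrate what a seccomp filter that denies @resources would be killing:

package main

import (
	"fmt"
	"syscall"
)

func main() {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		panic(err)
	}
	// Raise the soft limit to the hard limit -- roughly what newer Go
	// runtimes do automatically at startup. If setrlimit is denied by a
	// seccomp filter, the process dies with signal 31 (SIGSYS), matching
	// the "status=31/SYS" in the systemd log above.
	lim.Cur = lim.Max
	if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		panic(err)
	}
	fmt.Printf("soft RLIMIT_NOFILE raised to %d\n", lim.Cur)
}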
Could you please provide your .service file? Maybe you need to apply the changes from https://github.com/strukturag/nextcloud-spreed-signaling/pull/322 (as mentioned above).
In https://github.com/restic/rest-server/pull/249 a similar problem was reported where changing the .service file fixed the startup crash.
@hanserasmus did you get a chance to look at your .service file yet?
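If in doubt which unit file is actually in effect, the following should print it, including any drop-in overrides:

systemctl cat signaling.service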
I am sorry for the late reply @fancycode, but here are the contents of my service file:
[Unit]
Description=Nextcloud Talk signaling server
After=janus.service
[Service]
ExecStart=/usr/bin/signaling --config /etc/signaling/server.conf
User=signaling
Group=signaling
Restart=on-failure
# Makes sure that /etc/signaling is owned by this service
ConfigurationDirectory=signaling
# Hardening - see systemd.exec(5)
CapabilityBoundingSet=
ExecPaths=/usr/bin/signaling /usr/lib
LockPersonality=yes
MemoryDenyWriteExecute=yes
NoExecPaths=/
NoNewPrivileges=yes
PrivateDevices=yes
PrivateTmp=yes
PrivateUsers=yes
ProcSubset=pid
ProtectClock=yes
ProtectControlGroups=yes
ProtectHome=yes
ProtectHostname=yes
ProtectKernelLogs=yes
ProtectKernelModules=yes
ProtectKernelTunables=yes
ProtectProc=invisible
ProtectSystem=strict
RemoveIPC=yes
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
RestrictNamespaces=yes
RestrictRealtime=yes
RestrictSUIDSGID=yes
SystemCallArchitectures=native
SystemCallFilter=@system-service
SystemCallFilter=~ @privileged @resources
[Install]
WantedBy=multi-user.target
Looking at the PR you mentioned, should I remove the ~@resources part in my service file?
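For what it's worth, setrlimit does belong to the @resources syscall group that the current filter denies; this can be confirmed on the host (assuming systemd-analyze is available):

systemd-analyze syscall-filter @resources | grep setrlimit   # setrlimit is listed in the group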
I think this worked!!!
I was able to compile a new version, and my service file now looks like this:
[Unit]
Description=Nextcloud Talk signaling server
After=janus.service
[Service]
ExecStart=/usr/bin/signaling --config /etc/signaling/server.conf
User=signaling
Group=signaling
Restart=on-failure
# Makes sure that /etc/signaling is owned by this service
ConfigurationDirectory=signaling
# Hardening - see systemd.exec(5)
CapabilityBoundingSet=
ExecPaths=/usr/bin/signaling /usr/lib
LockPersonality=yes
MemoryDenyWriteExecute=yes
NoExecPaths=/
NoNewPrivileges=yes
PrivateDevices=yes
PrivateTmp=yes
PrivateUsers=yes
ProcSubset=pid
ProtectClock=yes
ProtectControlGroups=yes
ProtectHome=yes
ProtectHostname=yes
ProtectKernelLogs=yes
ProtectKernelModules=yes
ProtectKernelTunables=yes
ProtectProc=invisible
ProtectSystem=strict
RemoveIPC=yes
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
RestrictNamespaces=yes
RestrictRealtime=yes
RestrictSUIDSGID=yes
SystemCallArchitectures=native
SystemCallFilter=@system-service
#SystemCallFilter=~ @privileged @resources
SystemCallFilter=~ @privileged
[Install]
WantedBy=multi-user.target
It seems the service stays up and no longer fails!
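For anyone hitting the same thing: after editing the unit file, something like the following is needed for the change to take effect (assuming the unit is named signaling.service):

systemctl daemon-reload       # pick up the edited unit file
systemctl restart signaling   # restart with the relaxed syscall filter
systemctl status signaling    # confirm the service stays up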
Thank you @fancycode and I am really sorry it took me so long to respond and fix this. It has been insanely busy on my end. Not an excuse I know. But thank you again! I hope this will help someone else as well!
Perfect, thanks for confirming. So this is fixed already in the repository then.
No worries about the response delay, I'm not quick to answer in some cases either :-)
Hi all
After doing a
my signaling service does not want to start up. It fails with the following message:
My build did not throw out any errors:
Any ideas where I can search further for log entries as to why this happened and hopefully get a fix for it?
Thanks in advance for any help!