Closed sbug-1bit closed 2 years ago
Same problem here. I'm running Synapse in an LXC (and the LXC is behind an nginx reverse proxy) and I can't make it work :(
telnet 192.168.1.6 8448
gives me "Connection refused". How is that possible?
From the nginx logs:

```
connect() failed (111: Connection refused) while connecting to upstream, client:
```
Here is my nginx config: https://0bin.net/paste/miSdHH+lC4vk988p#uTjdp6Ecm0WrshgdQqBFHLOtf+JMXYlLtP6QMybIINA
This is not the same issue that I have.
You should not forward port 8448 in nginx! Port 8448 should be forwarded in your router directly to your LXC running Synapse (this port uses TLS and has nothing to do with the reverse proxy).
You should forward port 443 for a virtual host in nginx to port 8008 on your LXC with Synapse, so that if someone goes to https://yourdomain.com they reach your webserver, and if they go to https://matrix.yourdomain.com they are proxied to http://192.168.1.6:8008/
(You will then also need a DNS record for matrix.yourdomain.com pointing to the same address as yourdomain.com.)
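A minimal sketch of such a virtual host (hostnames and the upstream address are placeholders based on this thread; the certificate paths are assumptions you would adjust for your own setup):

```nginx
# Hypothetical sketch: terminate TLS for matrix.yourdomain.com and proxy
# Matrix client traffic to Synapse's plain-HTTP client port (8008).
server {
    listen 443 ssl;
    server_name matrix.yourdomain.com;

    # Placeholder certificate paths.
    ssl_certificate     /etc/ssl/certs/matrix.yourdomain.com.crt;
    ssl_certificate_key /etc/ssl/private/matrix.yourdomain.com.key;

    location /_matrix {
        proxy_pass http://192.168.1.6:8008;
        proxy_set_header X-Forwarded-For $remote_addr;
    }
}
```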
For more help I suggest joining #matrix:matrix.org
Hey, thanks for your reply. Port 8008 is configured correctly, but I'm struggling with 8448. I removed it from the reverse proxy and am now trying to set up port forwarding with iptables. I have a bridge interface, and all my LXC containers are in the subnet 192.168.1.0/24. I did the following:

```
iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -o eno1 -j MASQUERADE
```

I tried a lot of iptables rules; for the port forwarding I currently have:

```
# Matrix port redirection
iptables -t nat -A PREROUTING -p tcp --dport 8448 -j DNAT --to 192.168.1.6:8448
iptables -t nat -A PREROUTING -p udp --dport 8448 -j DNAT --to 192.168.1.6:8448
iptables -A FORWARD -d 192.168.1.6 -p tcp --dport 8448 -j ACCEPT
iptables -A FORWARD -d 192.168.1.6 -p udp --dport 8448 -j ACCEPT
iptables -A FORWARD --in-interface lxc-nat-bridge -j ACCEPT
```
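One thing worth double-checking with DNAT rules like these (an assumption on my part, not something confirmed in this thread): the kernel must have IP forwarding enabled, or the PREROUTING/FORWARD rules will never actually pass traffic on to the container. A minimal sketch (the file name is a placeholder):

```
# /etc/sysctl.d/99-ip-forward.conf  (hypothetical file name)
net.ipv4.ip_forward = 1
```

Apply it with `sysctl --system`, or use `sysctl -w net.ipv4.ip_forward=1` for a one-off test.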
I receive all messages but I can't send any. When I try to join a room, I get "Internal server error"... Please help :sob:
Sorry, it's been many years since I used iptables, so I think you'll find better help on that elsewhere.
@spacebug0 No problem, thanks anyway
Running Synapse with metrics enabled and Prometheus, I got these.
Not sure which graphs are of interest to the devs, but as one can see, something works really hard when getting room history (sometimes).
I finally got a chance to look at this - @spacebug0, sorry for the delay and thanks for your patience. Looking at the logs (https://lithen.net/files/homeserver.log), it seems that there's a whole stack of calls to /messages for Matrix HQ which never return:
```
sierra:Desktop matthew$ cat homeserver.log.txt | grep 'request.*/messages'
2017-12-30 18:23:30,885 - synapse.access.http.8008 - 59 - INFO - GET-74- 192.168.1.6 - 8008 - Received request: GET /_matrix/client/r0/rooms/!cURbafjkfsMDVwdRDQ:matrix.org/messages?dir=b&from=s283757_3843403_525_96193_1612_19_139_10551_1&limit=30&access_token=<redacted>
2017-12-30 18:24:36,685 - synapse.access.http.8008 - 59 - INFO - GET-170- 192.168.1.6 - 8008 - Received request: GET /_matrix/client/r0/rooms/!cURbafjkfsMDVwdRDQ:matrix.org/messages?dir=b&from=s283757_3843403_525_96193_1612_19_139_10551_1&limit=30&access_token=<redacted>
2017-12-30 18:25:35,224 - synapse.access.http.8008 - 59 - INFO - GET-266- 192.168.1.6 - 8008 - Received request: GET /_matrix/client/r0/rooms/!cURbafjkfsMDVwdRDQ:matrix.org/messages?dir=b&from=s283757_3843403_525_96193_1612_19_139_10551_1&limit=30&access_token=<redacted>
2017-12-30 18:26:36,649 - synapse.access.http.8008 - 59 - INFO - GET-370- 192.168.1.6 - 8008 - Received request: GET /_matrix/client/r0/rooms/!cURbafjkfsMDVwdRDQ:matrix.org/messages?dir=b&from=s283757_3843403_525_96193_1612_19_139_10551_1&limit=30&access_token=<redacted>
2017-12-30 18:27:41,834 - synapse.access.http.8008 - 59 - INFO - GET-459- 192.168.1.6 - 8008 - Received request: GET /_matrix/client/r0/rooms/!cURbafjkfsMDVwdRDQ:matrix.org/messages?dir=b&from=s283757_3843403_525_96193_1612_19_139_10551_1&limit=30&access_token=<redacted>
2017-12-30 18:28:24,664 - synapse.access.http.8008 - 59 - INFO - GET-521- 192.168.1.6 - 8008 - Received request: GET /_matrix/client/r0/rooms/!cURbafjkfsMDVwdRDQ:matrix.org/messages?dir=b&from=s283757_3843403_525_96193_1612_19_139_10551_1&limit=30&access_token=<redacted>
```
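As a quick way to spot such never-returning requests (a bash sketch of mine, not a tool from this thread; it assumes the `Received request` / `Processed request` log format above and uses a tiny inline sample in place of a real `homeserver.log`):

```shell
# Find request IDs that were received but never logged as "Processed":
# these are the requests that never returned. A three-line sample stands
# in for a real homeserver.log; the format follows the excerpts above.
cat > homeserver.log <<'EOF'
2017-12-30 18:23:30,885 - synapse.access.http.8008 - 59 - INFO - GET-74- 192.168.1.6 - 8008 - Received request: GET /_matrix/client/r0/rooms/!room/messages
2017-12-30 18:33:02,372 - synapse.access.http.8008 - 59 - INFO - GET-83- 192.168.1.6 - 8008 - Received request: GET /_matrix/client/r0/rooms/!room/messages
2017-12-30 18:34:55,905 - synapse.access.http.8008 - 91 - INFO - GET-83- 192.168.1.6 - 8008 - Processed request: 113531ms 200
EOF

# comm(1) needs sorted input; -23 keeps lines unique to the first file.
grep 'Received request'  homeserver.log | grep -o 'GET-[0-9]*' | sort -u > received.ids
grep 'Processed request' homeserver.log | grep -o 'GET-[0-9]*' | sort -u > processed.ids
comm -23 received.ids processed.ids   # prints GET-74: received, never processed
```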
Looking at the individual stuck requests, they all seem to get stuck trying to resolve state after checking a given auth event, which is a power level change from July 2015 (https://riot.im/develop/#/room/#matrix:matrix.org/$1436805007295iQZlO:jki.re)
```
2017-12-30 18:30:27,304 - synapse.handlers.federation - 1800 - INFO - GET-74- Different auth: set([u'$1436805007295iQZlO:jki.re'])
2017-12-30 18:30:27,306 - synapse.state - 393 - INFO - GET-74- Resolving state for !cURbafjkfsMDVwdRDQ:matrix.org with 2 groups
2017-12-30 18:31:50,407 - synapse.handlers.federation - 1800 - INFO - GET-789- Different auth: set([u'$1436805007295iQZlO:jki.re'])
2017-12-30 18:31:50,412 - synapse.state - 393 - INFO - GET-789- Resolving state for !cURbafjkfsMDVwdRDQ:matrix.org with 2 groups
```
etc.
The first one (GET-74) also makes a huge number of backfill attempts (which seem to succeed) before getting stuck on this 'bad' event.
I've asked @spacebug0 to leave the room, but it sounds like state resets cause them to spontaneously rejoin it - and then on calling /messages in the room the problem recurs.
However, after the stack of requests which fail, there are then a few which succeed (after restarting the server):
```
2017-12-30 18:33:02,372 - synapse.access.http.8008 - 59 - INFO - GET-83- 192.168.1.6 - 8008 - Received request: GET /_matrix/client/r0/rooms/!cURbafjkfsMDVwdRDQ:matrix.org/messages?dir=b&from=s283757_3843403_525_96193_1612_19_139_10551_1&limit=30&access_token=<redacted>
2017-12-30 18:33:02,376 - synapse.access.http.8008 - 59 - INFO - GET-84- 192.168.1.6 - 8008 - Received request: GET /_matrix/client/r0/rooms/!cURbafjkfsMDVwdRDQ:matrix.org/messages?dir=b&from=s283757_3843403_525_96193_1612_19_139_10551_1&limit=30&access_token=<redacted>
2017-12-30 18:34:55,905 - synapse.access.http.8008 - 91 - INFO - GET-83- 192.168.1.6 - 8008 - {@spacebug:lithen.net} Processed request: 113531ms (15531ms, 824ms) (29653ms/46) 2384B 200 "GET /_matrix/client/r0/rooms/!cURbafjkfsMDVwdRDQ:matrix.org/messages?dir=b&from=s283757_3843403_525_96193_1612_19_139_10551_1&limit=30&access_token=<redacted> HTTP/1.1" "Riot.im/0.7.03 (Linux; U; Android 7.1.2; ZTE A2017U Build/NJH47; Flavour FDroid; MatrixAndroidSDK 0.8.03)"
2017-12-30 18:34:55,910 - synapse.access.http.8008 - 91 - INFO - GET-84- 192.168.1.6 - 8008 - {@spacebug:lithen.net} Processed request: 113533ms (16972ms, 900ms) (41852ms/94) 2380B 200 "GET /_matrix/client/r0/rooms/!cURbafjkfsMDVwdRDQ:matrix.org/messages?dir=b&from=s283757_3843403_525_96193_1612_19_139_10551_1&limit=30&access_token=<redacted> HTTP/1.1" "Riot.im/0.7.03 (Linux; U; Android 7.1.2; ZTE A2017U Build/NJH47; Flavour FDroid; MatrixAndroidSDK 0.8.03)"
```
Looking at the request where /messages does manage to return (after 113s), the only difference I can see is that it tries to backfill from jki.re rather than matrix.org. And then subsequent ones seem to be quick:
```
2017-12-30 19:25:54,671 - synapse.access.http.8008 - 59 - INFO - GET-5275- 192.168.1.6 - 8008 - Received request: GET /_matrix/client/r0/rooms/!cURbafjkfsMDVwdRDQ:matrix.org/messages?dir=b&from=s283929_3853327_24_96380_1621_19_141_10568_1&limit=30&access_token=<redacted>
2017-12-30 19:25:55,754 - synapse.access.http.8008 - 91 - INFO - GET-5275- 192.168.1.6 - 8008 - {@spacebug:lithen.net} Processed request: 1082ms (235ms, 4ms) (764ms/6) 2636B 200 "GET /_matrix/client/r0/rooms/!cURbafjkfsMDVwdRDQ:matrix.org/messages?dir=b&from=s283929_3853327_24_96380_1621_19_141_10568_1&limit=30&access_token=<redacted> HTTP/1.1" "Riot.im/0.7.03 (Linux; U; Android 7.1.2; ZTE A2017U Build/NJH47; Flavour FDroid; MatrixAndroidSDK 0.8.03)"
```
So I'm wondering if the initial /messages request (GET-74) gets completely stuck, wedging all subsequent ones until the server is restarted and leaking RAM as they stack up. But it's unclear why it gets wedged, and why it succeeds later on when it backfills from jki.re.
A final observation:
```
2017-12-30 18:26:12,945 - synapse.handlers.federation - 770 - INFO - GET-74- Failed to backfill from matrix.org because 403: Room '!cURbafjkfsMDVwdRDQ:matrix.org' does not exist
```
...does not look good at all.
@erikjohnston, any idea what's going on here? Is it possible that a backfill failure like this would cause this explosion?
I'm in a similar situation. I had given up on trying to join big rooms, but even running Synapse with 5 users, still federated (just to be able to chat with a couple of friends), my VPS (2 GB of RAM) is suffering and the process takes too much memory.
It can go 2-3 days without crashing, but then (I still don't know the cause) it starts eating RAM and the OOM killer starts killing things.
Initially I was furious because other processes (like my Postgres DB) were killed randomly, but then I added MemoryLimit=512M
to my /lib/systemd/system/matrix-synapse.service
(on Debian 9), and now only Synapse is killed; the other processes survive. With this setting my Synapse server is killed about once a day and then auto-restarted by systemd, and on the client side the users are happy, but I hope to have this solved...
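As a side note (a hedged suggestion on my part, not something discussed in this thread): edits to the packaged unit under /lib/systemd/system can be lost on a package upgrade. The usual approach is a drop-in override, e.g. via `systemctl edit matrix-synapse`, which creates something like:

```ini
# /etc/systemd/system/matrix-synapse.service.d/override.conf
[Service]
MemoryLimit=512M
```

followed by `systemctl daemon-reload && systemctl restart matrix-synapse`. (On newer systemd, `MemoryMax=` is the preferred name for this cgroup memory limit.)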
Duplicate of #2504
Description
Scrolling up in Riot to fetch history for one or more rooms can sometimes cause the server to use a huge amount of memory and 100% CPU, and finally become unresponsive.
Steps to reproduce
I expect to get room history from the server, but the whole server goes bananas and every connected client gets disconnected.
Some stuff from my homeserver.log
That's the last line before the server totally hung -^
Version information
If not matrix.org:
Version: Synapse/0.25.1
Install method: pip
Platform: Tell us about the environment in which your homeserver is operating