Open chgayot opened 1 year ago
Thank you very much!! Great work!!
Had the very same issue and traced it to the same cause. I believe this information should be included in UPGRADING.md as a lot of people will encounter this issue.
You are my hero! Had the same problem. Looked around for a fix and finally thought I might not be the only idiot to upgrade right after release - let's look at the GitHub issues. Thank you 👍
As noted in #24601, the right fix is to set X-Forwarded-Proto in requests forwarded from the proxy so that Zulip knows the request came in over a secure connection. Can you check if your proxy is configured to send that?
We can try to highlight this new requirement better in the upgrading documentation, but I want to verify that doing that fixes it for you. Setting CSRF_TRUSTED_ORIGINS papers over the error by leaving Zulip thinking that requests are still coming in over an insecure connection, which could cause other problems.
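To illustrate why a trusted-origins entry only hides the symptom, here is a minimal sketch of an Origin-header check. It is a simplified model, not Django's actual code; the function name `origin_is_accepted` is hypothetical, and `chat.domain` is the placeholder host used elsewhere in this thread.

```python
def origin_is_accepted(origin, host, is_secure, trusted_origins=()):
    """Simplified model of an Origin-header CSRF check (not Django's code).

    The expected origin is built from the scheme the application believes
    the request used, so a wrong belief about HTTPS produces a mismatch.
    """
    scheme = "https" if is_secure else "http"
    expected_origin = f"{scheme}://{host}"
    return origin == expected_origin or origin in trusted_origins


# The browser talks HTTPS to the outer proxy, so it sends an https Origin.
browser_origin = "https://chat.domain"

# The application believes the request arrived over plain HTTP: mismatch.
print(origin_is_accepted(browser_origin, "chat.domain", is_secure=False))

# A trusted-origins entry makes the check pass, but the application still
# believes the connection is insecure.
print(origin_is_accepted(browser_origin, "chat.domain", is_secure=False,
                         trusted_origins=("https://chat.domain",)))

# Once the forwarded scheme is trusted, no extra entry is needed.
print(origin_is_accepted(browser_origin, "chat.domain", is_secure=True))
```

In the second call the check passes while `is_secure` is still false, which is the "other problems" risk described above.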
X-Forwarded-Proto is correctly set and has been working since I set up my Zulip instance (version 5). It's the update to version 7 that required using CSRF_TRUSTED_ORIGINS to resolve the "CSRF verification failed. Request aborted." problem.
Ah -- the other half that changed in 7.0 is that we only trust the X-Forwarded-Proto if it is set by a trusted proxy. Do you have LOADBALANCER_IPS set in your docker container's environment to the IP address that Zulip sees requests as coming from?
You can check if that's correctly set by looking at the nginx access logs, and seeing if it's the IP of the traefik gateway, or the end-user.
See this comment on chat.zulip.org where someone else was configuring traefik requests to come from a static IP so Zulip can know to trust them.
I personally don't use traefik; I have my own solution based on nginx-proxy. Maybe @chgayot can shed some light on that part. In any case, if there's a need to define LOADBALANCER_IPS for 7.0 to work properly, it should be reflected in the documentation.
Following the link and suggestion works, but I wasn't able to get it working with domain names rather than IPs, and I had to change the Traefik configuration, which makes it a workaround rather than a production solution. It puts the entry barrier of docker-zulip really out of reach for beginners like me :(
For someone who knows more than me, there must be an init line or something similar we can run to get traefik's IP on container launch and set it up.
To sum up what I had to do:
In the TRAEFIK docker-compose:
networks:
  traefik:
    driver: bridge
    name: traefik
    ipam:
      config:
        - subnet: "172.18.0.0/24"
(maybe the right prefix length is 16, never learnt this part)
AND
services:
  traefik:
    ...(traefik config)
    networks:
      traefik:
        ipv4_address: 172.18.0.2
Finally, "just" set, in ZULIP docker-compose
LOADBALANCER_IPS: "172.18.0.2"
No need for SETTING_CSRF_TRUSTED_ORIGINS.
But I'm all ears for the right solution!
(continuing the discussion from #404, which is now closed)
The issue is not that X-Forwarded-Proto is not set, or not trusted. The issue is that it is reset/ignored.
When using a reverse proxy, the request goes through two proxies:
1) the outer, user configured one
2) inside the docker container, there is another nginx, that finally forwards to zulip/django app.
In other words, there are three connections:
User agent <- 1 -> outer proxy <- 2 -> inner nginx <- 3 -> zulip.
When we configure the outer proxy, X-Forwarded-Proto is set to the scheme used for connection 1.
However, the inner nginx sets its own X-Forwarded-Proto based on the protocol of connection 2, ignoring the header passed by the downstream proxy.
That means, if the outer proxy talks to the inner nginx via http and not https (which is sensible if both are on the same machine), the inner nginx will tell the zulip app that X-Forwarded-Proto is http. Zulip can be made to accept this with DISABLE_HTTPS=True; but as both this issue and #404 show, it will bubble up somewhere anyway.
As a workaround, configure connection 2 to use https and remove the DISABLE_HTTPS=True variable. This will make the inner nginx happy, though it might not be really secure; connection 1 could still be http if misconfigured. It is also kind of a waste, talking on localhost with tls (and a self-signed cert).
The loadbalancer.ips setting is not a fix; according to the documentation, it configures Zulip to trust X-Forwarded-For only if the request is from one of loadbalancer.ips; it has no provision for trusting X-Forwarded-Proto.
P.S.: The X-Forwarded-Protocol mentioned in #404 comes from here: https://github.com/zulip/docker-zulip/wiki/Proxying-via-nginx-on-host-machine
@chgayot wrote:
Following the link and suggestion works, but I wasn't able to get it working with domain names rather than IPs, and I had to change the Traefik configuration, which makes it a workaround rather than a production solution. It puts the entry barrier of docker-zulip really out of reach for beginners like me :(
I'm not clear what you mean by "domain names rather than IPs" -- can you clarify? But changing the Traefik configuration seems necessary here, not a "workaround" -- nginx needs to know where the proxy is for it to be able to unroll X-Forwarded-For headers so that Zulip can log correct IP addresses. And that you managed to configure Traefik shows that it's not out of reach for beginners like you. :)
For someone who knows more than me, there must be an init line or something similar we can run to get traefik's IP on container launch and set it up.
I'm not familiar with Traefik, but my belief is that if you don't set the IP explicitly, it may actually move around as Docker reallocates it if the container restarts. So Zulip getting the traefik IP address at boot time isn't necessarily sufficient, since it might later become stale.
One option, which might remove the need to set up a specific subnet and IP for traefik, is to tell Zulip that the loadbalancer IPs are 172.16.0.0/12, which is every IP address in that private range. That means any Docker container which can talk to Zulip is trusted, in that whatever it says for X-Forwarded-For will be believed, but that may be a reasonable assumption you can make because you control the network topology.
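To see how wide a given trusted range is before committing to it, Python's standard `ipaddress` module can check membership. This is only a sketch using the 172.18.0.2 example address from earlier in this thread; note that the full RFC 1918 private block starting at 172.16.0.0 has a /12 prefix, whereas a /20 prefix covers only 172.16.0.0 through 172.16.15.255.

```python
import ipaddress

# Example proxy address taken from the Traefik setup earlier in this thread.
docker_address = ipaddress.ip_address("172.18.0.2")

private_block = ipaddress.ip_network("172.16.0.0/12")  # whole RFC 1918 block
narrow_block = ipaddress.ip_network("172.16.0.0/20")   # much smaller range

print(docker_address in private_block)     # True
print(docker_address in narrow_block)      # False
print(narrow_block[0], narrow_block[-1])   # 172.16.0.0 172.16.15.255
print(private_block.num_addresses)         # 1048576
```

The wider the range, the more containers are trusted to supply forwarding headers, so the smallest range that covers the proxy is the safer choice.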
@tomkv wrote:
(continuing the discussion from #404, which is now closed)
When using a reverse proxy, the request goes through two proxies:
- the outer, user configured one
- inside the docker container, there is another nginx, that finally forwards to zulip/django app.
In other words, there are three connections:
User agent <- 1 -> outer proxy <- 2 -> inner nginx <- 3 -> zulip.
When we configure the outer proxy, the X-Forwarded-Proto is being set to scheme used for connection 1.
Agree with all of the above. Specifically, the request from the outer proxy to the inner nginx, at <- 2 ->, has X-Forwarded-Proto: https and X-Forwarded-For: ip-of-user-agent. And in your configuration, <- 1 -> is over HTTPS, and <- 2 -> is over HTTP (<- 3 -> is over the uwsgi binary protocol on a UNIX domain socket).
However, the inner nginx sets its own X-Forwarded-Proto based on the protocol of connection 2, ignoring the header passed by downstream proxy.
Yup, that's absolutely what a naïve implementation would do (and what Zulip used to do!) -- but we explicitly set the X-Forwarded-Proto header in connection <- 3 ->, to the Django process, to the incoming value of X-Forwarded-Proto if it's coming from a known trusted IP. See zulip/zulip@0935d388f05337e73898d932ddb75b458cf4747e, and this code:
https://github.com/zulip/zulip/blob/0935d388f05337e73898d932ddb75b458cf4747e/puppet/zulip/files/nginx/zulip-include-common/proxy#L6
https://github.com/zulip/zulip/blob/0935d388f05337e73898d932ddb75b458cf4747e/puppet/zulip/templates/nginx/trusted-proto.template.erb
This is specifically a change in 7.0, and probably one of the causes of the problems folks are seeing.
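The rule described here can be modelled in a few lines. This is a hypothetical sketch of the decision only, not Zulip's nginx configuration (which lives in the files linked above); the function name and parameters are made up for illustration.

```python
import ipaddress


def proto_passed_to_django(peer_ip, connection_scheme, forwarded_proto, trusted_ips):
    """Return the scheme the inner nginx would report to Django.

    Hypothetical model: the incoming X-Forwarded-Proto is honored only when
    the peer address is in the trusted list; otherwise the scheme of the
    connection nginx itself received is used.
    """
    peer = ipaddress.ip_address(peer_ip)
    trusted = any(peer in ipaddress.ip_network(entry) for entry in trusted_ips)
    if trusted and forwarded_proto:
        return forwarded_proto
    return connection_scheme


# Outer proxy at 172.18.0.2 talks plain HTTP to the inner nginx on
# connection 2 and forwards "https" from connection 1.
print(proto_passed_to_django("172.18.0.2", "http", "https", ["172.18.0.2"]))  # https
# Without the proxy in the trusted list, the forwarded value is ignored.
print(proto_passed_to_django("172.18.0.2", "http", "https", []))              # http
```

The second call reproduces the situation in this issue: the header is sent correctly, but it is discarded because the sender is not trusted.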
Have you tried setting LOADBALANCER_IPS to the outer proxy's IP? If that didn't work, then there may be something else afoot here, but as far as I can tell, that's the most correct solution still.
I'll work up a documentation and/or changelog change to highlight the need to have the loadbalancer IPs set in 7.0, since this issue is very much showing that we need to improve that. :)
The loadbalancer.ips is not a fix; according to the documentation, it configures Zulip to trust X-Forwarded-For only if the request is from one of loadbalancer.ips; it has no provision for trusting the X-Forwarded-Proto.
That's absolutely an error in the documentation, and I'll push a fix. Apologies for any confusion that caused!
P.S.: The X-Forwarded-Protocol mentioned in #404 comes from here: https://github.com/zulip/docker-zulip/wiki/Proxying-via-nginx-on-host-machine
Thanks -- I've updated to point to the canonical documentation.
@alexmv wrote:
Have you tried setting LOADBALANCER_IPS to the outer proxy's IP? If that didn't work, then there may be something else afoot here, but as far as I can tell, that's the most correct solution still.
After some investigation, it turned out that exactly this has been the issue:
1) LOADBALANCER_IPS must be set as an env var in docker-compose.yml, even if using MANUAL_CONFIGURATION: "True" and LINK_SETTINGS_TO_DATA: "True". Any manual configuration of loadbalancer.ips in zulip.conf will be ignored.
In fact, the README.md says:
If you do that, you can provide a settings.py file and a zulip-secrets.conf file in /opt/docker/zulip/zulip/settings/etc-zulip/, and the container will use those.
It does not mention zulip.conf as one of the mapped files.
2) LOADBALANCER_IPS must include the IPs of the docker network. 127.0.0.1 is not enough, not even if your proxy is running on the host (not in another container) and you configured proxy_pass http://127.0.0.1:exposed-port;.
This is a bit of an issue with docker-compose; by default, it creates a new network project_default on every docker-compose up and deletes it on every docker-compose down. On every creation, it gets a new subnet from the pool. To prevent that, configuring a specific subnet in docker-compose.yml is necessary:
networks:
  default:
    ipam:
      driver: default
      config:
        - subnet: ${ZULIP_SUBNET}
...
services:
  zulip:
    ...
    environment:
      LOADBALANCER_IPS: "127.0.0.1, ${ZULIP_SUBNET}"
      ...
(and put ZULIP_SUBNET=... into the .env file).
After this, it works.
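For anyone wanting to sanity-check a value before putting it in the environment, here is a rough sketch that parses a comma-separated list in the same shape as the LOADBALANCER_IPS value above and tests a peer address against it. This is not how docker-zulip parses the variable; the helper names are hypothetical, and 172.18.0.0/24 is only an assumed value for ZULIP_SUBNET.

```python
import ipaddress


def parse_trusted(value):
    """Split a comma-separated list of IPs/CIDRs into network objects."""
    return [ipaddress.ip_network(item.strip())
            for item in value.split(",") if item.strip()]


def is_trusted(peer_ip, networks):
    """Return True if peer_ip falls inside any of the given networks."""
    peer = ipaddress.ip_address(peer_ip)
    return any(peer in network for network in networks)


subnet = "172.18.0.0/24"  # assumed value of ZULIP_SUBNET, for illustration
loopback_only = parse_trusted("127.0.0.1")
with_subnet = parse_trusted(f"127.0.0.1, {subnet}")

# The container sees the proxy's requests arriving from a Docker network
# address (172.18.0.2 is an example), not from 127.0.0.1.
print(is_trusted("172.18.0.2", loopback_only))  # False
print(is_trusted("172.18.0.2", with_subnet))    # True
```

The address to test is whatever shows up as the client IP in the container's nginx access log.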
LOADBALANCER_IPS must be set as an env var in docker-compose.yml, even if using MANUAL_CONFIGURATION: "True" and LINK_SETTINGS_TO_DATA: "True". Any manual configuration of loadbalancer.ips in zulip.conf will be ignored. In fact, the README.md says: If you do that, you can provide a settings.py file and a zulip-secrets.conf file in /opt/docker/zulip/zulip/settings/etc-zulip/, and the container will use those. It does not mention zulip.conf as one of the mapped files.
Mmm, I can see this being unclear. /etc/zulip/settings.py is only for things inside Django and python in general -- deployment settings which control how other binaries (e.g. nginx) run are in /etc/zulip/zulip.conf. Docker doesn't have much call to change zulip.conf, hence why the couple of things which might require changes there (DISABLE_HTTPS and LOADBALANCER_IPS, primarily) have their own top-level env settings and MANUAL_CONFIGURATION does not apply.
- The IP in LOADBALANCER_IPS must contain the IPs of docker network. 127.0.0.1 is not enough, not even if your proxy is running on the host (not in another container) and you configured proxy_pass http://127.0.0.1:exposed-port;.
Yup -- 127.0.0.1 doesn't make sense as a proxy IP unless you literally installed another nginx inside the Docker container itself. As you note, Docker makes a private, imaginary, network within 172.16.0.0/12, and does NAT to proxy things in and out of it. Roughly:
1) [...] 1.2.3.4:443
2) [...] 127.0.0.1:8080 -- so it forwards the packet over the lo loopback adapter.
3) [...] 172.18.0.2; it rewrites the "to" address from 127.0.0.1 to 172.18.0.3 (or whatever internal IP Docker has assigned the Zulip container), and the destination port from 8080 to 80.
4) [...] 172.18.0.2 -- to the Docker container, those packets are not from localhost, or the lo loopback adapter, at all!
5) [...] 172.18.0.2 on its imaginary network adapter on its virtual network, Docker does the reverse translation to look up where the request came from originally, and responds back over the lo loopback adapter to nginx, who sees the packets as just being "from" 127.0.0.1.
6) [...] 1.2.3.4.
I'm glad you got things working. Can you take a look at #405 and zulip/zulip#26011 and see if the changes in those would have helped make this more evident?
Just upgraded to Zulip 7.0 and I got "CSRF verification failed. Request aborted." from Django when logging in (or infinite reloads when already logged in). As it is behind a reverse proxy, DISABLE_HTTPS is set to True.
As per issue https://github.com/zulip/zulip/issues/24599 "CSRF_TRUSTED_ORIGINS no longer filled by EXTERNAL_HOST", an easy fix to this is to add: SETTING_CSRF_TRUSTED_ORIGINS: "['https://chat.domain']" in the docker-compose/zulip/environment
I don't know if it is the best way to fix the issue, or whether it's worth documenting (or fully documenting how to set it up behind a reverse proxy) or adding to the docker-compose, but for anyone facing the issue, that's a fix!