CSRF verification failed if behind a reverse proxy (traefik)

chgayot commented 1 year ago

Just upgraded to Zulip 7.0 and I got "CSRF verification failed. Request aborted." from Django when logging in (or infinite reloads when already logged in) As it is behind a reverse proxy, DISABLE_HTTPS: is set to True.

As per issue https://github.com/zulip/zulip/issues/24599 "CSRF_TRUSTED_ORIGINS no longer filled by EXTERNAL_HOST", an easy fix to this is to add: SETTING_CSRF_TRUSTED_ORIGINS: "['https://chat.domain']" in the docker-compose/zulip/environment

I don't know if it is the best way to fix the issue, and/or it's worth to document it (or document fully how to set it up behind a reverse proxy) or add it in the docker-compose, but for anyone facing the issue, that's a fix!

mp-strachan commented 1 year ago

Thankyou very much!! Great work!!

fatk commented 1 year ago

Had the very same issue and traced it to the same cause. I believe this information should be included in UPGRADING.md as a lot of people will encounter this issue.

jwillmer commented 1 year ago

You are my hero! Had the same problem. Looked around for a fix and finally thought I might not be the only idiot to upgrade right after release - let's look at the GitHub issues. Thank you 👍

alexmv commented 1 year ago

As noted in #24601, the right fix is to set X-Forwarded-Proto in requests forwarded from the proxy so that Zulip knows the request came in over a secure connection. Can you check if your proxy is configured to send that?

We can try to highlight this new requirement better in the upgrading documentation, but I want to verify that doing that fixes it for you. Setting CSRF_TRUSTED_ORIGINS papers over the error by leaving Zulip thinking that requests are still coming in over an insecure connection, which could cause other problems.

fatk commented 1 year ago

X-Forwarded-Proto is correctly set and has been working since i've setup my Zulip instance (version 5). It's the update to version 7 that required using CSRF_TRUSTED_ORIGINS to resolve the "CSRF verification failed. Request aborted." problem.

alexmv commented 1 year ago

Ah -- the other half that changed in 7.0 is that we only trust the X-Forwarded-Proto if it is set by a trusted proxy. Do you have LOADBALANCER_IPS set in your docker container's environment to the IP address that Zulip sees requests as coming from?

You can check if that 'correctly set by looking at the nginx access logs, and seeing if it's the IP of the traefik gateway, or the end-user.

alexmv commented 1 year ago

See this comment on chat.zulip.org where someone else was configuring traefik requests to come from a static IP so Zulip can know to trust them.

fatk commented 1 year ago

I personally don't use traefik, i have my own solution based on nginx-proxy, maybe @chgayot can shed some light on that part. In any case, if there's a need to define LOADBALANCER_IPS for 7.0 to work properly it should be reflected in the documentation.

chgayot commented 1 year ago

Following the link and suggestion work, but I wasn't able to get it working on domain names rather than IPs, and I had to change the Traefik configuration which makes it a workaround rather than a production solution. It puts the entry barrier of docker-zulip really out of reach for beginners like me :(

For someone who knows more than me, there must be an init line or something similar we can run to get traefik's IP on container launch and set it up.

To sum up what I had to do:

In the TRAEFIK docker-compose:

networks:
    traefik:
        driver: bridge
        name: traefik
        ipam:
          config:
          - subnet: "172.18.0.0/24"** (maybe the right subnet is 16, never learnt this part)

AND

services:
  traefik:
...(traefik config)
    networks:
     - traefik:
        ipv4_address: 172.18.0.2

Finally, "just" set, in ZULIP docker-compose

  LOADBALANCER_IPS: "172.18.0.2"

No need for SETTING_CSRF_TRUSTED_ORIGINS.

But I'm all ear for the right solution!

tomkv commented 1 year ago

(continuing the discussion from #404, which is now closed)

The issue is not that X-Forwarded-Proto is not set, or not trusted. The issue is, that it is reset/ignored.

When using reverse proxy, the request goes trough two proxies:

1) the outer, user configured one

2) inside the docker container, there is another nginx, that finally forwards to zulip/django app.

In other words, there are three connections:

User agent <- 1 -> outer proxy <- 2 -> inner nginx <- 3 -> zulip.

When we configure the outer proxy, the X-Forwarded-Proto is being set to scheme used for connection 1.

However, the inner nginx sets it's own X-Forwarded-Proto based on the protocol of connection 2, ignoring the header paassed by downstream proxy.

That means, if the outer proxy talks to inner nginx via http and not https (which is sensible if both are on the same machine), the inner nginx will tell to the zulip app that X-Forwarded-Proto is http. Zulip can be accomodated for this with DISABLE_HTTPS=True; but as both this issue and #404 show, it will bubble up somewhere anyway.

As a workaround, configure the connection 2 to use https and remove DISABLE_HTTPS=True variable. This will make the inner nginx happy, though it might not be really secure; the connection 1 could still be http if misconfigured. It is also kind of a waste, talking on localhost with tls (and self-signed cert).

The loadbalancer.ips is not a fix; according to the documentation, it configures Zulip to trust X-Forwarded-For only if the request is from one of loadbalancer.ips; it has no provision for trusting the X-Forwarded-Proto.

P.S.: The X-Forwarded-Protocol mentioned in #404 comes from here: https://github.com/zulip/docker-zulip/wiki/Proxying-via-nginx-on-host-machine

alexmv commented 1 year ago

@chgayot wrote:

Following the link and suggestion work, but I wasn't able to get it working on domain names rather than IPs, and I had to change the Traefik configuration which makes it a workaround rather than a production solution. It puts the entry barrier of docker-zulip really out of reach for beginners like me :(

I'm not clear what you mean by "domain names rather than IPs" -- can you clarify? But changing the Traefik configuration seems necessary here, not a "workaround" -- nginx needs to know where the proxy is for it to be able to unroll X-Forwarded-For headers so that Zulip can log correct IP addresses. And that you managed to configure Traefik shows that it's out of reach for beginners like you. :)

For someone who knows more than me, there must be an init line or something similar we can run to get traefik's IP on container launch and set it up.

I'm not familiar with Traefik, but my belief is that if you don't set the IP explicitly, it may actually move around as Docker reallocates it if the container restarts. So Zulip getting the traefik IP address at boot time isn't necessarily sufficient, since it might later become stale.

Aside about "172.18.0.0/24"

> ``` > networks: > traefik: > driver: bridge > name: traefik > ipam: > config: > - subnet: "172.18.0.0/24"** (maybe the right subnet is 16, never learnt this part) > ``` In case it's helpful, `172.18.0.0/24` means `172.18.0.anything` and `172.18.0.0/16` means `172.18.anything.anything`. The whole of that private IP address block is [from `172.16.0.0` to `172.31.255.255`, which is described as `172.16.0.0/20`](https://en.wikipedia.org/wiki/Private_network#Private_IPv4_addresses). But I don't know enough about your Docker context to know if you actually want to specify the whole private subnet block.

One option, which might remove the need to set up a specific subnet and IP for traefik, is to tell Zulip that the loadbalancer IPs are 172.16.0.0/20, which is every IP address in that private subnet. Which means that any Docker container which can talk to Zulip is trusted, in that whatever it says for X-Forwarded-For will be believed, but that may be a reasonable assumption you can make because you control the network topology.

@tomkv wrote:

(continuing the discussion from #404, which is now closed)

When using reverse proxy, the request goes trough two proxies:

the outer, user configured one

inside the docker container, there is another nginx, that finally forwards to zulip/django app.

In other words, there are three connections:

User agent <- 1 -> outer proxy <- 2 -> inner nginx <- 3 -> zulip.

When we configure the outer proxy, the X-Forwarded-Proto is being set to scheme used for connection 1.

Agree with all of the above. Specifically, the request from the outer proxy to the inner nginx, at <- 2 ->, has X-Forwarded-Proto: https and X-Forwarded-For: ip-of-user-agent. An in your configuration, <- 1 -> is over HTTPS, and <- 2 -> is over HTTP (<- 3 -> is over usgi binary protocol on a UNIX domain socket).

However, the inner nginx sets it's own X-Forwarded-Proto based on the protocol of connection 2, ignoring the header passed by downstream proxy.

Yup, that's absolutely what a naïve implementation would do (and what Zulip used to do!) -- but we explicitly set the X-Forwarded-Proto header in connection <- 3 ->, to the Django process, to the incoming value of X-Forwarded-Proto if it's coming from a known trusted IP. See zulip/zulip@0935d388f05337e73898d932ddb75b458cf4747e, and this code: https://github.com/zulip/zulip/blob/0935d388f05337e73898d932ddb75b458cf4747e/puppet/zulip/files/nginx/zulip-include-common/proxy#L6 https://github.com/zulip/zulip/blob/0935d388f05337e73898d932ddb75b458cf4747e/puppet/zulip/templates/nginx/trusted-proto.template.erb

This is specifically a change in 7.0, and probably one of the causes of the problems folks are seeing.

Have you tried setting LOADBALANCER_IPS to the outer proxy's IP? If that didn't work, then there may be something else afoot here, but as far as I can tell, that's the most correct solution still.

I'll work up a documentation and/or changelog change to highlight the need to have the loadbalancer IPs set in 7.0, since this issue is very much showing that we need to improve that. :)

The loadbalancer.ips is not a fix; according to the documentation, it configures Zulip to trust X-Forwarded-For only if the request is from one of loadbalancer.ips; it has no provision for trusting the X-Forwarded-Proto.

That's absolutely an error in the documentation, and I'll push a fix. Apologies for any confusion that caused!

P.S.: The X-Forwarded-Protocol mentioned in #404 comes from here: https://github.com/zulip/docker-zulip/wiki/Proxying-via-nginx-on-host-machine

Thanks -- I've updated to point to the canonical documentation.

tomkv commented 1 year ago

@alexmv wrote:

Have you tried setting LOADBALANCER_IPS to the outer proxy's IP? If that didn't work, then there may be something else afoot here, but as far as I can tell, that's the most correct solution still.

After some investigation, it turnet out, that exactly this has been the issue:

1) LOADBALANCER_IPS must to be used as an env var in docker-compose.yml; even if using MANUAL_CONFIGURATION: "True" and LINK_SETTINGS_TO_DATA: "True". Any manual configuration of loadbalancer.ips in zulip.conf will be ignored.

In fact, the README.md says:

If you do that, you can provide a settings.py file and a zulip-secrets.conf file in /opt/docker/zulip/zulip/settings/etc-zulip/, and the container will use those.

It does not mention zulip.conf as one of mapped files.

2) The IP in LOADBALANCER_IPS must contain the IPs of docker network. 127.0.0.1 is not enough, not even if your proxy is running on the host (not in another container) and you configured proxy_pass http://127.0.0.1:exposed-port;.

This is a bit of an issue with docker-compose; by default, it creates a new network project_default on every docker-compose up and deletes it on every docker-compose down. On every creation, it gets a new subnet from the pool. To prevent that, configuring specific subnet in docker-compose.yml is necessary:

networks:
  default:
    ipam:
      driver: default
      config:
        - subnet: ${ZULIP_SUBNET}

...

services:
  zulip:
    ...
    environment:
      LOADBALANCER_IPS: "127.0.0.1, ${ZULIP_SUBNET}"
   ...

(and put ZULIP_SUBNET=... into .env-file).

After this, it works.

alexmv commented 1 year ago

LOADBALANCER_IPS must to be used as an env var in docker-compose.yml; even if using MANUAL_CONFIGURATION: "True" and LINK_SETTINGS_TO_DATA: "True". Any manual configuration of loadbalancer.ips in zulip.conf will be ignored. In fact, the README.md says:

If you do that, you can provide a settings.py file and a zulip-secrets.conf file in /opt/docker/zulip/zulip/settings/etc-zulip/, and the container will use those.

It does not mention zulip.conf as one of mapped files.

Mmm, I can see this being unclear. /etc/zulip/settings.py is only for things inside Django and python in general -- deployment settings which control how other binaries (e.g. nginx) run are in /etc/zulip/zulip.conf. Docker doesn't have much call to change zulip.conf, hence why the couple things which might reuire changes there (DISABLE_HTTPS and LOADBALANCER_IPS, primarily) have their own top-level env settings and MANUAL_CONFIGURATION does not apply.

The IP in LOADBALANCER_IPS must contain the IPs of docker network. 127.0.0.1 is not enough, not even if your proxy is running on the host (not in another container) and you configured proxy_pass http://127.0.0.1:exposed-port;.

Yup -- 127.0.0.1 doesn't make sense as a proxy IP unless you literally installed another nginx inside the Docker container itself. As you note, Docker makes a private, imaginary, network on 172.16.0.0/20, and does NAT to proxy things in and out of it. Roughly:

Requests come into the host on 1.2.3.4:443
The nginx proxy listening there proxies to 127.0.0.1:8080 -- so it forwards the packet of the the lo loopback adapter.
Docker is listening there, captures those packats, and does NAT to rewrite those packets as being "from" 172.18.0.2; it rewrites the "to" address from 127.0.0.1 to 172.18.0.3 (or whatever internal IP Docker has assigned the Zulip container), and the destination port from 8080 to 80.
Docker sends those packets to the Zulip container, where Zulip's nginx serves them. It needs to know to trust 172.18.0.2 -- to the Docker container, those packets are not from localhost, or the lo loopback adapter, at all!
When Zulip sends a reply back "to" 172.18.0.2 on its imaginary network adapter on its virtual network, Docker does the reverse translation to look up where the request came from originally, and responds back over the lo loopback adapter to nginx, who sees the packets as just being "from" 127.0.0.1.
The outer nginx proxy sends its reply to the originating client, which sees the packets as "from" 1.2.3.4.

I'm glad you got things working. Can you take a look at #405 and zulip/zulip#26011 and see if the changes in those would have helped make this more evident?

zulip / docker-zulip

CSRF verification failed if behind a reverse proxy (traefik) #403