hashicorp / nomad


Cannot change ingress container from http to tcp (or vice versa) when using Consul Service Mesh #14802

Open brian-athinkingape opened 1 year ago

brian-athinkingape commented 1 year ago

Nomad version

Nomad v1.3.5 (1359c2580fed080295840fb888e28f0855e42d50)

Operating system and Environment details

Ubuntu 22.04 on AWS (on a fresh EC2 instance), amd64

Consul v1.13.2
Revision 0e046bbb
Build Date 2022-09-20T20:30:07Z
Protocol 2 spoken by default, understands 2 to 3 (agent will automatically use protocol >2 when speaking to compatible agents)

Docker version 20.10.18, build b40c2f6

Issue

If I run an ingress gateway with the http protocol, I'm unable to edit it to use tcp even after I stop the job. Even if I run nomad system gc and nomad system reconcile summaries, it still doesn't work. I'm also unable to edit the Consul config to change the upstream service's protocol directly (see the service-defaults.hcl file and the consul config write error under Actual Result below).

If I swap all instances of http and tcp, I get the same errors.

Reproduction steps

  1. Start nomad/consul in dev mode:

    consul agent -dev
    sudo nomad agent -dev-connect
  2. Set up Consul to use http as the default protocol (using the proxy-defaults.hcl file below)

    consul config write proxy-defaults.hcl
  3. Run the first job file

    nomad job run job1.nomad
  4. After the job has started, stop it

    nomad job stop job1
  5. When the job has stopped successfully, run the second job file

    nomad job run job2.nomad

Expected Result

I should be able to run job2 as normal.

Actual Result

$ nomad job run job2.nomad 
Error submitting job: Unexpected response code: 500 (Unexpected response code: 500 (service "test-upstream" has protocol "http", which does not match defined listener protocol "tcp"))
$ consul config write service-defaults.hcl 
Error writing config entry service-defaults/test-upstream: Unexpected response code: 500 (service "test-upstream" has protocol "tcp", which does not match defined listener protocol "http")

Job file (if appropriate)

proxy-defaults.hcl

Kind      = "proxy-defaults"
Name      = "global"
Config {
  protocol = "http"
}

service-defaults.hcl

Kind     = "service-defaults"
Name     = "test-upstream"
Protocol = "tcp"

job1.nomad:

job "job1" {
    region = "global"
    datacenters = ["dc1"]
    type = "system"

    group "group1" {
        network {
            mode = "bridge"
            port "default" {
                static = 12345
                to = 12345
            }
        }
        service {
            name = "test-ingress"
            port = "12345"
            connect {
                gateway {
                    proxy {
                        connect_timeout = "5s"
                    }
                    ingress {
                        listener {
                            port = 12345
                            protocol = "http"
                            service {
                                name = "test-upstream"
                                hosts = ["*"]
                            }
                        }
                    }
                }
            }
        }
    }
}

job2.nomad:

job "job2" {
    region = "global"
    datacenters = ["dc1"]
    type = "system"

    group "group2" {
        network {
            mode = "bridge"
            port "default" {
                static = 12345
                to = 12345
            }
        }
        service {
            name = "test-ingress"
            port = "12345"
            connect {
                gateway {
                    proxy {
                        connect_timeout = "5s"
                    }
                    ingress {
                        listener {
                            port = 12345
                            protocol = "tcp"
                            service {
                                name = "test-upstream"
                            }
                        }
                    }
                }
            }
        }
    }
}
tgross commented 1 year ago

Hi @brian-athinkingape! I was able to reproduce the behavior you're seeing exactly. Thank you so much for providing a solid minimal example; it really helps a lot! The tl;dr is that you've hit a known design issue between Consul and Nomad around gateways, which is described by my colleague @shoenig in https://github.com/hashicorp/nomad/issues/8647#issuecomment-691290660

There's a workaround roughly described in https://github.com/hashicorp/consul/issues/10308#issuecomment-849713211. I'm going to show that workaround first and then get into the nitty-gritty of why this is happening below.

Workaround

Read the current kind=ingress-gateway config entry to a file:

$ consul config read -kind ingress-gateway -name test-ingress > ./ingress.json

Remove the listener so the entry looks like this:

{
    "Kind": "ingress-gateway",
    "Name": "test-ingress",
    "TLS": {
        "Enabled": false
    }
}
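
If you have jq installed, one way to do this transformation in a single step is to keep only the Kind, Name, and TLS fields (a convenience sketch, not part of the original workaround; adjust the field list if your entry carries other settings you want to keep):

$ consul config read -kind ingress-gateway -name test-ingress | jq '{Kind, Name, TLS}' > ./ingress.json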

Write the new config and delete the kind=proxy-defaults config:

$ consul config write ./ingress.json
Config entry written: ingress-gateway/test-ingress

$ consul config delete -kind proxy-defaults -name global
Config entry deleted: proxy-defaults/global
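
Optionally, read the entry back with the same consul config read command to confirm the listener is gone before re-running the job:

$ consul config read -kind ingress-gateway -name test-ingress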

Now the second job works:

$ nomad job run ./job2.nomad
==> 2022-10-05T11:13:04-04:00: Monitoring evaluation "7ecf8803"
    2022-10-05T11:13:04-04:00: Evaluation triggered by job "job2"
    2022-10-05T11:13:04-04:00: Allocation "fdef5d8f" created: node "35be55c7", group "group2"
    2022-10-05T11:13:04-04:00: Evaluation status changed: "pending" -> "complete"
==> 2022-10-05T11:13:04-04:00: Evaluation "7ecf8803" finished with status "complete"

Reproduction

Running job2 hits the error you reported:

$ nomad job run ./job2.nomad
Error submitting job: Unexpected response code: 500 (Unexpected response code: 500 (service "test-upstream" has protocol "http", which does not match defined listener protocol "tcp"))

A clue to what's going on is that job2 isn't registered at all, which means the failure happens during the initial job submission and not as part of allocation setup after we've scheduled the workload. That narrows the behavior down to this block (job_endpoint.go#L249-L272) in the Job.Register RPC, which writes a configuration entry to Consul. (I'm also seeing that Sentinel policy enforcement happens after we've done that, which seems backwards, but I'll address that elsewhere.)
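
For reference, the entry Nomad derives from job1's gateway block and writes at registration time looks roughly like this (a reconstruction based on Consul's ingress-gateway config entry schema, not a dump taken from the reproduction cluster):

{
    "Kind": "ingress-gateway",
    "Name": "test-ingress",
    "Listeners": [
        {
            "Port": 12345,
            "Protocol": "http",
            "Services": [
                {
                    "Name": "test-upstream",
                    "Hosts": ["*"]
                }
            ]
        }
    ]
}

Stopping job1 doesn't remove that entry, which appears to be why both directions fail: job2's tcp listener is rejected against the service's effective protocol (http, inherited from proxy-defaults), and writing service-defaults with Protocol = "tcp" is rejected against the http listener still recorded in the gateway entry.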

I was a little confused by why we'd be doing this in the job register code path at all and not on the client node after an allocation is placed, but then I did some digging and found this comment https://github.com/hashicorp/nomad/issues/8647#issuecomment-691290660 from my colleague @shoenig which discusses the "multi-writer" problem we have. Ultimately Consul owns the configuration entry and it's global, so multiple Nomad clusters could be writing to it.

One way to imagine the problem is to consider what would happen if you ran both job1 and job2 at the same time! We wouldn't have any way of updating Consul correctly in this case.

So ultimately this issue is a duplicate of #8647 and something we need to fix, which I realize isn't very satisfying in the short term.

A challenging part of figuring out what to do as an operator is that the Consul CLI and UI aren't super clear on the data you need. The ingress gateway isn't exposed in the consul catalog CLI at all. So it took me a little while to find https://github.com/hashicorp/consul/issues/10308 and develop the workaround described above.
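
For anyone else digging for this data, the config entries themselves can be listed and read with the standard consul config subcommands (the kind and name shown here match this reproduction):

$ consul config list -kind ingress-gateway
$ consul config read -kind ingress-gateway -name test-ingress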

tgross commented 1 year ago

Although this is technically a duplicate, there could be unique bits to it. I'm going to keep this open and mark it for roadmapping, and crosslink to it from #8647.

brian-athinkingape commented 1 year ago

Thanks! We used the workaround to resolve the issue on our production system for now; looking forward to when this can be fixed properly!