microsoft / azure-container-apps

Roadmap and issues for Azure Container Apps
MIT License
362 stars 29 forks source link

[KEDA][Kafka] DNS resolution not working with a custom DNS server #970

Open pgourlain opened 10 months ago

pgourlain commented 10 months ago

This issue is a: (mark with an x)

Issue description

We use custom kafka KEDA rule on a pod, but the dns resolution doesn't work as expected. Our Kafka cluster is linked to a private endpoint. So the dns resolution inside a pod is xxxx.westeurope.azure.confluent.cloud => X.240.Y.Z and from KEDA xxxx.westeurope.azure.confluent.cloud =>10.1.0.4

our scaling rule :

"scale": {
    "minReplicas": 1,
    "maxReplicas": 6,
    "rules": [
        {
            "name": "kafka-rule",
            "custom": {
                "type": "kafka",
                "metadata": {
                    "bootstrapServers": "xxxx.westeurope.azure.confluent.cloud:9092",
                    "consumerGroup": "myconsumergroup",
                    "lagThreshold": "20",
                    "offsetResetPolicy": "earliest",
                    "sasl": "plaintext",
                    "tls": "enable",
                    "topic": "mytopic"
                },
                "auth": [
                    {
                        "secretRef": "kafka-keda-username",
                        "triggerParameter": "username"
                    },
                    {
                        "secretRef": "kafka-keda-password",
                        "triggerParameter": "password"
                    }
                ]
            }
        }
    ]
},

note that the kafka cluster url has a public resolution => 10.1.0.4, but in our network is X.240.Y.Z.

Steps to reproduce

1) use confluent KAfka cluster with private connectivity 2) create Azure VNET and set DNS servers to custom 3) Create Container App Environment in subnet of previous vnet (Workload profile is used) 4) Create a Container App that consume Kafka (Dapr is not used) 5) Setup scaling rule as above

Expected behavior [What you expected to happen.]

KEDA should resolve url 'xxxx.westeurope.azure.confluent.cloud' to X.240.Y.Z

Actual behavior [What actually happened.]

KEDA resolve url 'xxxx.westeurope.azure.confluent.cloud' to 10.1.0.4, so scaling doesn't works

Screenshots
If applicable, add screenshots to help explain your problem. extacted from system logs : image

VNET DNS configuration : image

workload profile configuration : image

Additional context

stweb1963 commented 10 months ago

We are seeing similar issues

Issue

Our implementation

Observations/Issues

Band-Aid solution / resolution

End result / outcome

Issues Still Observed

Although all appeared to be ok for now, we uncovered one last issue that is blocking the scaling

cachai2 commented 9 months ago

Hi @pgourlain and @stweb1963, we identified a bug and should have a fix out by end of January.

mohandoukaci commented 7 months ago

Hello, do we have any news about this issue? if it's fixed, do we need to redeploy our Container App environnement?