infracloudio / ifc-rw-codecollection

test codebundle for runwhen
Apache License 2.0

ro CLI not able to resolve or connect to internal addresses #13

Open saurabh3460 opened 8 months ago

saurabh3460 commented 8 months ago

Issue

Port-forwarded port 9090 is not accessible from inside the container with --network="host":

To reproduce, port-forward Prometheus using kubectl port-forward svc/prometheus-stack-kube-prom-prometheus 9090:9090 -n monitoring, then build and run the devtool container:

docker run --rm -d -p 3000:3000 --name rds-codecollection \
--network="host" \
-e PROMETHEUS_URL="http://localhost:9090/api/v1" \
-e QUERY="aws_rds_database_connections_average{dimension_DBInstanceIdentifier=\"robotshopmysql\"} > 1" \
-e RW_PATH_TO_ROBOT="/app/codecollection/codebundles/rds-mysql-conn-count/runbook.robot" \
runwhen:latest

Running sli.robot shows that it is not able to connect to localhost:

➜  ifc-rw-codecollection git:(fix/rds) docker exec rds-codecollection bash -c "ro /app/codecollection/codebundles/rds-mysql-conn-count/sli.robot && ls -R /robot_logs"
==============================================================================
Sli :: Run a PromQL query against Prometheus instant query API, perform a p...
==============================================================================
Querying Prometheus Instance And Pushing Aggregated Data              | FAIL |
ValueError: Recieved return code of 7 from response ShellServiceResponse(cmd='eval $(echo "curl -X GET \'http://localhost:9090/api/v1/query?query=aws_rds_database_connections_average%7Bdimension_DBInstanceIdentifier%3D%22robotshopmysql%22%7D%20%3E%201&time=2024-02-05T10%3A44%3A24.699464Z&step=30\'")', parsed_cmd=['rbash', '-c', 'eval $(echo "curl -X GET \'http://localhost:9090/api/v1/query?query=aws_rds_database_connections_average%7Bdimension_DBInstanceIdentifier%3D%22robotshopmysql%22%7D%20%3E%201&time=2024-02-05T10%3A44%3A24.699464Z&step=30\'")'], stdout='', stderr='  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n\n  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Failed to connect to localhost port 9090: Connection refused\n', returncode=7, status=200, body='', errors=[])
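
For reference, the URL in the failing command can be rebuilt from the PROMETHEUS_URL and QUERY values passed to docker run. This is a minimal sketch, assuming the codebundle percent-encodes the query and timestamp the way Python's urllib.parse.quote does; build_query_url is a hypothetical helper and not part of the codebundle.

```python
from urllib.parse import quote


def build_query_url(base_url: str, promql: str, time: str, step: int) -> str:
    """Hypothetical helper: rebuild the instant-query URL seen in the log."""
    # safe="" percent-encodes every reserved character ({ } " = > : and space).
    return (
        f"{base_url}/query"
        f"?query={quote(promql, safe='')}"
        f"&time={quote(time, safe='')}"
        f"&step={step}"
    )


url = build_query_url(
    "http://localhost:9090/api/v1",
    'aws_rds_database_connections_average{dimension_DBInstanceIdentifier="robotshopmysql"} > 1',
    "2024-02-05T10:44:24.699464Z",
    30,
)
print(url)
```

The printed URL can be compared character by character with the one in the ShellServiceResponse above to rule out an encoding difference.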

Checked with bare curl

To confirm that the port-forwarded Prometheus is reachable from inside the container, we ran the failing query with curl from inside the container, and it succeeded:

python@saurabh:/app$ curl http://localhost:9090/api/v1/query?query=aws_rds_database_connections_average%7Bdimension_DBInstanceIdentifier%3D%22robotshopmysql%22%7D%20%3E%201&time=2024-02-05T10%3A44%3A24.699464Z&step=30
[1] 26
[2] 27
python@saurabh:/app$ {"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"aws_rds_database_connections_average","account_id":"590183940259","container":"yet-another-cloudwatch-exporter","dimension_DBInstanceIdentifier":"robotshopmysql","endpoint":"http","instance":"192.168.111.212:5000","job":"yace","name":"arn:aws:rds:us-west-2:590183940259:db:robotshopmysql","namespace":"monitoring","pod":"yace-8b8bd5598-shtvb","region":"us-west-2","service":"yace"},"value":[1707130056.217,"30"]}]}}
[1]-  Done                    curl http://localhost:9090/api/v1/query?query=aws_rds_database_connections_average%7Bdimension_DBInstanceIdentifier%3D%22robotshopmysql%22%7D%20%3E%201
[2]+  Done                    time=2024-02-05T10%3A44%3A24.699464Z
python@saurabh:/app$
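
One detail in the output above: the [1] 26 and [2] 27 job lines and the two Done lines indicate that the shell treated each unquoted & in the URL as a background operator, so curl only received the URL up to the query parameter, and time=... and step=30 were parsed as separate shell words. The sketch below illustrates this with Python's shlex tokenizer, which approximates bash parsing but is not bash; shell_tokens is a hypothetical helper and query=up is a placeholder query.

```python
import shlex
from typing import List


def shell_tokens(command: str) -> List[str]:
    """Tokenize roughly like a POSIX shell, keeping operators such as & separate."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    return list(lexer)


# Placeholder query (query=up) with the same time and step parameters as the log.
URL = "http://localhost:9090/api/v1/query?query=up&time=2024-02-05T10%3A44%3A24.699464Z&step=30"

unquoted = shell_tokens(f"curl {URL}")
quoted = shell_tokens(f"curl {shlex.quote(URL)}")

print(unquoted)  # the URL is cut at each &
print(quoted)    # the URL stays one argument
```

Quoting the URL (for example with single quotes) makes the manual check send the same request that ro sends.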

Checked with rbash eval

We suspected the issue might be with rbash eval, so we ran the same query through rbash eval. This also works:

python@saurabh:/app$ rbash -c 'eval $(echo "curl -X GET http://localhost:9090/api/v1/query?query=aws_rds_database_connections_average%7Bdimension_DBInstanceIdentifier%3D%22robotshopmysql%22%7D%20%3E%201&time=2024-02-05T10%3A44%3A24.699464Z&step=30")'
python@saurabh:/app$ {"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"aws_rds_database_connections_average","account_id":"590183940259","container":"yet-another-cloudwatch-exporter","dimension_DBInstanceIdentifier":"robotshopmysql","endpoint":"http","instance":"192.168.111.212:5000","job":"yace","name":"arn:aws:rds:us-west-2:590183940259:db:robotshopmysql","namespace":"monitoring","pod":"yace-8b8bd5598-shtvb","region":"us-west-2","service":"yace"},"value":[1707132134.317,"30"]}]}}
python@saurabh:/app$
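
Note that the rbash command above omits the single quotes around the URL that appear in the logged parsed_cmd, so it is not exactly the command ro ran. The sketch below rebuilds the logged argv, inner quotes included, so it can be run unchanged. ro_parsed_cmd and run_like_ro are hypothetical helpers, and we are assuming that rbash and curl are on PATH in the container.

```python
import subprocess
from typing import List


def ro_parsed_cmd(url: str) -> List[str]:
    """Rebuild the argv shown as parsed_cmd in the failure log."""
    inner = f"curl -X GET '{url}'"  # single quotes kept, as in the log
    return ["rbash", "-c", f'eval $(echo "{inner}")']


def run_like_ro(url: str) -> subprocess.CompletedProcess:
    """Run the rebuilt argv locally; needs rbash and curl on PATH."""
    return subprocess.run(ro_parsed_cmd(url), capture_output=True, text=True)


URL = (
    "http://localhost:9090/api/v1/query?query=aws_rds_database_connections_average"
    "%7Bdimension_DBInstanceIdentifier%3D%22robotshopmysql%22%7D%20%3E%201"
    "&time=2024-02-05T10%3A44%3A24.699464Z&step=30"
)
argv = ro_parsed_cmd(URL)
print(argv[2])
```

Calling run_like_ro(URL).returncode inside the container gives a like-for-like comparison. If it returns 0 there while ro still fails with 7, the difference is more likely in where or how ro executes the command than in the command itself.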

SLI K8s deployment is not able to resolve internal DNS:

We run the devtool container inside K8s so that PROMETHEUS_URL can use the internal endpoint http://prometheus-stack-kube-prom-prometheus.monitoring.svc.cluster.local:9090.

Note: change the URL in sli-deployment.yaml.

After deployment, we run kubectl exec deploy/rds-mysql-connection-count-sli -n runwhen -- ro /app/codecollection/codebundles/rds-mysql-conn-count/sli.robot, which fails with a Could not resolve host error:

Querying Prometheus Instance And Pushing Aggregated Data              | FAIL |
ValueError: Recieved return code of 6 from response ShellServiceResponse(cmd='eval $(echo "curl -X GET \'http://prometheus-stack-kube-prom-prometheus.monitoring.svc.cluster.local:9090/api/v1/query?query=aws_rds_database_connections_average%7Bdimension_DBInstanceIdentifier%3D%22robotshopmysql%22%7D%20%3E%201&time=2024-02-05T11%3A42%3A00.169554Z&step=30\'")', parsed_cmd=['rbash', '-c', 'eval $(echo "curl -X GET \'http://prometheus-stack-kube-prom-prometheus.monitoring.svc.cluster.local:9090/api/v1/query?query=aws_rds_database_connections_average%7Bdimension_DBInstanceIdentifier%3D%22robotshopmysql%22%7D%20%3E%201&time=2024-02-05T11%3A42%3A00.169554Z&step=30\'")'], stdout='', stderr='  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n\n  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\n  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (6) Could not resolve host: prometheus-stack-kube-prom-prometheus.monitoring.svc.cluster.local\n', returncode=6, status=200, body='', errors=[])
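
The two failures differ in kind: return code 7 in the docker case and return code 6 here. A small sketch for pulling the code out of the ValueError text and mapping it to its meaning follows. The meanings are paraphrased from the EXIT CODES section of the curl man page; explain_failure is a hypothetical helper.

```python
import re
from typing import Optional

# Paraphrased from the curl man page, EXIT CODES section.
CURL_EXIT_CODES = {
    6: "Couldn't resolve host",
    7: "Failed to connect to host",
}


def explain_failure(message: str) -> Optional[str]:
    """Extract the curl return code from the ValueError text and describe it."""
    match = re.search(r"return code of (\d+)", message)
    if match is None:
        return None
    code = int(match.group(1))
    return f"curl exit {code}: {CURL_EXIT_CODES.get(code, 'see the curl man page')}"


print(explain_failure("ValueError: Recieved return code of 6 from response ..."))
print(explain_failure("ValueError: Recieved return code of 7 from response ..."))
```

Code 6 means name resolution failed before any connection was attempted, while code 7 means the name resolved but the TCP connection was refused or unreachable.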

Checked with bare curl

We exec into the pod and run the same failing query again, this time directly with curl, which works:

python@rds-mysql-connection-count-sli-7bf96d97bd-kfthd:/app$ curl http://prometheus-stack-kube-prom-prometheus.monitoring.svc.cluster.local:9090/api/v1/query?query=aws_rds_database_connections_average%7Bdimension_DBInstanceIdentifier%3D%22robotshopmysql%22%7D%20%3E%201&time=2024-02-05T11%3A42%3A00.169554Z&step=30
[1] 30
[2] 31
python@rds-mysql-connection-count-sli-7bf96d97bd-kfthd:/app$ {"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"aws_rds_database_connections_average","account_id":"590183940259","container":"yet-another-cloudwatch-exporter","dimension_DBInstanceIdentifier":"robotshopmysql","endpoint":"http","instance":"192.168.111.212:5000","job":"yace","name":"arn:aws:rds:us-west-2:590183940259:db:robotshopmysql","namespace":"monitoring","pod":"yace-8b8bd5598-shtvb","region":"us-west-2","service":"yace"},"value":[1707133955.551,"30"]}]}}
[1]-  Done                    curl http://prometheus-stack-kube-prom-prometheus.monitoring.svc.cluster.local:9090/api/v1/query?query=aws_rds_database_connections_average%7Bdimension_DBInstanceIdentifier%3D%22robotshopmysql%22%7D%20%3E%201
[2]+  Done                    time=2024-02-05T11%3A42%3A00.169554Z

Checked with rbash eval

python@rds-mysql-connection-count-sli-7bf96d97bd-kfthd:/app$ rbash -c 'eval $(echo "curl -X GET http://prometheus-stack-kube-prom-prometheus.monitoring.svc.cluster.local:9090/api/v1/query?query=aws_rds_database_connections_average%7Bdimension_DBInstanceIdentifier%3D%22robotshopmysql%22%7D%20%3E%201&time=2024-02-05T11%3A42%3A00.169554Z&step=30")'
python@rds-mysql-connection-count-sli-7bf96d97bd-kfthd:/app$ {"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"aws_rds_database_connections_average","account_id":"590183940259","container":"yet-another-cloudwatch-exporter","dimension_DBInstanceIdentifier":"robotshopmysql","endpoint":"http","instance":"192.168.111.212:5000","job":"yace","name":"arn:aws:rds:us-west-2:590183940259:db:robotshopmysql","namespace":"monitoring","pod":"yace-8b8bd5598-shtvb","region":"us-west-2","service":"yace"},"value":[1707134382.617,"30"]}]}}

Check whether the port is open on prometheus-stack-kube-prom-prometheus.monitoring.svc.cluster.local from inside the pod using netcat:

python@rds-mysql-connection-count-sli-7bf96d97bd-kfthd:/app$ nc -v prometheus-stack-kube-prom-prometheus.monitoring.svc.cluster.local 9090
prometheus-stack-kube-prom-prometheus.monitoring.svc.cluster.local [10.100.30.208] 9090 (?) open

HTTP/1.1 400 Bad Request
Content-Type: text/plain; charset=utf-8
Connection: close
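
The same check can be made from Python, which is useful because the codebundle itself runs under Python. This is a minimal sketch using only the standard library; resolve and port_open are hypothetical helper names.

```python
import socket
from typing import List


def resolve(host: str, port: int) -> List[str]:
    """Return the distinct IP addresses the system resolver gives for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers resolution errors, refusals, and timeouts
        return False
```

Inside the pod, resolve("prometheus-stack-kube-prom-prometheus.monitoring.svc.cluster.local", 9090) should return the address netcat printed, and port_open with the same arguments should return True.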

Check /etc/resolv.conf:

python@rds-mysql-connection-count-sli-7bf96d97bd-kfthd:/app$ cat /etc/resolv.conf
search runwhen.svc.cluster.local svc.cluster.local cluster.local us-west-2.compute.internal
nameserver 10.100.0.10
options ndots:5
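
With ndots:5, the Prometheus service name (4 dots) falls below the threshold, so the resolver tries each search domain before the name as written. The sketch below lists the lookup order, assuming glibc-style behaviour as described in resolv.conf(5); other resolvers (for example musl) differ in details, and candidate_names is a hypothetical helper.

```python
from typing import List


def candidate_names(name: str, search: List[str], ndots: int) -> List[str]:
    """Order in which a glibc-style stub resolver would try names."""
    if name.endswith("."):
        return [name]  # already fully qualified, search list is skipped
    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        return [absolute] + expanded  # enough dots: try the name as written first
    return expanded + [absolute]  # too few dots: search domains first


HOST = "prometheus-stack-kube-prom-prometheus.monitoring.svc.cluster.local"
SEARCH = [
    "runwhen.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "us-west-2.compute.internal",
]

for candidate in candidate_names(HOST, SEARCH, ndots=5):
    print(candidate)
```

The name as written is tried last, after four search-domain lookups that are expected to fail. This is normal inside a pod and matches the successful curl and nc results above, so the resolv.conf contents alone do not explain the Could not resolve host error from ro.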