litmuschaos / litmus

Litmus helps SREs and developers practice chaos engineering in a cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q
https://litmuschaos.io
Apache License 2.0

remote host agent deploy #2860

Open ntlzthm8 opened 3 years ago

ntlzthm8 commented 3 years ago

Hi! 1) I installed Litmus and the portal on minikube on one host, with IP x.x.x.x

2) I deployed sock-shop on minikube on another host, with IP y.y.y.y

📶 Please enter LitmusChaos details --
👉 Host URL where litmus is installed: http://x.x.x.x:8080/
🤔 Username [admin]: admin
🙈 Password:
✅ Login Successful!

✨ Projects List:

  1. admin's project

🔎 Select Project: 1

🔌 Installation Modes:

  1. Cluster
  2. Namespace

👉 Select Mode [cluster]: 1

🏃 Running prerequisites check....
🔑 clusterrole - ✅
🔑 clusterrolebinding - ✅

🌟 Sufficient permissions. Connecting Agent

🔗 Enter the details of the agent ----
🤷 Agent Name: sockshop
📘 Agent Description: sockshop tests
📦 Platform List

  1. AWS
  2. GKE
  3. Openshift
  4. Rancher
  5. Others

🔎 Select Platform [Others]: 5
📁 Enter the namespace (new or existing) [litmus]: litmus
🔑 Enter service account [litmus]: litmus

📌 Summary --------------------------

Agent Name: sockshop
Agent Description: sockshop tests
Platform Name: Others
Namespace: litmus (new)
Service Account: litmus (new)
Installation Mode: cluster


🤷 Do you want to continue with the above details? [Y/N]: Y
👍 Continuing agent connection!!
Applying YAML:
http://x.x.x.x:8080/api/file/eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJjbHVzdGVyX2lkIjoiOGIxMzYwN2UtMGE2NC00MzY0LWI2YjQtZjY5OTZkMzNjZTIzIn0.9wsFC-kbQEgEa0t40z3tZdUagwR3R5496_xXY0kHQwo.yaml

namespace/litmus created
serviceaccount/litmus created
Warning: apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
customresourcedefinition.apiextensions.k8s.io/clusterworkflowtemplates.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/cronworkflows.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/workflows.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/workflowtemplates.argoproj.io created
serviceaccount/argo created
serviceaccount/argo-server created
clusterrole.rbac.authorization.k8s.io/argo-aggregate-to-admin created
clusterrole.rbac.authorization.k8s.io/argo-aggregate-to-edit created
clusterrole.rbac.authorization.k8s.io/argo-aggregate-to-view created
clusterrole.rbac.authorization.k8s.io/argo-cluster-role created
clusterrole.rbac.authorization.k8s.io/argo-server-cluster-role created
clusterrolebinding.rbac.authorization.k8s.io/argo-binding created
clusterrolebinding.rbac.authorization.k8s.io/argo-server-binding created
configmap/workflow-controller-configmap created
service/argo-server created
service/workflow-controller-metrics created
deployment.apps/argo-server created
deployment.apps/workflow-controller created
customresourcedefinition.apiextensions.k8s.io/chaosengines.litmuschaos.io created
customresourcedefinition.apiextensions.k8s.io/chaosexperiments.litmuschaos.io created
customresourcedefinition.apiextensions.k8s.io/chaosresults.litmuschaos.io created
customresourcedefinition.apiextensions.k8s.io/eventtrackerpolicies.eventtracker.litmuschaos.io created
serviceaccount/litmus-cluster-scope created
clusterrole.rbac.authorization.k8s.io/litmus-cluster-scope created
clusterrolebinding.rbac.authorization.k8s.io/litmus-cluster-scope created
deployment.apps/chaos-operator-ce created
deployment.apps/chaos-exporter created
service/chaos-exporter created
serviceaccount/litmus-admin created
clusterrole.rbac.authorization.k8s.io/litmus-admin created
clusterrolebinding.rbac.authorization.k8s.io/litmus-admin created
serviceaccount/argo-chaos created
clusterrole.rbac.authorization.k8s.io/chaos-cluster-role created
clusterrolebinding.rbac.authorization.k8s.io/chaos-cluster-role-binding created
Warning: rbac.authorization.k8s.io/v1beta1 ClusterRole is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRole
clusterrole.rbac.authorization.k8s.io/subscriber-cluster-role created
Warning: rbac.authorization.k8s.io/v1beta1 ClusterRoleBinding is deprecated in v1.17+, unavailable in v1.22+; use rbac.authorization.k8s.io/v1 ClusterRoleBinding
clusterrolebinding.rbac.authorization.k8s.io/subscriber-cluster-role-binding created
serviceaccount/event-tracker-sa created
clusterrole.rbac.authorization.k8s.io/event-tracker-cluster-role created
clusterrolebinding.rbac.authorization.k8s.io/event-tracker-clusterole-binding created
configmap/agent-config created
deployment.apps/subscriber created
deployment.apps/event-tracker created

💡 Connecting agent to Litmus Portal.
💡 Connecting agent to Litmus Portal.
💡 Connecting agent to Litmus Portal.
🏃 Agents running!!

🚀 Agent Connection Successful!! 🎉
👉 Litmus agents can be accessed here: http://x.x.x.x:8080/targets


But there is an error. The kubectl pod list shows:
`litmus                 subscriber-958948965-z8qcm                   0/1     CrashLoopBackOff   5          4m54s`

Events:
Type     Reason     Age                  From               Message
Normal   Scheduled  5m28s                default-scheduler  Successfully assigned litmus/subscriber-958948965-z8qcm to minikube
Normal   Pulled     5m5s                 kubelet            Successfully pulled image "litmuschaos/litmusportal-subscriber:2.0.0-Beta7" in 22.095972847s
Normal   Pulled     4m32s                kubelet            Successfully pulled image "litmuschaos/litmusportal-subscriber:2.0.0-Beta7" in 1.576215013s
Normal   Pulled     4m17s                kubelet            Successfully pulled image "litmuschaos/litmusportal-subscriber:2.0.0-Beta7" in 1.562525358s
Normal   Pulled     3m51s                kubelet            Successfully pulled image "litmuschaos/litmusportal-subscriber:2.0.0-Beta7" in 1.66723591s
Normal   Created    3m50s (x4 over 5m4s) kubelet            Created container subscriber
Normal   Started    3m50s (x4 over 5m4s) kubelet            Started container subscriber
Normal   Pulling    3m2s (x5 over 5m27s) kubelet            Pulling image "litmuschaos/litmusportal-subscriber:2.0.0-Beta7"
Warning  BackOff    17s (x21 over 4m31s) kubelet            Back-off restarting failed container


Docker logs:

{"log":"2021/06/02 14:09:33 Go Version: go1.14.15\n","stream":"stderr","time":"2021-06-02T14:09:33.100273004Z"}
{"log":"2021/06/02 14:09:33 Go OS/Arch: linux/amd64\n","stream":"stderr","time":"2021-06-02T14:09:33.100311248Z"}
{"log":"time=\"2021-06-02T14:09:33Z\" level=info msg=\"all deployments up\"\n","stream":"stderr","time":"2021-06-02T14:09:33.125661198Z"}
{"log":"time=\"2021-06-02T14:09:33Z\" level=info msg=\"all components live...starting up subscriber\"\n","stream":"stderr","time":"2021-06-02T14:09:33.125686362Z"}
{"log":"2021/06/02 14:09:33 Post \"http://192.168.49.2:31769/query\": dial tcp 192.168.49.2:31769: connect: connection refused\n","stream":"stderr","time":"2021-06-02T14:09:33.130219724Z"}


The agent state in the portal is Pending.
Why does the agent try to connect to the local minikube IP `192.168.49.2:31769`?

imrajdas commented 3 years ago

Hi @TidalPoo, the subscriber pod always connects to the litmusportal-server pod through a service or ingress. It looks like 192.168.49.2 is the minikube IP and 31769 is the port of litmusportal-server.

As per the logs

{"log":"2021/06/02 14:09:33 Post \"http://192.168.49.2:31769/query\": dial tcp 192.168.49.2:31769: connect: connection refused\n","stream":"stderr","time":"2021-06-02T14:09:33.130219724Z"}

The subscriber is not able to connect to the server pod. Can you check the pod status and also the status code returned by this endpoint: http://192.168.49.2:31769/query?
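
Since the log reports `connection refused`, it can help to first check whether anything is listening on that address at all before looking at HTTP status codes. A minimal Python sketch, assuming it is run from the agent host; `can_connect` is an illustrative helper name and not part of Litmus:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, and timeouts.
        return False
```

For example, `can_connect("192.168.49.2", 31769)` uses the address from the subscriber log. If it returns False, the problem is at the network level (nothing reachable on that IP and port from this host), not in the portal itself.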

ntlzthm8 commented 3 years ago

I have no service on port 31769 on the agent host (it's a remote host).

litmusportal-server runs on one physical host with external IP x.x.x.x and minikube internal IP 192.168.49.2:

[centos@vm-1 ~]$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                     READY   STATUS    RESTARTS   AGE
kube-system   coredns-74ff55c5b-rnrkl                  1/1     Running   1          2d
kube-system   etcd-minikube                            1/1     Running   1          2d
kube-system   kube-apiserver-minikube                  1/1     Running   1          2d
kube-system   kube-controller-manager-minikube         1/1     Running   1          2d
kube-system   kube-proxy-z4hqz                         1/1     Running   1          2d
kube-system   kube-scheduler-minikube                  1/1     Running   1          2d
kube-system   storage-provisioner                      1/1     Running   3          2d
litmus        argo-server-58cb64db7f-wfn84             1/1     Running   2          2d
litmus        chaos-exporter-547b59d887-twqjz          1/1     Running   1          2d
litmus        chaos-operator-ce-5ffd8d8c8b-sfhl9       1/1     Running   1          2d
litmus        event-tracker-5bc478cbd7-p2zg7           1/1     Running   1          2d
litmus        litmusportal-frontend-698bcb686f-7djzn   1/1     Running   2          2d
litmus        litmusportal-server-5bb94f65d7-bp2xk     2/2     Running   4          2d
litmus        mongo-0                                  1/1     Running   1          2d
litmus        subscriber-958948965-8g252               1/1     Running   1          2d
litmus        workflow-controller-78fc7b6c6-f9s5z      1/1     Running   2          2d
[centos@vm-1 ~]$ kubectl get service --all-namespaces
NAMESPACE     NAME                            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                         AGE
default       kubernetes                      ClusterIP   10.96.0.1        <none>        443/TCP                         2d
kube-system   kube-dns                        ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP          2d
litmus        argo-server                     ClusterIP   10.98.83.15      <none>        2746/TCP                        2d
litmus        chaos-exporter                  ClusterIP   10.107.188.123   <none>        8080/TCP                        2d
litmus        chaos-operator-metrics          ClusterIP   10.111.204.160   <none>        8383/TCP                        2d
litmus        litmusportal-frontend-service   NodePort    10.103.134.143   <none>        9091:31391/TCP                  2d
litmus        litmusportal-server-service     NodePort    10.108.31.80     <none>        9002:31769/TCP,9003:31119/TCP   2d
litmus        mongo-service                   ClusterIP   10.96.152.225    <none>        27017/TCP                       2d
litmus        workflow-controller-metrics     ClusterIP   10.108.242.255   <none>        9090/TCP                        2d
[centos@vm-1 ~]$ minikube ip
192.168.49.2

The Litmus agent (and subscriber) runs on another physical host with IP y.y.y.y and minikube internal IP 192.168.49.2:

[centos@vm-2 ~]$ kubectl get pods --all-namespaces
NAMESPACE              NAME                                         READY   STATUS             RESTARTS   AGE
kube-system            coredns-74ff55c5b-ssgnh                      1/1     Running            2          47h
kube-system            elasticsearch-69d8bb74cf-pf8jn               1/1     Running            9          44h
kube-system            etcd-minikube                                1/1     Running            3          47h
kube-system            fluentd-5sflx                                1/1     Running            2          44h
kube-system            kibana-5cffbc9dd8-6mmmw                      1/1     Running            2          44h
kube-system            kube-apiserver-minikube                      1/1     Running            3          47h
kube-system            kube-controller-manager-minikube             1/1     Running            2          47h
kube-system            kube-proxy-5twfg                             1/1     Running            2          47h
kube-system            kube-scheduler-minikube                      1/1     Running            2          47h
kube-system            storage-provisioner                          1/1     Running            7          47h
kubernetes-dashboard   dashboard-metrics-scraper-79c5968bdc-w645v   1/1     Running            2          46h
kubernetes-dashboard   kubernetes-dashboard-9f9799597-q4p4n         1/1     Running            2          46h
litmus                 argo-server-58cb64db7f-7glt9                 1/1     Running            1          41h
litmus                 chaos-exporter-547b59d887-dwx4j              1/1     Running            1          41h
litmus                 chaos-operator-ce-84ddc8f5d7-wc78f           1/1     Running            1          41h
litmus                 event-tracker-5bc478cbd7-hwfjj               1/1     Running            1          41h
litmus                 subscriber-958948965-z8qcm                   0/1     CrashLoopBackOff   19         41h
litmus                 workflow-controller-78fc7b6c6-s6vjb          1/1     Running            1          41h
sock-shop              carts-6964f6766-8ct5r                        0/1     Running            1          43h
sock-shop              carts-db-6c6c68b747-h5nhr                    1/1     Running            1          43h
sock-shop              catalogue-86f6f4974d-6qgkv                   1/1     Running            1          43h
sock-shop              catalogue-db-96f6f6b4c-7xclc                 1/1     Running            1          43h
sock-shop              front-end-6649c54d45-lww9j                   1/1     Running            1          43h
sock-shop              orders-8499f8685f-7kg6x                      1/1     Running            1          43h
sock-shop              orders-db-659949975f-gctsc                   1/1     Running            1          43h
sock-shop              payment-64f6994975-xgw8c                     1/1     Running            1          43h
sock-shop              queue-master-8674c86c8-pwtbc                 1/1     Running            1          43h
sock-shop              rabbitmq-5bcbb547d7-nwhs7                    2/2     Running            2          43h
sock-shop              session-db-7cf97f8d4f-wmvm2                  1/1     Running            1          43h
sock-shop              shipping-978dc7cf9-z9r5h                     0/1     Running            1          43h
sock-shop              user-7476fdf495-rqz8z                        1/1     Running            1          43h
sock-shop              user-db-6df7444fc-x575b                      1/1     Running            1          43h
[centos@vm-2 ~]$ kubectl get service --all-namespaces
NAMESPACE              NAME                          TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
default                kubernetes                    ClusterIP      10.96.0.1        <none>        443/TCP                  47h
kube-system            elasticsearch                 ClusterIP      10.96.208.3      <none>        9200/TCP                 44h
kube-system            kibana                        NodePort       10.104.128.132   <none>        5601:31601/TCP           44h
kube-system            kube-dns                      ClusterIP      10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   47h
kubernetes-dashboard   dashboard-metrics-scraper     ClusterIP      10.104.10.150    <none>        8000/TCP                 46h
kubernetes-dashboard   kubernetes-dashboard          ClusterIP      10.107.139.159   <none>        443/TCP                  46h
litmus                 argo-server                   ClusterIP      10.104.2.245     <none>        2746/TCP                 41h
litmus                 chaos-exporter                ClusterIP      10.98.78.30      <none>        8080/TCP                 41h
litmus                 chaos-operator-metrics        ClusterIP      10.96.143.156    <none>        8383/TCP                 41h
litmus                 workflow-controller-metrics   ClusterIP      10.108.226.202   <none>        9090/TCP                 41h
sock-shop              carts                         ClusterIP      10.108.192.49    <none>        80/TCP                   43h
sock-shop              carts-db                      ClusterIP      10.109.113.135   <none>        27017/TCP                43h
sock-shop              catalogue                     ClusterIP      10.111.97.250    <none>        80/TCP                   43h
sock-shop              catalogue-db                  ClusterIP      10.102.154.133   <none>        3306/TCP                 43h
sock-shop              front-end                     LoadBalancer   10.111.150.214   <pending>     80:30001/TCP             43h
sock-shop              orders                        ClusterIP      10.101.232.182   <none>        80/TCP                   43h
sock-shop              orders-db                     ClusterIP      10.97.148.15     <none>        27017/TCP                43h
sock-shop              payment                       ClusterIP      10.108.52.195    <none>        80/TCP                   43h
sock-shop              queue-master                  ClusterIP      10.111.43.208    <none>        80/TCP                   43h
sock-shop              rabbitmq                      ClusterIP      10.98.84.18      <none>        5672/TCP,9090/TCP        43h
sock-shop              session-db                    ClusterIP      10.102.247.147   <none>        6379/TCP                 43h
sock-shop              shipping                      ClusterIP      10.99.40.54      <none>        80/TCP                   43h
sock-shop              user                          ClusterIP      10.110.121.188   <none>        80/TCP                   43h
sock-shop              user-db                       ClusterIP      10.98.222.124    <none>        27017/TCP                43h
[centos@vm-2 ~]$ minikube ip
192.168.49.2

imrajdas commented 3 years ago

Hi @TidalPoo, to connect the agent host to the control plane, the IP and port of the litmusportal server must be reachable from the agent host. In your case the agent host cannot reach the control-plane host. It's not possible to connect one minikube cluster to another minikube cluster.

ntlzthm8 commented 3 years ago

Did you read what I wrote earlier? I created nginx proxy_pass rules on the Litmus server that forward the external port x.x.x.x:8080 to the internal port minikube-ip:31769, and the agent connected successfully during the initial configuration.

Log

📶 Please enter LitmusChaos details --
👉 Host URL where litmus is installed: http://x.x.x.x:8080/
🤔 Username [admin]: admin
🙈 Password:
✅ Login Successful!
........
💡 Connecting agent to Litmus Portal.
💡 Connecting agent to Litmus Portal.
💡 Connecting agent to Litmus Portal.
🏃 Agents running!!

🚀 Agent Connection Successful!! 🎉
👉 Litmus agents can be accessed here: http://x.x.x.x:8080/targets

imrajdas commented 3 years ago

Okay @TidalPoo, there is a ConfigMap called agent-config. Inside that ConfigMap, set SERVER_ADDR to http://x.x.x.x:8080/api/query and restart the subscriber pod.
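
One possible way to apply that change is with `kubectl patch` followed by `kubectl rollout restart`. The Python sketch below only builds and prints the two commands and does not run them. It assumes the `litmus` namespace and the `subscriber` deployment name seen in the install log above; `patch_commands` is a hypothetical helper name:

```python
import json
import shlex
from typing import List


def patch_commands(server_addr: str, namespace: str = "litmus") -> List[List[str]]:
    """Build kubectl commands that set SERVER_ADDR and restart the subscriber."""
    # JSON merge patch touching only the SERVER_ADDR key of the ConfigMap.
    patch = json.dumps({"data": {"SERVER_ADDR": server_addr}})
    return [
        ["kubectl", "-n", namespace, "patch", "configmap", "agent-config",
         "--type", "merge", "-p", patch],
        ["kubectl", "-n", namespace, "rollout", "restart", "deployment", "subscriber"],
    ]


# Print shell-quoted commands; x.x.x.x is the placeholder used in this thread.
for cmd in patch_commands("http://x.x.x.x:8080/api/query"):
    print(shlex.join(cmd))
```

Note that the address ends in `/api/query`, not `/query`.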

ntlzthm8 commented 3 years ago

Hi, I set SERVER_ADDR:

apiVersion: v1
data:
  ACCESS_KEY: LtSuhB5FVvJcItyMNTTA3GPmGE2ufFrJ
  AGENT_SCOPE: cluster
  CLUSTER_ID: 8b13607e-0a64-4364-b6b4-f6996d33ce23
  COMPONENTS: |
    DEPLOYMENTS: ["app=chaos-exporter", "name=chaos-operator", "app=argo-server", "app=event-tracker", "app=workflow-controller"]
  IS_CLUSTER_CONFIRMED: "false"
  SERVER_ADDR: http://x.x.x.x:8080/query
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"ACCESS_KEY":"LtSuhB5FVvJcItyMNTTA3GPmGE2ufFrJ","AGENT_SCOPE":"cluster","CLUSTER_ID":"8b13607e-0a64-4364-b6b4-f6996d33ce23","COMPONENTS":"DEPLOYMENTS: [\"app=chaos-exporter\", \"name=chaos-operator\", \"app=argo-server\", \"app=event-tracker\", \"app=workflow-controller\"]\n","IS_CLUSTER_CONFIRMED":"false","SERVER_ADDR":"http://x.x.x.x:8080/query"},"kind":"ConfigMap","metadata":{"annotations":{},"creationTimestamp":"2021-06-02T14:05:41Z","name":"agent-config","namespace":"litmus","resourceVersion":"40796","uid":"0e37ae5b-c676-4e32-b2c6-3ddf0c1abf98"}}
  creationTimestamp: "2021-06-02T14:05:41Z"
  name: agent-config
  namespace: litmus
  resourceVersion: "41160"
  uid: 0e37ae5b-c676-4e32-b2c6-3ddf0c1abf98

and got an error again:

{"log":"2021/06/07 08:29:29 Go Version: go1.14.15\n","stream":"stderr","time":"2021-06-07T08:29:29.981938751Z"}
{"log":"2021/06/07 08:29:29 Go OS/Arch: linux/amd64\n","stream":"stderr","time":"2021-06-07T08:29:29.981987165Z"}
{"log":"time=\"2021-06-07T08:29:30Z\" level=info msg=\"all deployments up\"\n","stream":"stderr","time":"2021-06-07T08:29:30.395491217Z"}
{"log":"time=\"2021-06-07T08:29:30Z\" level=info msg=\"all components live...starting up subscriber\"\n","stream":"stderr","time":"2021-06-07T08:29:30.395531044Z"}
{"log":"2021/06/07 08:29:30 invalid character '\u003c' looking for beginning of value\n","stream":"stderr","time":"2021-06-07T08:29:30.596128349Z"}

I checked the ConfigMap, and it has no invalid characters.
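
The `\u003c` in that log line is the JSON-escaped form of `<`. The message `invalid character '<' looking for beginning of value` is what Go's `encoding/json` reports when the text it is asked to decode starts with `<`. That usually suggests an HTML page came back where a JSON response was expected, not a bad character in the ConfigMap. A minimal Python sketch of both points; `looks_like_html` is a hypothetical helper and only a heuristic:

```python
import json

# One line of the container log quoted above (docker json-file format).
raw = r'''{"log":"2021/06/07 08:29:30 invalid character '\u003c' looking for beginning of value\n","stream":"stderr","time":"2021-06-07T08:29:30.596128349Z"}'''

entry = json.loads(raw)
message = entry["log"].strip()
print(message)  # the escaped \u003c is decoded to "<"


def looks_like_html(body: str) -> bool:
    """Heuristic (an assumption): a body starting with '<' is markup, not a JSON value."""
    return body.lstrip().startswith("<")


# A JSON decoder rejects such a body at its very first character.
try:
    json.loads("<html><body>Not Found</body></html>")
except json.JSONDecodeError as err:
    print("not JSON, failed at position", err.pos)
```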

imrajdas commented 3 years ago

Just checking: have you replaced http://x.x.x.x:8080/api/query with your own IP, like http://<your-ip>:8080/api/query?

For example: http://10.2.3.31:8080/api/query

ntlzthm8 commented 3 years ago

Yes)) Of course my IP is a public IP, and I can't show it here because of security rules))) I'm not such a noob as to insert the IP as x.x.x.x))))

Edit: should the URL contain /api/ or not?

The original error does not contain /api/:
{"log":"2021/06/02 14:09:33 Post \"http://192.168.49.2:31769/query\": dial tcp 192.168.49.2:31769: connect: connection refused\n","stream":"stderr","time":"2021-06-02T14:09:33.130219724Z"}

and I replaced only the IP.

imrajdas commented 3 years ago

That is a different way to connect to the control plane (through the frontend), so it should be http://minikubeip:<litmusportal-frontend-port>/api/query

ntlzthm8 commented 3 years ago

Hey) I added /api/ to the agent ConfigMap

and got another error:

{"log":"2021/06/10 08:31:44 Go Version: go1.14.15\n","stream":"stderr","time":"2021-06-10T08:31:44.602624573Z"}
{"log":"2021/06/10 08:31:44 Go OS/Arch: linux/amd64\n","stream":"stderr","time":"2021-06-10T08:31:44.602651917Z"}
{"log":"time=\"2021-06-10T08:31:44Z\" level=info msg=\"all deployments up\"\n","stream":"stderr","time":"2021-06-10T08:31:44.624561982Z"}
{"log":"time=\"2021-06-10T08:31:44Z\" level=info msg=\"all components live...starting up subscriber\"\n","stream":"stderr","time":"2021-06-10T08:31:44.624595118Z"}
{"log":"2021/06/10 08:31:44 connecting to ws://XXX.XXX.XXX.XXX:8080/api/query\n","stream":"stderr","time":"2021-06-10T08:31:44.627956938Z"}
{"log":"2021/06/10 08:31:44 dial:websocket: bad handshake\n","stream":"stderr","time":"2021-06-10T08:31:44.631369421Z"}

My nginx config:

server {
  listen 8080;

  # Default: forward to the litmusportal-frontend NodePort (31391).
  location / {
    proxy_set_header Host            $host;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_pass http://192.168.49.2:31391;
  }

  # ws: WebSocket endpoint used to retrieve logs in the portal - this works OK
  location /ws/ {
    proxy_pass http://192.168.49.2:31391/ws/;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
    proxy_set_header Host $host;
  }

  # api: same upgrade headers as /ws/, used by the subscriber
  location /api/ {
    proxy_pass http://192.168.49.2:31391/api/;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
    proxy_set_header Host $host;
  }
}

But /api/ is not working. What's wrong now?
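
`websocket: bad handshake` typically means the reply to the client's upgrade request was not a valid `101 Switching Protocols` response. One way to narrow this down is to send the opening handshake by hand and look at the status code that comes back through the proxy. A minimal Python sketch using only the standard library; `websocket_handshake_status` is a hypothetical helper and does not reproduce what the subscriber sends beyond the standard handshake headers:

```python
import base64
import os
import socket


def websocket_handshake_status(host: str, port: int, path: str,
                               timeout: float = 5.0) -> int:
    """Send a WebSocket opening handshake and return the HTTP status code."""
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Key: {key}\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        "\r\n"
    )
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request.encode("ascii"))
        data = b""
        # Read until the status line is complete or the peer closes.
        while b"\r\n" not in data:
            chunk = sock.recv(1024)
            if not chunk:
                break
            data += chunk
    fields = data.split(b"\r\n", 1)[0].decode("ascii", "replace").split()
    if len(fields) < 2:
        raise ConnectionError("no HTTP status line received")
    # Status line looks like: HTTP/1.1 101 Switching Protocols
    return int(fields[1])
```

Calling it with the proxy host, port 8080 and `/api/query` should return 101 if the upgrade passes through. Any other code (for example 200, 400 or 404) shows which kind of response the client is rejecting.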

selone commented 2 years ago

Hey, an L4 proxy will work.

Use the nginx stream module:

stream {
  # for the subscriber
  server {
    listen 8080;
    proxy_connect_timeout 10s;
    proxy_timeout 10m;
    # address of the control plane (litmusportal-frontend NodePort)
    proxy_pass 192.168.49.2:31391;
  }
}
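
To illustrate what an L4 proxy does differently from the `location` blocks earlier in the thread: it relays raw TCP bytes in both directions and never parses them as HTTP, so the WebSocket upgrade reaches the backend untouched. The sketch below is a minimal Python asyncio forwarder written only to show the idea. It is not a replacement for nginx, it omits the timeouts from the stream block, and `start_forwarder` and `pipe` are hypothetical names:

```python
import asyncio


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while True:
            data = await reader.read(65536)
            if not data:
                break
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass  # peer went away; nothing left to forward
    finally:
        writer.close()


async def start_forwarder(listen_port: int, upstream_host: str, upstream_port: int,
                          listen_host: str = "0.0.0.0") -> asyncio.AbstractServer:
    """Listen on listen_port and relay every connection to the upstream address."""

    async def on_client(client_reader, client_writer):
        try:
            up_reader, up_writer = await asyncio.open_connection(
                upstream_host, upstream_port)
        except OSError:
            client_writer.close()
            return
        # Relay both directions; the bytes are never interpreted as HTTP.
        await asyncio.gather(
            pipe(client_reader, up_writer),
            pipe(up_reader, client_writer),
        )

    return await asyncio.start_server(on_client, listen_host, listen_port)


async def main() -> None:
    # Values mirror the stream block: listen 8080, proxy_pass 192.168.49.2:31391.
    server = await start_forwarder(8080, "192.168.49.2", 31391)
    async with server:
        await server.serve_forever()
```

Running `asyncio.run(main())` on the control-plane host would start the forwarder. The trade-off is the same as with the nginx stream block: path-based routing such as `/ws/` and `/api/` is no longer possible on that port, because the proxy does not see URLs.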
}