litmuschaos / litmus

Litmus helps SREs and developers practice chaos engineering in a cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q
https://litmuschaos.io
Apache License 2.0

Mongo DB Driver issue SIGSEGV #4474

Closed sebay closed 9 months ago

sebay commented 9 months ago

What happened: Error with Mongo driver: [signal SIGSEGV: segmentation violation code=0x1 addr=0xd8 pc=0xaea122]

>kubectl logs litmusportal-server-7655f5dbc4-9dg79
{"file":"/gql-server/server.go:43","func":"main.init.0","level":"info","msg":"go version: go1.20.14","time":"2024-03-01T07:33:50Z"}
{"file":"/gql-server/server.go:44","func":"main.init.0","level":"info","msg":"go os/arch: linux/amd64","time":"2024-03-01T07:33:50Z"}
{"file":"/gql-server/pkg/database/mongodb/init.go:109","func":"github.com/litmuschaos/litmus/chaoscenter/graphql/server/pkg/database/mongodb.MongoConnection","level":"info","msg":"connected to mongo","time":"2024-03-01T07:33:50Z"}
{"error":"(NamespaceExists) Collection litmus.chaosInfrastructures already exists.","file":"/gql-server/pkg/database/mongodb/init.go:130","func":"github.com/litmuschaos/litmus/chaoscenter/graphql/server/pkg/database/mongodb.(*MongoClient).initAllCollection","level":"error","msg":"failed to create chaosInfrastructures collection","time":"2024-03-01T07:33:50Z"}
{"error":"(NamespaceExists) Collection litmus.chaosExperiments already exists.","file":"/gql-server/pkg/database/mongodb/init.go:154","func":"github.com/litmuschaos/litmus/chaoscenter/graphql/server/pkg/database/mongodb.(*MongoClient).initAllCollection","level":"error","msg":"failed to create chaosExperiments collection","time":"2024-03-01T07:33:50Z"}
{"error":"(NamespaceExists) Collection litmus.chaosExperimentRuns already exists.","file":"/gql-server/pkg/database/mongodb/init.go:178","func":"github.com/litmuschaos/litmus/chaoscenter/graphql/server/pkg/database/mongodb.(*MongoClient).initAllCollection","level":"error","msg":"failed to create chaosExperimentRuns collection","time":"2024-03-01T07:33:50Z"}
{"error":"(NamespaceExists) Collection litmus.chaosHubs already exists.","file":"/gql-server/pkg/database/mongodb/init.go:196","func":"github.com/litmuschaos/litmus/chaoscenter/graphql/server/pkg/database/mongodb.(*MongoClient).initAllCollection","level":"error","msg":"failed to create chaosHubs collection","time":"2024-03-01T07:33:50Z"}
{"error":"(NamespaceExists) Collection litmus.chaosProbes already exists.","file":"/gql-server/pkg/database/mongodb/init.go:275","func":"github.com/litmuschaos/litmus/chaoscenter/graphql/server/pkg/database/mongodb.(*MongoClient).initAllCollection","level":"error","msg":"failed to create chaosProbes collection","time":"2024-03-01T07:33:50Z"}
{"file":"/gql-server/server.go:143","func":"main.main","level":"info","msg":"chaos manager running at http://localhost:8080","time":"2024-03-01T07:33:50Z"}
{"file":"/gql-server/pkg/chaoshub/service.go:1021","func":"github.com/litmuschaos/litmus/chaoscenter/graphql/server/pkg/chaoshub.(*chaosHubService).SyncDefaultChaosHubs","level":"info","msg":"syncing default chaos hub directories","time":"2024-03-01T07:33:50Z"}
{"file":"/gql-server/pkg/chaoshub/ops/gitops.go:172","func":"github.com/litmuschaos/litmus/chaoscenter/graphql/server/pkg/chaoshub/ops.ChaosHubConfig.chaosChartSyncHandler","level":"info","msg":"executed isRepositoryExists()... ","repositoryExists":true,"time":"2024-03-01T07:33:50Z"}
{"file":"/gql-server/server.go:160","func":"main.startGRPCServer","level":"info","msg":"GRPC server listening on [::]:8000","time":"2024-03-01T07:33:50Z"}
{"file":"/gql-server/pkg/chaoshub/ops/gitops.go:160","func":"github.com/litmuschaos/litmus/chaoscenter/graphql/server/pkg/chaoshub/ops.GitSyncDefaultHub","level":"error","msg":"error in executing Head: reference not found","time":"2024-03-01T07:33:50Z"}
{"error":"error in executing Head: reference not found","file":"/gql-server/pkg/chaoshub/service.go:1037","func":"github.com/litmuschaos/litmus/chaoscenter/graphql/server/pkg/chaoshub.(*chaosHubService).SyncDefaultChaosHubs","hubName":"Litmus ChaosHub","level":"error","msg":"failed to sync default chaos hubs","repoBranch":"v3.3.x","repoUrl":"https://github.com/litmuschaos/chaos-charts","time":"2024-03-01T07:33:50Z"}
{"file":"/gql-server/pkg/projects/project_handler.go:60","func":"github.com/litmuschaos/litmus/chaoscenter/graphql/server/pkg/projects.ProjectEvents","level":"error","msg":"(Location40573) The $changeStream stage is only supported on replica sets","time":"2024-03-01T07:33:50Z"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0xd8 pc=0xaea122]

goroutine 99 [running]:
go.mongodb.org/mongo-driver/mongo.(*ChangeStream).next(0xc000452cb0?, {0x21ee200?, 0xc00012a000?}, 0x1?)
        /go/pkg/mod/go.mongodb.org/mongo-driver@v1.11.4/mongo/change_stream.go:603 +0x22
go.mongodb.org/mongo-driver/mongo.(*ChangeStream).Next(...)
        /go/pkg/mod/go.mongodb.org/mongo-driver@v1.11.4/mongo/change_stream.go:583
github.com/litmuschaos/litmus/chaoscenter/graphql/server/pkg/projects.ProjectEvents(0xc000426050?, 0x21ee200?, {0x21f7760, 0xc000136040})
        /gql-server/pkg/projects/project_handler.go:66 +0x258
created by main.main
        /gql-server/server.go:141 +0xafe

What you expected to happen:

How to reproduce it (as minimally and precisely as possible): Deploy Litmus 3.4.0 with MongoDB from either the litmuschaos/mongo:4.2.8 or the litmuschaos/mongo:6 image (the chart below is taken from 2.14).

Full setup below (Docker registry and passwords redacted):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: litmus-server-account
---
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: litmus-portal-admin-secret
data:
  litmus_admin_password: xx
  DB_PASSWORD: xx
  JWT_SECRET: xx
  DB_USER: xx
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: litmus-portal-admin-config
data:
  DB_SERVER: "mongodb://mongo-service:27017"
  VERSION: "3.4.0"
  SKIP_SSL_VERIFY: "false"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: litmusportal-frontend-nginx-configuration
  namespace: application
data:
  nginx.conf: |
    pid /tmp/nginx.pid;

    events {
      worker_connections  1024;
    }

    http {
        map $http_upgrade $connection_upgrade {
            default upgrade;
            '' close;
        }

        client_body_temp_path /tmp/client_temp;
        proxy_temp_path       /tmp/proxy_temp_path;
        fastcgi_temp_path     /tmp/fastcgi_temp;
        uwsgi_temp_path       /tmp/uwsgi_temp;
        scgi_temp_path        /tmp/scgi_temp;

        sendfile on;
        tcp_nopush on;
        tcp_nodelay on;
        keepalive_timeout 65;
        types_hash_max_size 2048;
        server_tokens off;

        include /etc/nginx/mime.types;

        gzip on;
        gzip_disable "msie6";

        access_log /var/log/nginx/access.log;
        error_log /var/log/nginx/error.log;

        server {
            listen 8185 default_server;
            root   /opt/chaos;

            location /health {
              return 200;
            }

            location / {
                proxy_http_version 1.1;
                add_header Cache-Control "no-cache";
                try_files $uri /index.html;
                autoindex on;
            }

            # redirect server error pages to the static page /50x.html
            #
            error_page   500 502 503 504  /50x.html;
            location = /50x.html {
                root   /usr/share/nginx/html;
            }

            location /auth/ {
                proxy_http_version 1.1;
                proxy_set_header   Host                 $host;
                proxy_set_header   X-Real-IP            $remote_addr;
                proxy_set_header   X-Forwarded-For      $proxy_add_x_forwarded_for;
                proxy_set_header   X-Forwarded-Proto    $scheme;
                proxy_pass "http://litmusportal-auth-server-service:9003/";
            }

            location /api/ {
                proxy_http_version 1.1;
                proxy_set_header   Host                 $host;
                proxy_set_header   X-Real-IP            $remote_addr;
                proxy_set_header   X-Forwarded-For      $proxy_add_x_forwarded_for;
                proxy_set_header   X-Forwarded-Proto    $scheme;
                proxy_pass "http://litmusportal-server-service:9002/";
            }

            location /ws/ {
                proxy_http_version 1.1;
                proxy_set_header   Upgrade              $http_upgrade;
                proxy_set_header   Connection           $connection_upgrade;
                proxy_set_header   Host                 $host;
                proxy_set_header   X-Real-IP            $remote_addr;
                proxy_set_header   X-Forwarded-For      $proxy_add_x_forwarded_for;
                proxy_set_header   X-Forwarded-Proto    $scheme;
                proxy_pass "http://litmusportal-server-service:9002/";
            }
        }
    }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: litmusportal-frontend
  labels:
    component: litmusportal-frontend
spec:
  replicas: 1
  selector:
    matchLabels:
      component: litmusportal-frontend
  template:
    metadata:
      labels:
        component: litmusportal-frontend
    spec:
      automountServiceAccountToken: false
      containers:
        - name: litmusportal-frontend
          image: registryhere/litmuschaos/litmusportal-frontend:3.4.0
          # securityContext:
          #   runAsUser: 2000
          #   allowPrivilegeEscalation: false
          #   runAsNonRoot: true
          imagePullPolicy: Always
          ports:
            - containerPort: 8185
          resources:
            requests:
              memory: "250Mi"
              cpu: "125m"
              ephemeral-storage: "500Mi"
            limits:
              memory: "512Mi"
              cpu: "550m"
              ephemeral-storage: "1Gi"
          volumeMounts:
            - name: nginx-config
              mountPath: /etc/nginx/nginx.conf
              subPath: nginx.conf
      volumes:
        - name: nginx-config
          configMap:
            name: litmusportal-frontend-nginx-configuration
---
apiVersion: v1
kind: Service
metadata:
  name: litmusportal-frontend-service
  namespace: application
  annotations:
    external-dns.alpha.kubernetes.io/hostname: myfqdn.net
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  ports:
    - name: http
      port: 8080
      targetPort: 8185
  selector:
    component: litmusportal-frontend
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: litmusportal-server
  labels:
    component: litmusportal-server
spec:
  replicas: 1
  selector:
    matchLabels:
      component: litmusportal-server
  template:
    metadata:
      labels:
        component: litmusportal-server
    spec:
      volumes:
        - name: gitops-storage
          emptyDir: {}
        - name: hub-storage
          emptyDir: {}
      containers:
        - name: graphql-server
          image: registryhere/litmuschaos/litmusportal-server:3.4.0
          volumeMounts:
            - mountPath: /tmp/
              name: gitops-storage
            - mountPath: /tmp/version
              name: hub-storage
          securityContext:
            runAsUser: 2000
            allowPrivilegeEscalation: false
            runAsNonRoot: true
            readOnlyRootFilesystem: true
          envFrom:
            - configMapRef:
                name: litmus-portal-admin-config
            - secretRef:
                name: litmus-portal-admin-secret
          env:
            - name: SELF_AGENT
              value: "true"
            # if self-signed certificate are used pass the k8s tls secret name created in portal ns, to allow agents to use tls for communication
            - name: TLS_SECRET_NAME
              value: ""
            - name: AGENT_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: LITMUS_PORTAL_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: INFRA_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            # if self-signed certificate are used pass the base64 tls certificate, to allow agents to use tls for communication
            - name: TLS_CERT_B64
              value: ""
            - name: CHAOS_CENTER_SCOPE
              value: "cluster"
            - name: INFRA_DEPLOYMENTS
              value: '["app=chaos-exporter", "name=chaos-operator", "app=workflow-controller", "app=event-tracker"]'
            - name: SERVER_SERVICE_NAME
              value: "litmusportal-server-service"
            - name: CHAOS_CENTER_UI_ENDPOINT
              value: "http://litmusportal-frontend-service:8080"
            - name: SUBSCRIBER_IMAGE
              value: "registryhere/litmuschaos/litmusportal-subscriber:3.4.0"
            - name: EVENT_TRACKER_IMAGE
              value: "registryhere/litmuschaos/litmusportal-event-tracker:3.4.0"
            - name: ARGO_WORKFLOW_CONTROLLER_IMAGE
              value: "registryhere/litmuschaos/workflow-controller:v3.3.1"
            - name: ARGO_WORKFLOW_EXECUTOR_IMAGE
              value: "registryhere/litmuschaos/argoexec:v3.3.1"
            - name: LITMUS_CHAOS_OPERATOR_IMAGE
              value: "registryhere/litmuschaos/chaos-operator:3.4.0"
            - name: LITMUS_CHAOS_RUNNER_IMAGE
              value: "registryhere/litmuschaos/chaos-runner:3.4.0"
            - name: LITMUS_CHAOS_EXPORTER_IMAGE
              value: "registryhere/litmuschaos/chaos-exporter:3.4.0"
            - name: CONTAINER_RUNTIME_EXECUTOR
              value: "k8sapi"
            - name: DEFAULT_HUB_BRANCH_NAME
              value: "v3.3.x"
            - name: LITMUS_AUTH_GRPC_ENDPOINT
              value: "litmusportal-auth-server-service"
            - name: LITMUS_AUTH_GRPC_PORT
              value: ":3030"
            - name: WORKFLOW_HELPER_IMAGE_VERSION
              value: "3.3.0"
            - name: REMOTE_HUB_MAX_SIZE
              value: "5000000"
            - name: INGRESS
              value: "false"
            - name: INGRESS_NAME
              value: "litmus-ingress"
            - name: INFRA_COMPATIBLE_VERSIONS
              value: '["3.4.0"]'
          ports:
            - containerPort: 8080
            - containerPort: 8000
          imagePullPolicy: Always
          resources:
            requests:
              memory: "250Mi"
              cpu: "225m"
              ephemeral-storage: "500Mi"
            limits:
              memory: "712Mi"
              cpu: "550m"
              ephemeral-storage: "1Gi"
      serviceAccountName: litmus-server-account
---
apiVersion: v1
kind: Service
metadata:
  name: litmusportal-server-service
spec:
  type: NodePort
  ports:
    - name: graphql-server
      port: 9002
      targetPort: 8080
    - name: graphql-rpc-server
      port: 8000
      targetPort: 8000
  selector:
    component: litmusportal-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: litmusportal-auth-server
  namespace: application
  labels:
    component: litmusportal-auth-server
spec:
  replicas: 1
  selector:
    matchLabels:
      component: litmusportal-auth-server
  template:
    metadata:
      labels:
        component: litmusportal-auth-server
    spec:
      automountServiceAccountToken: false
      initContainers:
        - name: wait-for-mongodb
          image: registryhere/litmuschaos/curl:2.14.0
          command: ["/bin/sh", "-c"]
          args:
            [
              "while [[ $(curl -sw '%{http_code}' http://mongo-service:27017 -o /dev/null) -ne 200 ]]; do sleep 5; echo 'Waiting for the MongoDB to be ready...'; done; echo 'Connection with MongoDB established'",
            ]
          resources:
            requests:
              memory: "150Mi"
              cpu: "25m"
              ephemeral-storage: "500Mi"
            limits:
              memory: "225Mi"
              cpu: "250m"
              ephemeral-storage: "1Gi"
      containers:
        - name: auth-server
          image: registryhere/litmuschaos/litmusportal-auth-server:3.4.0
          securityContext:
            runAsUser: 2000
            allowPrivilegeEscalation: false
            runAsNonRoot: true
            readOnlyRootFilesystem: true
          envFrom:
            - configMapRef:
                name: litmus-portal-admin-config
            - secretRef:
                name: litmus-portal-admin-secret
          env:
            - name: STRICT_PASSWORD_POLICY
              value: "false"
            - name: ADMIN_USERNAME
              value: "admin"
            #- name: ADMIN_PASSWORD
            #  value: "litmus"
            - name: ADMIN_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: litmus-portal-admin-secret
                  key: litmus_admin_password
            - name: LITMUS_GQL_GRPC_ENDPOINT
              value: "litmusportal-server-service"
            - name: LITMUS_GQL_GRPC_PORT
              value: ":8000"
          ports:
            - containerPort: 3000
            - containerPort: 3030
          imagePullPolicy: Always
          resources:
            requests:
              memory: "250Mi"
              cpu: "225m"
              ephemeral-storage: "500Mi"
            limits:
              memory: "712Mi"
              cpu: "550m"
              ephemeral-storage: "1Gi"
---
apiVersion: v1
kind: Service
metadata:
  name: litmusportal-auth-server-service
spec:
  type: NodePort
  ports:
    - name: auth-server
      port: 9003
      targetPort: 3000
    - name: auth-rpc-server
      port: 3030
      targetPort: 3030
  selector:
    component: litmusportal-auth-server
---
###
# Source: litmus-chaos/templates/litmus-2.14.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
  namespace: application
  labels:
    app: mongo
spec:
  selector:
    matchLabels:
      component: database
  serviceName: mongo-headless-service
  replicas: 1
  template:
    metadata:
      labels:
        component: database
    spec:
      automountServiceAccountToken: false
      containers:
        - name: mongo
          image: registryhere/litmuschaos/mongo:6
          securityContext:
            #            runAsUser: 2000
            allowPrivilegeEscalation: false
          #            runAsNonRoot: true
          args: ["--ipv6"]
          ports:
            - containerPort: 27017
          imagePullPolicy: Always
          volumeMounts:
            - name: mongo-persistent-storage
              mountPath: /data/db
          resources:
            requests:
              memory: "550Mi"
              cpu: "225m"
              ephemeral-storage: "1Gi"
            limits:
              memory: "1Gi"
              cpu: "750m"
              ephemeral-storage: "3Gi"
          env:
            - name: MONGO_INITDB_ROOT_USERNAME
              valueFrom:
                secretKeyRef:
                  name: litmus-portal-admin-secret
                  key: DB_USER
            - name: MONGO_INITDB_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: litmus-portal-admin-secret
                  key: DB_PASSWORD
  volumeClaimTemplates:
    - metadata:
        name: mongo-persistent-storage
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
        storageClassName: azure-disk-cmk
---
# Source: litmus-chaos/templates/litmus-2.14.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: mongo
  name: mongo-service
  namespace: application
spec:
  ports:
    - port: 27017
      targetPort: 27017
  selector:
    component: database
---
# Source: litmus-chaos/templates/litmus-2.14.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: mongo
  name: mongo-headless-service
  namespace: application
spec:
  clusterIP: None
  ports:
    - port: 27017
      targetPort: 27017
  selector:
    component: database

Anything else we need to know?:

SarthakJain26 commented 9 months ago

@sebay this happens because, as of Litmus 3.0.0, we depend on MongoDB replica sets. You will have to convert your standalone MongoDB instance to a replica set. cc: @Saranya-jena

sebay commented 9 months ago

@SarthakJain26 noted, I have installed with Bitnami now. That makes the install more complicated than it used to be, though.

sebay commented 9 months ago

Use MongoDB with a replica set.

dariolstella commented 7 months ago

Following up on this error:

{"error":"error in executing Head: reference not found","file":"/gql-server/pkg/chaoshub/service.go:1037","func":"github.com/litmuschaos/litmus/chaoscenter/graphql/server/pkg/chaoshub.(*chaosHubService).SyncDefaultChaosHubs","hubName":"Litmus ChaosHub","level":"error","msg":"failed to sync default chaos hubs","repoBranch":"v3.3.x","repoUrl":"https://github.com/litmuschaos/chaos-charts","time":"2024-03-01T07:33:50Z"}

I had a similar issue with Litmus 3.5.0. The problem was that the graphql container was unable to connect to the ChaosHub. I configured https_proxy as an environment variable and litmus-server started up without issues.