jenkins-x-plugins / jx-build-controller

a controller which watches PipelineRuns and updates PipelineActivity resources and stores container logs in bucket storage

Logs not uploaded #55

Open chrislovecnm opened 2 years ago

chrislovecnm commented 2 years ago

Problem

I recently installed jx3 in AWS EKS using an existing cluster and the TF provider in your instructions. When a build runs, the logs are neither accessible nor stored in S3. When I go into the dashboard and attempt to access the raw logs, I get this error:

failed to read key jenkins-x/logs/REDACTED/jenkinsx-node-quickstart/PR-3/2.log in bucket s3://logs-REDACTED-20220420182851392800000001: blob (key "jenkins-x/logs/sREDACTED/jenkinsx-node-quickstart/PR-3/2.log") (code=NotFound): NoSuchKey: The specified key does not exist.
    status code: 404, request id: REDACTED, host id: REDACTED/TgW8ggfHHH0/vzJFuXwh8fgk=

The build is successful but no logs are stored.

Debugging Done

Here are some steps I have used to debug to rule out permission issues.

  1. All three buckets exist
  2. The RBAC role that builder-controller is using is pointing to the correct IAM role
  3. The service account that the builder-controller is using is bound to the correct RBAC role and permissions
  4. The AWS IAM role gives builder-controller full admin perms on all of s3

Versions

jx-requirements

apiVersion: core.jenkins-x.io/v4beta1
kind: Requirements
spec:
  autoUpdate:
    enabled: false
    schedule: ""
  cluster:
    chartRepository: http://jenkins-x-chartmuseum.jx.svc.cluster.local:8080
    clusterName: REDACTED
    devEnvApprovers:
    - todo
    environmentGitOwner: REDACTED
    gitKind: gitlab
    gitName: gl
    gitServer: https://gitlab.com
    project: "REDACTED"
    provider: eks
    region: us-west-2
    registry: REDACTED.dkr.ecr.us-east-2.amazonaws.com
  environments:
  - key: dev
    owner: REDACTED
    repository: REDACTED
  - key: staging
  - key: production
  ingress:
    domain: REDACTED.REDACTED.com
    externalDNS: true
    kind: ingress
    namespaceSubDomain: -jx.
    tls:
      email: REDACTED
      enabled: true
      production: true
  pipelineUser:
    username: REDACTED
  repository: nexus
  secretStorage: secretsManager
  storage:
  - name: logs
    url: s3://logs-REDACTED-dev-eks-20220420182851392800000001
  - name: reports
    url: s3://reports-REDACTED-dev-eks-20220420182851770800000006
  - name: repository
    url: s3://repository-REDACTED-dev-eks-20220420182851398900000002
  terraform: true
  vault: {}
  webhook: lighthouse

Questions

  1. How do I debug more?
  2. What process is actually writing the logs? Is there another pod that is missing the IAM token? I know we have moved from ClusterRoles to Roles recently

TL;DR

The s3 buckets exist, the build runs, no logs are saved, and we cannot see any logs in the dashboard.

chrislovecnm commented 2 years ago

I am getting log messages in the build pod saying that the logs should be persisted, but no logs are actually saved.

chrislovecnm commented 2 years ago

After I upgraded I am getting log messages uploaded, but they are old messages.

{"level":"info","msg":"wrote file jenkins-x/logs/REDACTED/jx3-go-poc-take-2/PR-1/2.log to bucket s3://logs-REDACTED-jx3-dev-eks-20220420182851392800000001","time":"2022-04-28T13:54:38Z"}
{"level":"info","msg":"wrote file jenkins-x/logs/REDACTED/jx3-go-poc-take-2/PR-1/2.yaml to bucket s3://logs-REDACTED-jx3-dev-eks-20220420182851392800000001","time":"2022-04-28T13:54:38Z"}
{"level":"info","msg":"wrote file jenkins-x/pipelineruns/jx/REDACTED-jx3-go-poc-take-2-pr-1-pr-8lwgp.yaml to bucket s3://logs-REDACTED-jx3-dev-eks-20220420182851392800000001","time":"2022-04-28T13:54:39Z"}
{"level":"info","msg":"updated PipelineActivity REDACTED-rnd-jx3-go-poc-take-2-pr-1-2 with new build logs URL s3://logs-REDACTED-jx3-dev-eks-20220420182851392800000001/jenkins-x/logs/REDACTED/jx3-go-poc-take-2/PR-1/2.log","time":"2022-04-28T13:54:39Z"}
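
To see at a glance which files the controller reports as written, log lines like the ones above can be filtered with a short script. This is a minimal sketch that assumes the JSON format shown here (`msg` and `time` fields, messages of the form "wrote file KEY to bucket URL"); the function name `uploaded_files` is made up for illustration and is not part of jx.

```python
import json
import re

# Matches the "wrote file <key> to bucket <s3 url>" messages seen in this thread.
# Assumption: other controller versions may word this message differently.
WROTE = re.compile(r"^wrote file (?P<key>\S+) to bucket (?P<bucket>s3://\S+)$")


def uploaded_files(log_lines):
    """Return (time, bucket, key) tuples for every 'wrote file' log entry."""
    uploads = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON log line
        if not isinstance(entry, dict):
            continue
        match = WROTE.match(str(entry.get("msg", "")))
        if match:
            uploads.append((entry.get("time"), match.group("bucket"), match.group("key")))
    return uploads


if __name__ == "__main__":
    import sys

    # Usage: pipe the controller's log output into this script.
    for when, bucket, key in uploaded_files(sys.stdin):
        print(when, bucket, key)
```

Piping the controller's log output through it gives one line per uploaded file, which makes it easier to spot that only older builds show up.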

chrislovecnm commented 2 years ago

So here is an interesting update:

I restarted the build-controller just to bounce the system, and all of the older logs were sent to s3. But for the latest PR that was built, the logs were not stored.

If I go to

https://dashboard-jx.REDACTED.com/ns-jx/REDACTED/jx3-go-poc-take-2/PR-3/3

I have logs after restarting the build controller.

But if I go to:

https://dashboard-jx.REDACTED.com/ns-jx/REDACTED/jx3-go-poc-take-2/PR-3/4

which is the current build, I do not have logs, and the logs have not been uploaded by the build-controller.
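
Judging only from the paths in this thread, a dashboard URL of the form `/ns-<namespace>/<owner>/<repo>/<branch>/<build>` seems to correspond to the bucket key `jenkins-x/logs/<owner>/<repo>/<branch>/<build>.log`. The sketch below encodes that guess so a list of builds can be checked against the keys actually present in the bucket. The mapping is an assumption inferred from the examples here, not documented behaviour, and both function names are hypothetical.

```python
from urllib.parse import urlparse


def expected_log_key(dashboard_url, prefix="jenkins-x/logs"):
    """Guess the log key for a dashboard build URL.

    Assumes the path looks like /ns-<namespace>/<owner>/<repo>/<branch>/<build>,
    as in the URLs quoted in this thread.
    """
    parts = [p for p in urlparse(dashboard_url).path.split("/") if p]
    if len(parts) != 5 or not parts[0].startswith("ns-"):
        raise ValueError(f"unexpected dashboard path: {dashboard_url!r}")
    _namespace, owner, repo, branch, build = parts
    return f"{prefix}/{owner}/{repo}/{branch}/{build}.log"


def builds_missing_logs(dashboard_urls, uploaded_keys):
    """Return the dashboard URLs whose expected log key is not in uploaded_keys."""
    uploaded = set(uploaded_keys)
    return [url for url in dashboard_urls if expected_log_key(url) not in uploaded]
```

Given the two URLs above and a listing of the logs bucket, `builds_missing_logs` would return only the builds whose log was never written.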

So there is a bug where the latest log is not uploaded for some reason. I also checked that the logs do exist for the most current build.

$ k logs REDACTED-jx3-go-poc-take-2-pr-3-pr-9wkjb-from-build-pack-v8-xq54p step-debugging-logging
Debugging logging

I created a step to just echo something to the logs, to ensure we are getting logs.

Does jx have a command line option to tail the logs?

babadofar commented 2 years ago

I can confirm this issue, using build controller version 0.3.19. The s3 buckets are created, but empty. After killing the build controller, the s3 buckets are populated with the logs.

ankitm123 commented 2 years ago

Does jx have a command line option to tail the logs?

jx pipeline logs should do the trick.

Are you both using mac locally by any chance?

chrislovecnm commented 2 years ago

No, I am on Linux, both for the admin tools and for the k8s cluster (EKS).

msvticket commented 2 years ago

Now it hangs regularly for me. The last log message before hanging is always about "created PipelineActivity". When I delete the lighthousejob referenced in that PipelineActivity, jx-build-controller continues for a while until the next problematic one is encountered.

I guess the root cause is something else, and that when catching up jx-build-controller can't handle already finished pipelineruns.