jenkinsci / datadog-plugin

A Jenkins plugin used to forward metrics, events, and service checks to an account at Datadog, automatically.
https://plugins.jenkins.io/datadog/
MIT License

Java errors after enabling log collection in Jenkins #295

Open ricardojdsilva87 opened 2 years ago

ricardojdsilva87 commented 2 years ago

Hello, we are seeing the following errors in Jenkins after enabling log collection in the Datadog plugin:

2022-07-08 16:30:58.755+0000 [id=729933]    INFO    o.d.j.p.d.c.DatadogAgentClient#reinitializeLogger: Re/Initialize Datadog-Plugin Logger: hostname = 10.241.65.43, logCollectionPort = 8126
2022-07-08 16:30:58.760+0000 [id=732176]    SEVERE    o.d.j.p.datadog.DatadogUtilities#severe: java.util.logging.ErrorManager: 2

We tried a telnet to localhost on port 8126 and it works.

To Reproduce

Steps to reproduce the behavior:

  1. Configure the plugin using configuration as code (CasC)
  2. Configure the settings in a pipeline job to enable the extra options
  3. Run the Datadog agent locally as a container on the node, with log collection enabled (see the sketch after this list)
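
For context, this is roughly how log collection is enabled on the agent side; a minimal sketch of Helm values, assuming the standard datadog.logs keys of the public Datadog Helm chart:

datadog:
  logs:
    enabled: true             # turn on the Logs Agent
    containerCollectAll: true # tail logs from every container on the node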

Expected behavior

The agent should ship the logs to the central console.

Configuration

The CasC configuration is the following:

unclassified:
  datadogGlobalConfiguration:
    ciInstanceName: "jenkins-prod"
    collectBuildLogs: true
    emitConfigChangeEvents: false
    emitSecurityEvents: true
    emitSystemEvents: true
    enableCiVisibility: true
    reportWith: "DSD"
    retryLogs: true
    targetLogCollectionPort: 8126
    targetPort: 8125
    targetTraceCollectionPort: 8126

The plugin is correctly sending job information to Datadog; we verified this on the pre-built Datadog dashboard (screenshot attached).

In the Jenkins pipeline we set the following:

pipeline {
    agent {
        kubernetes {}
    }
    options {
        // https://docs.datadoghq.com/integrations/jenkins/
        datadog(collectLogs: true, tags: ["tenant:test", "service:jenkins"])
    }
    // stages omitted from the original snippet
}

Testing the connections in the UI also appears to work correctly (screenshot attached).

We are not seeing any pipeline logs shipped to Datadog. Is there any debugging we can do? Are any of the settings incorrect?

A question about the configuration as well: if we set collectLogs: true in the pipeline, do we still need to enable log collection globally? When we disable the global option and enable it only in the pipeline, we still receive no logs, but the Jenkins errors mentioned above stop.

Environment and Versions:

Thanks for your help

jose-manuel-almaza commented 2 years ago

Hi @ricardojdsilva87. The error code we get is 2, which java.util.logging.ErrorManager defines as FLUSH_FAILURE: a failure occurred while flushing an output stream. Investigating possible causes.

jose-manuel-almaza commented 2 years ago

Hi @ricardojdsilva87. You said: "We tried a telnet to localhost on port 8126 and it works." Which IP are you connecting to via telnet? localhost is 127.0.0.1, while the log message you posted shows the plugin is configured with the IP 10.241.65.43.

ricardojdsilva87 commented 2 years ago

Hello @jose-manuel-almaza, the result is the same (screenshot attached). The test was meant to demonstrate that the Datadog agent (running in Kubernetes, deployed with a Helm chart) is listening on that port. As I understand it, the Jenkins plugin connects to the local Datadog agent so that the information (logs/metrics) can be exported? As shown above, we are already receiving information about build status. Thanks

jose-manuel-almaza commented 2 years ago

Hi @ricardojdsilva87. I see in your config that targetLogCollectionPort and targetTraceCollectionPort have the same value. Can you assign them different values (unused ports, of course) and show me the logs?

ricardojdsilva87 commented 2 years ago

Hello @jose-manuel-almaza, I tried changing the port to 8127 in the Helm chart to check whether I could use one port for traces and another for log collection, but it seems the trace and log collection ports have to be the same: after changing the trace port, the log port failed to connect, and only after changing it to 8127 as well did it start working again. If I enable the log collection option in Jenkins, I get the same errors mentioned previously.

I did try the test you suggested and assigned a port that is not configured on the Datadog side (8126 configured on Datadog, 8127 configured on Jenkins); with that I get the following errors, as expected:

2022-07-15 09:28:38.653+0000 [id=2879675]    INFO    o.d.j.p.d.c.DatadogAgentClient#reinitializeLogger: Re/Initialize Datadog-Plugin Logger: hostname = 10.241.65.43, logCollectionPort = 8127
2022-07-15 09:28:38.654+0000 [id=2879675]    SEVERE    o.d.j.p.datadog.DatadogUtilities#severe: Failed to reinitialize Datadog-Plugin Logger, Connection exception. This may be because your port is incorrect 8127
2022-07-15 09:28:38.654+0000 [id=2879675]    INFO    o.d.j.p.d.c.DatadogAgentClient#sendLogs: Datadog Plugin Logger could not be initialized
2022-07-15 09:28:38.654+0000 [id=2879675]    INFO    o.d.j.p.d.c.DatadogAgentClient#sendLogs: Datadog Plugin Logger does not have handlers

Thanks

sarah-witt commented 2 years ago

Hi @ricardojdsilva87, to help debug further, can you temporarily add a log recorder for the Datadog plugin? You can set the logger name to org.datadog.jenkins.plugins.datadog and the level to ALL. This may provide more detailed information about the log failure.

Additionally, can you try setting retryLogs in your configuration to false and see if the errors persist?
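
For reference, the same recorder can also be declared via configuration as code; a minimal sketch, assuming a JCasC version that supports the jenkins.log root (the recorder name is arbitrary):

jenkins:
  log:
    recorders:
      - name: "datadog-plugin"   # arbitrary recorder name
        loggers:
          - name: "org.datadog.jenkins.plugins.datadog"
            level: "ALL"         # capture everything the plugin logs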

ricardojdsilva87 commented 2 years ago

Hello @sarah-witt, I've re-enabled the previous settings and disabled the retryLogs option. The log recorder shows the following:

Jul 18, 2022 10:25:01 AM INFO org.datadog.jenkins.plugins.datadog.clients.DatadogAgentClient reinitializeLogger
Re/Initialize Datadog-Plugin Logger: hostname = 10.241.65.43, logCollectionPort = 8126
Jul 18, 2022 10:25:01 AM SEVERE org.datadog.jenkins.plugins.datadog.DatadogUtilities severe
java.util.logging.ErrorManager: 2
Jul 18, 2022 10:25:01 AM FINER org.datadog.jenkins.plugins.datadog.clients.DatadogErrorManager
java.util.logging.ErrorManager: 2: java.net.SocketException: Broken pipe (Write failed)
    at java.base/java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.base/java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:110)
    at java.base/java.net.SocketOutputStream.write(SocketOutputStream.java:150)
    at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
    at java.base/java.io.BufferedOutputStream.flush(BufferedOutputStream.java:142)
    at java.base/sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:318)
    at java.base/sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:153)
    at java.base/java.io.OutputStreamWriter.flush(OutputStreamWriter.java:251)
    at java.logging/java.util.logging.StreamHandler.flush(StreamHandler.java:247)
    at java.logging/java.util.logging.SocketHandler.publish(SocketHandler.java:183)
    at java.logging/java.util.logging.Logger.log(Logger.java:979)
    at java.logging/java.util.logging.Logger.doLog(Logger.java:1006)
    at java.logging/java.util.logging.Logger.log(Logger.java:1029)
    at java.logging/java.util.logging.Logger.info(Logger.java:1802)
    at org.datadog.jenkins.plugins.datadog.clients.DatadogAgentClient.sendLogs(DatadogAgentClient.java:534)
    at org.datadog.jenkins.plugins.datadog.logs.DatadogWriter.write(DatadogWriter.java:76)
    at org.datadog.jenkins.plugins.datadog.logs.DatadogOutputStream.eol(DatadogOutputStream.java:52)
    at hudson.console.LineTransformationOutputStream.eol(LineTransformationOutputStream.java:61)
    at hudson.console.LineTransformationOutputStream.write(LineTransformationOutputStream.java:57)
    at java.base/java.io.PrintStream.write(PrintStream.java:528)
    at java.base/java.io.FilterOutputStream.write(FilterOutputStream.java:87)
    at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution$NewlineSafeTaskListener$1.write(DurableTaskStep.java:459)
    at java.base/java.io.FilterOutputStream.write(FilterOutputStream.java:137)
    at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution$NewlineSafeTaskListener$1.write(DurableTaskStep.java:463)
    at java.base/java.io.PrintStream.write(PrintStream.java:559)
    at java.base/java.io.FilterOutputStream.write(FilterOutputStream.java:108)
    at org.apache.commons.io.output.ProxyOutputStream.write(ProxyOutputStream.java:74)
    at hudson.remoting.ProxyOutputStream$Chunk.lambda$execute$0(ProxyOutputStream.java:260)
    at hudson.remoting.PipeWriter$1.run(PipeWriter.java:159)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:139)
    at hudson.remoting.InterceptingExecutorService.lambda$wrap$0(InterceptingExecutorService.java:78)
    at org.jenkinsci.remoting.CallableDecorator.call(CallableDecorator.java:18)
    at hudson.remoting.CallableDecoratorList.lambda$applyDecorator$0(CallableDecoratorList.java:19)
    at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
    at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:80)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)

Thanks

sarah-witt commented 2 years ago

@ricardojdsilva87 Thanks for the information. Is this log line from when retryLogs is disabled or enabled? Did changing the setting make any difference?

ricardojdsilva87 commented 2 years ago

Hello @sarah-witt, the retryLogs option is disabled; it seems the Jenkins log recorder is what surfaces that Java stack trace, whereas the logs I sent in my first message came directly from the container logs. To answer your question: the behaviour is the same. Is there any other option I can try? Thanks

sarah-witt commented 2 years ago

@ricardojdsilva87 Can you confirm that you have a custom log source enabled for jenkins? If so, what port is configured? https://github.com/jenkinsci/datadog-plugin#log-collection-for-agents
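
For reference, the custom log source those docs describe is a conf.d/jenkins.d/conf.yaml file on the agent; a minimal sketch, where the port value is a placeholder and must match the plugin's targetLogCollectionPort:

logs:
  - type: tcp
    port: 10518        # placeholder; must match targetLogCollectionPort in Jenkins
    service: jenkins
    source: jenkins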

ricardojdsilva87 commented 2 years ago

Hello @sarah-witt, sorry for the late reply. We install the Datadog agent with a Helm chart; the only option to enable logs is this one, and we have it active. I also looked through the Datadog documentation for any port that should be configured; I couldn't find anything, but I think it is the same one the agent runs on and connects to (8126). Going through this documentation and checking the agent status, everything seems to be working fine (below is also an excerpt of the log collection status).

==========
Logs Agent
==========

    Reliable: Sending compressed logs in HTTPS to agent-http-intake.logs.datadoghq.eu on port 443
    BytesSent: 3.994313914e+09
    EncodedBytesSent: 2.0602406e+08
    LogsProcessed: 2.960852e+06
    LogsSent: 2.960833e+06

  kube-system/efs-csi-node-t4ctd/csi-driver-registrar
  ---------------------------------------------------
    - Type: file
      Identifier: 0bd0884326c6fb370f736970243aca96f831136fac7a93763b2bc1a914c015fa
      Path: /var/log/pods/kube-system_efs-csi-node-t4ctd_48c66625-14ab-4db0-ba5e-ad17da54efb7/csi-driver-registrar/*.log
      Status: OK
        1 files tailed out of 1 files matching
      Inputs:
        /var/log/pods/kube-system_efs-csi-node-t4ctd_48c66625-14ab-4db0-ba5e-ad17da54efb7/csi-driver-registrar/0.log
      BytesRead: 0
      Average Latency (ms): 0
      24h Average Latency (ms): 0
      Peak Latency (ms): 0
      24h Peak Latency (ms): 0

Testing the connection to the outbound endpoint where the logs are sent to Datadog also succeeds (screenshot attached).

By default the Datadog agent collects all logs from the containers running on the node. We only needed to add a custom configuration for the Jenkins logs so the Java stack traces are parsed as one event instead of appearing as separate lines in the Datadog portal:

  podAnnotations:
    ad.datadoghq.com/jenkins.logs: '[{
        "source": "jenkins",
        "service": "jenkins",
        "log_processing_rules": [{
          "type": "multi_line",
          "name": "log_start_with_date",
          "pattern": "\\d{4}-\\d{2}-\\d{2}\\s\\d{2}:\\d{2}:\\d{2}\\.\\d{3}\\+[0-9]{4}"
        }]
      }]'

Thanks

sarah-witt commented 2 years ago

Hi @ricardojdsilva87, sorry for the delay in responding. In your agent status output you should see a log source for jenkins; if there is none, you will have to explicitly create the file according to these docs so the agent can listen for the TCP traffic from the plugin: https://github.com/jenkinsci/datadog-plugin#log-collection-for-agents.
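
With a Helm-based install, that file can be shipped through the chart's datadog.confd value; a minimal sketch, where 10518 is a placeholder port that must equal the plugin's targetLogCollectionPort (8126 is normally the trace intake, so a dedicated port avoids the clash observed above):

datadog:
  confd:
    jenkins.yaml: |-
      logs:
        - type: tcp
          port: 10518      # placeholder; must equal targetLogCollectionPort
          service: jenkins
          source: jenkins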

If the problem continues, do you mind creating a support ticket so that we can get more information to troubleshoot? Thanks!

ricardojdsilva87 commented 2 years ago

Thanks for the reply @sarah-witt, I'm still not seeing what might be wrong. By default, the Datadog agent installed via the Helm chart only picks up the container logs that are visible through Kubernetes. In our case the jobs run in separate pods and do not write the job result to the console, so the Datadog agent cannot export it. We'll need to configure in the agent the different locations to fetch the log files from where they are kept.

I'll post the result of the new changes here. Thanks for your help.