ricardojdsilva87 opened this issue 2 years ago
Hi @ricardojdsilva87
The error code we get indicates a failure when flushing an output stream.
Investigating possible causes.
Hi @ricardojdsilva87
You said: "Tried to do a telnet on localhost 8126 and it works"
Can you tell me which IP you are trying to connect to via telnet? localhost is 127.0.0.1, but as I can see in the log message you posted, the IP configured in the plugin is 10.241.65.43.
Hello @jose-manuel-almaza , the result is the same. The test was to demonstrate that the Datadog agent (running in Kubernetes, deployed with a Helm chart) is listening on that port. As I understand it, the Jenkins plugin connects to the local Datadog agent so that the information (logs/metrics) can be exported. As shown, we are already receiving information about the status of the builds. Thanks
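A telnet to localhost only shows that something is listening on 127.0.0.1; per the log line, the plugin connects to 10.241.65.43. A small Python sketch (the addresses and port 8126 are the values from this thread) to compare reachability of both endpoints:

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Values from the thread: the plugin connects to 10.241.65.43:8126,
# while the telnet test only exercised 127.0.0.1:8126.
# tcp_reachable("127.0.0.1", 8126)
# tcp_reachable("10.241.65.43", 8126)
```

If the second check fails while the first succeeds, the agent is only reachable on localhost (or not reachable from the Jenkins pod's network), which would explain the flush failures.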
Hi @ricardojdsilva87
I see in your config that targetLogCollectionPort and targetTraceCollectionPort have the same value. Can you assign different values (unassigned ports, of course) and show me the logs?
Hello @jose-manuel-almaza , I've tried changing the port to 8127 in the Helm chart to check whether I could use one port for trace collection and another for log collection, but it seems the trace and log collection ports are the same: after changing the trace port, the log port failed to connect, and it only started working again once changed to 8127 as well. If I enable the log collection option on Jenkins, I get the same errors mentioned previously.
I also tried the test that you mentioned and assigned a port that is not configured on Datadog (port 8126 configured on Datadog and port 8127 configured on Jenkins). With that I get the following errors, as would be expected:
2022-07-15 09:28:38.653+0000 [id=2879675] INFO o.d.j.p.d.c.DatadogAgentClient#reinitializeLogger: Re/Initialize Datadog-Plugin Logger: hostname = 10.241.65.43, logCollectionPort = 8127
2022-07-15 09:28:38.654+0000 [id=2879675] SEVERE o.d.j.p.datadog.DatadogUtilities#severe: Failed to reinitialize Datadog-Plugin Logger, Connection exception. This may be because your port is incorrect 8127
2022-07-15 09:28:38.654+0000 [id=2879675] INFO o.d.j.p.d.c.DatadogAgentClient#sendLogs: Datadog Plugin Logger could not be initialized
2022-07-15 09:28:38.654+0000 [id=2879675] INFO o.d.j.p.d.c.DatadogAgentClient#sendLogs: Datadog Plugin Logger does not have handlers
Thanks
Hi @ricardojdsilva87 , to help debug further, can you temporarily add a log recorder for the Datadog plugin? You can set the logger name to org.datadog.jenkins.plugins.datadog and the level to ALL. This may provide more detailed information around the log failure.
Additionally, can you try setting retryLogs in your configuration to false and see if the errors persist?
Hello @sarah-witt , I've reenabled the previous settings and disabled the retryLogs option. The log recorder shows me the following:
Jul 18, 2022 10:25:01 AM INFO org.datadog.jenkins.plugins.datadog.clients.DatadogAgentClient reinitializeLogger
Re/Initialize Datadog-Plugin Logger: hostname = 10.241.65.43, logCollectionPort = 8126
Jul 18, 2022 10:25:01 AM SEVERE org.datadog.jenkins.plugins.datadog.DatadogUtilities severe
java.util.logging.ErrorManager: 2
Jul 18, 2022 10:25:01 AM FINER org.datadog.jenkins.plugins.datadog.clients.DatadogErrorManager
java.util.logging.ErrorManager: 2: java.net.SocketException: Broken pipe (Write failed)
at java.base/java.net.SocketOutputStream.socketWrite0(Native Method)
at java.base/java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:110)
at java.base/java.net.SocketOutputStream.write(SocketOutputStream.java:150)
at java.base/java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:81)
at java.base/java.io.BufferedOutputStream.flush(BufferedOutputStream.java:142)
at java.base/sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:318)
at java.base/sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:153)
at java.base/java.io.OutputStreamWriter.flush(OutputStreamWriter.java:251)
at java.logging/java.util.logging.StreamHandler.flush(StreamHandler.java:247)
at java.logging/java.util.logging.SocketHandler.publish(SocketHandler.java:183)
at java.logging/java.util.logging.Logger.log(Logger.java:979)
at java.logging/java.util.logging.Logger.doLog(Logger.java:1006)
at java.logging/java.util.logging.Logger.log(Logger.java:1029)
at java.logging/java.util.logging.Logger.info(Logger.java:1802)
at org.datadog.jenkins.plugins.datadog.clients.DatadogAgentClient.sendLogs(DatadogAgentClient.java:534)
at org.datadog.jenkins.plugins.datadog.logs.DatadogWriter.write(DatadogWriter.java:76)
at org.datadog.jenkins.plugins.datadog.logs.DatadogOutputStream.eol(DatadogOutputStream.java:52)
at hudson.console.LineTransformationOutputStream.eol(LineTransformationOutputStream.java:61)
at hudson.console.LineTransformationOutputStream.write(LineTransformationOutputStream.java:57)
at java.base/java.io.PrintStream.write(PrintStream.java:528)
at java.base/java.io.FilterOutputStream.write(FilterOutputStream.java:87)
at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution$NewlineSafeTaskListener$1.write(DurableTaskStep.java:459)
at java.base/java.io.FilterOutputStream.write(FilterOutputStream.java:137)
at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution$NewlineSafeTaskListener$1.write(DurableTaskStep.java:463)
at java.base/java.io.PrintStream.write(PrintStream.java:559)
at java.base/java.io.FilterOutputStream.write(FilterOutputStream.java:108)
at org.apache.commons.io.output.ProxyOutputStream.write(ProxyOutputStream.java:74)
at hudson.remoting.ProxyOutputStream$Chunk.lambda$execute$0(ProxyOutputStream.java:260)
at hudson.remoting.PipeWriter$1.run(PipeWriter.java:159)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:139)
at hudson.remoting.InterceptingExecutorService.lambda$wrap$0(InterceptingExecutorService.java:78)
at org.jenkinsci.remoting.CallableDecorator.call(CallableDecorator.java:18)
at hudson.remoting.CallableDecoratorList.lambda$applyDecorator$0(CallableDecoratorList.java:19)
at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:80)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Thanks
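The Broken pipe in the stack trace above means the agent side closed the TCP connection while the plugin's java.util.logging SocketHandler was still flushing buffered lines to it. A minimal Python sketch (not the plugin's Java code; host and port are local stand-ins) reproducing the same failure mode:

```python
import socket
import threading
import time

def provoke_broken_pipe():
    """Connect to a server that closes immediately, then keep writing.

    Returns the ConnectionError raised by the writes (BrokenPipeError or
    ConnectionResetError, depending on OS/timing) -- the Python analogue of
    the java.net.SocketException: Broken pipe in the plugin's stack trace.
    """
    server = socket.socket()
    server.bind(("127.0.0.1", 0))  # ephemeral port stands in for the agent port
    server.listen(1)
    threading.Thread(target=lambda: server.accept()[0].close(),
                     daemon=True).start()

    client = socket.socket()
    client.connect(server.getsockname())
    time.sleep(0.2)  # give the "agent" time to close its end first
    try:
        for _ in range(20):  # early writes may still land in kernel buffers
            client.sendall(b"log line\n")
            time.sleep(0.05)
    except ConnectionError as exc:
        return exc
    finally:
        client.close()
        server.close()
    return None
```

One plausible reading of the thread is exactly this pattern: the agent accepts the connection on 8126 but then drops it, since 8126 is the trace (APM) port and does not expect raw log lines.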
@ricardojdsilva87 Thanks for the information. Is this log line from when retryLogs is disabled or enabled? Did changing the setting make any difference?
Hello @sarah-witt , the retryLogs option is disabled, but it seems to me that the log recorder on Jenkins produces that Java stack trace; the logs I sent in the first thread were taken directly from the container logs.
To answer your question, the behaviour is the same. Is there any other option I can try?
Thanks
@ricardojdsilva87 Can you confirm that you have a custom log source enabled for jenkins? And if so, what port is configured? https://github.com/jenkinsci/datadog-plugin#log-collection-for-agents
Hello @sarah-witt , sorry for the late reply. We are using the Datadog agent installed as a Helm chart; the only option to enable the logs is this one, and we have it active. I've also looked in the Datadog documentation to see whether any port was mentioned; I couldn't find anything, but I think it's the same one the agent runs on and accepts connections on (8126). Going through this documentation and checking the agent status, everything seems to be working fine (below is also an example of the log collection).
==========
Logs Agent
==========
Reliable: Sending compressed logs in HTTPS to agent-http-intake.logs.datadoghq.eu on port 443
BytesSent: 3.994313914e+09
EncodedBytesSent: 2.0602406e+08
LogsProcessed: 2.960852e+06
LogsSent: 2.960833e+06
kube-system/efs-csi-node-t4ctd/csi-driver-registrar
---------------------------------------------------
- Type: file
Identifier: 0bd0884326c6fb370f736970243aca96f831136fac7a93763b2bc1a914c015fa
Path: /var/log/pods/kube-system_efs-csi-node-t4ctd_48c66625-14ab-4db0-ba5e-ad17da54efb7/csi-driver-registrar/*.log
Status: OK
1 files tailed out of 1 files matching
Inputs:
/var/log/pods/kube-system_efs-csi-node-t4ctd_48c66625-14ab-4db0-ba5e-ad17da54efb7/csi-driver-registrar/0.log
BytesRead: 0
Average Latency (ms): 0
24h Average Latency (ms): 0
Peak Latency (ms): 0
24h Peak Latency (ms): 0
Also, testing the connection to the outbound site where the logs are sent to Datadog seems to be successful.
By default the Datadog agent collects all the logs from the running containers on the node. We only needed to add a custom configuration for the Jenkins logs to parse the Java stack traces, so they won't appear as separate lines on the Datadog portal.
podAnnotations:
  ad.datadoghq.com/jenkins.logs: '[{
      "source": "jenkins",
      "service": "jenkins",
      "log_processing_rules": [{
        "type": "multi_line",
        "name": "log_start_with_date",
        "pattern": "\\d{4}-\\d{2}-\\d{2}\\s\\d{2}:\\d{2}:\\d{2}\\.\\d{3}\\+[0-9]{4}"
      }]
    }]'
Thanks
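As a side check, the multi_line pattern in the annotation above anchors new log entries on the leading timestamp. A quick Python sketch (the sample lines are taken from earlier in this thread) verifying which lines start a new entry:

```python
import re

# Same pattern as in the podAnnotations, minus the JSON double-escaping.
pattern = re.compile(r"\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\.\d{3}\+[0-9]{4}")

print(bool(pattern.match("2022-07-15 09:28:38.653+0000 [id=2879675] INFO ...")))    # True: starts a new entry
print(bool(pattern.match("at java.base/java.net.SocketOutputStream.socketWrite0")))  # False: continuation line
```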
Hi @ricardojdsilva87, sorry for the delay in response. In your agent status output you should have a log source for jenkins, if not then you will have to explicitly create the file according to these docs so the agent can listen for the TCP traffic from the plugin: https://github.com/jenkinsci/datadog-plugin#log-collection-for-agents.
If the problem continues, do you mind creating a support ticket so that we can get more information to troubleshoot? Thanks!
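For reference, the custom log source the linked docs describe is a conf.d file on the agent, along these lines. This is only a sketch: the exact path depends on how the Helm chart mounts agent configuration, and the port must match the plugin's log collection port (8126 in this thread, though note that 8126 is also the agent's trace/APM port, which is why separate ports were suggested above):

```yaml
# Sketch of conf.d/jenkins.d/conf.yaml (path and port are assumptions for this thread)
logs:
  - type: tcp
    port: 8126        # must match the plugin's targetLogCollectionPort
    service: jenkins
    source: jenkins
```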
Thanks for the reply @sarah-witt , I'm not seeing what might be wrong. By default the Datadog agent installed using the Helm chart only collects the container logs that you can see through Kubernetes. In our case the jobs run in separate pods and do not output the job result to the console, so the Datadog agent cannot export it. We'll need to configure in the agent the different locations to fetch the log files from.
I'll post the result of the new changes here. Thanks for your help.
Hello, we are having the following errors on Jenkins after enabling log collection in the Datadog plugin:
Tried to do a telnet on localhost 8126 and it works.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The agent should ship the logs to the central console.
Configuration: The CasC configuration is the following:
The plugin is working correctly, sending information about the jobs to Datadog (we are using the pre-existing Datadog dashboard to check); for example:
On the Jenkins pipeline we are setting the following:
Also testing the connections on the UI seem to work correctly:
We are not seeing any pipeline log information being shipped to Datadog. Is there any kind of debugging we can do? Are any of the settings incorrect?
Also, a question regarding the configuration: if we set the collectLogs variable to true in the pipeline, do we need to enable the option to collect logs globally as well? If we disable the global collect-logs option and set it only via the pipeline, we still don't receive the logs, but the Jenkins error logs mentioned above stop.
Environment and Versions:
Thanks for your help