iainsproat opened this issue 9 months ago
Excerpt from the issue description:
"status": {
"path": "e271e90551-68622d75ea-b5dd1651b51a2a99cc0d/2c4acb8d-3234-3176-93b9-6df94deb472c/b5dd1651b51a2a99cc0d-71fa142437-3a00d4bbc2-0-log",
"size": 207093760
Does this status.size value indicate a log size of 207 MB? Or am I misinterpreting this?
To answer my own comment: it seems to be bytes, so this would be 207 MB: https://github.com/tektoncd/results/blob/32314594d4e5cf6de35e13cb7429ae216a969781/pkg/api/server/v1alpha2/logs.go#L173C21-L173C28
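(For the arithmetic: 207093760 bytes / 1000^2 ≈ 207.1 MB in decimal units, or 207093760 / 1024^2 = 197.5 MiB in binary units, so reading it as roughly 207 MB is correct.)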
@iainsproat I believe the size of the log has something to do with this. Are you still having difficulty storing large logs?
Hi @sayan-biswas - yes, this continues to be an issue for me.
Unfortunately I haven't yet had time to create a minimal reproducing example, so I'm not yet sure whether the issue lies with our cloud provider, my S3 configuration for logs in Tekton Results, or Tekton Results itself.
We experience similar errors when persisting TaskRun logs with version 0.13.0.
The related log records from the watcher pod are these:
{
  "level": "warn",
  "ts": 1732779634.9069145,
  "logger": "fallback",
  "caller": "dynamic/dynamic.go:574",
  "msg": "tkn client std error output",
  "name": "redacted-build-yion43-maven-build",
  "errStr": "task maven-build has failed: \"step-mvn-goals\" exited with code 1\ncontainer step-mvn-goals has failed : [{\"key\":\"BUILD_STATUS\",\"value\":\"FAILURE\",\"type\":1},{\"key\":\"StartedAt\",\"value\":\"2024-11-28T08:32:16.206+01:00\",\"type\":3}]\n"
}
{
  "level": "warn",
  "ts": 1732779635.2895243,
  "logger": "fallback",
  "caller": "dynamic/dynamic.go:598",
  "msg": "CloseAndRecv ret err",
  "name": "redacted-build-yion43-maven-build",
  "error": "rpc error: code = Unknown desc = got flush error operation error S3: UploadPart, https response error StatusCode: 404, RequestID: 330NVCR4JVGAB4J4, HostID: AgVF3dTseZBYPEbOYWdX/kJWkAzVXKgPFIjPwIyZHUuXoii/+uGsjcwbs9ar+FISt3PDe8neImU=, api error NoSuchUpload: The specified upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed. with returnErr: operation error S3: UploadPart, failed to rewind transport stream for retry, request stream is not seekable"
}
{
  "level": "error",
  "ts": 1732779635.2895653,
  "logger": "fallback",
  "caller": "dynamic/dynamic.go:604",
  "msg": "rpc error: code = Unknown desc = got flush error operation error S3: UploadPart, https response error StatusCode: 404, RequestID: 330NVCR4JVGAB4J4, HostID: AgVF3dTseZBYPEbOYWdX/kJWkAzVXKgPFIjPwIyZHUuXoii/+uGsjcwbs9ar+FISt3PDe8neImU=, api error NoSuchUpload: The specified upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed. with returnErr: operation error S3: UploadPart, failed to rewind transport stream for retry, request stream is not seekable",
  "stacktrace": "github.com/tektoncd/results/pkg/watcher/reconciler/dynamic.(*Reconciler).streamLogs\n\tgithub.com/tektoncd/results/pkg/watcher/reconciler/dynamic/dynamic.go:604\ngithub.com/tektoncd/results/pkg/watcher/reconciler/dynamic.(*Reconciler).sendLog\n\tgithub.com/tektoncd/results/pkg/watcher/reconciler/dynamic/dynamic.go:478\ngithub.com/tektoncd/results/pkg/watcher/reconciler/dynamic.(*Reconciler).Reconcile.func2.1\n\tgithub.com/tektoncd/results/pkg/watcher/reconciler/dynamic/dynamic.go:208"
}
{
  "level": "error",
  "ts": 1732779635.2895837,
  "logger": "fallback",
  "caller": "dynamic/dynamic.go:480",
  "msg": "Error streaming log",
  "namespace": "xxxxx-tekton",
  "kind": "TaskRun",
  "name": "redacted-build-yion43-maven-build",
  "error": "rpc error: code = Unknown desc = got flush error operation error S3: UploadPart, https response error StatusCode: 404, RequestID: 330NVCR4JVGAB4J4, HostID: AgVF3dTseZBYPEbOYWdX/kJWkAzVXKgPFIjPwIyZHUuXoii/+uGsjcwbs9ar+FISt3PDe8neImU=, api error NoSuchUpload: The specified upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed. with returnErr: operation error S3: UploadPart, failed to rewind transport stream for retry, request stream is not seekable",
  "stacktrace": "github.com/tektoncd/results/pkg/watcher/reconciler/dynamic.(*Reconciler).sendLog\n\tgithub.com/tektoncd/results/pkg/watcher/reconciler/dynamic/dynamic.go:480\ngithub.com/tektoncd/results/pkg/watcher/reconciler/dynamic.(*Reconciler).Reconcile.func2.1\n\tgithub.com/tektoncd/results/pkg/watcher/reconciler/dynamic/dynamic.go:208"
}
And here is the matching log from the API pod:
{
  "level": "error",
  "ts": 1732779635.2886696,
  "caller": "v1alpha2/logs.go:250",
  "msg": "operation error S3: UploadPart, https response error StatusCode: 404, RequestID: 330NVCR4JVGAB4J4, HostID: AgVF3dTseZBYPEbOYWdX/kJWkAzVXKgPFIjPwIyZHUuXoii/+uGsjcwbs9ar+FISt3PDe8neImU=, api error NoSuchUpload: The specified upload does not exist. The upload ID may be invalid, or the upload may have been aborted or completed.",
  "stacktrace": "github.com/tektoncd/results/pkg/api/server/v1alpha2.(*Server).handleReturn
      github.com/tektoncd/results/pkg/api/server/v1alpha2/logs.go:250
    github.com/tektoncd/results/pkg/api/server/v1alpha2.(*Server).UpdateLog
      github.com/tektoncd/results/pkg/api/server/v1alpha2/logs.go:165
    github.com/tektoncd/results/proto/v1alpha2/results_go_proto._Logs_UpdateLog_Handler
      github.com/tektoncd/results/proto/v1alpha2/results_go_proto/api_grpc.pb.go:677
    github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/recovery.StreamServerInterceptor.func1
      github.com/grpc-ecosystem/go-grpc-middleware/v2@v2.0.0-rc.5/interceptors/recovery/interceptors.go:48
    main.main.WithStreamServerChain.ChainStreamServer.func28.1.1
      github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:49
    github.com/grpc-ecosystem/go-grpc-prometheus.init.(*ServerMetrics).StreamServerInterceptor.func4
      github.com/grpc-ecosystem/go-grpc-prometheus@v1.2.0/server_metrics.go:121
    main.main.WithStreamServerChain.ChainStreamServer.func28.1.1
      github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:49
    main.main.StreamServerInterceptor.func17
      github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/auth/auth.go:66
    main.main.WithStreamServerChain.ChainStreamServer.func28.1.1
      github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:49
    github.com/grpc-ecosystem/go-grpc-middleware/logging/zap.StreamServerInterceptor.func1
      github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/logging/zap/server_interceptors.go:53
    main.main.WithStreamServerChain.ChainStreamServer.func28.1.1
      github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:49
    github.com/grpc-ecosystem/go-grpc-middleware/tags.StreamServerInterceptor.func1
      github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/tags/interceptors.go:39
    main.main.WithStreamServerChain.ChainStreamServer.func28.1.1
      github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:49
    main.main.WithStreamServerChain.ChainStreamServer.func28
      github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:58
    google.golang.org/grpc.(*Server).processStreamingRPC
      google.golang.org/grpc@v1.66.2/server.go:1695
    google.golang.org/grpc.(*Server).handleStream
      google.golang.org/grpc@v1.66.2/server.go:1809
    google.golang.org/grpc.(*Server).serveStreams.func2.1
      google.golang.org/grpc@v1.66.2/server.go:1029"
}
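For context on the NoSuchUpload itself, here is a minimal sketch of the aws-sdk-go-v2 multipart lifecycle that this stack trace passes through (illustrative only, not the tekton-results implementation; multipartPut and its parameters are invented for the example). An UploadId is only valid between CreateMultipartUpload and the matching Complete or Abort; a 404 NoSuchUpload on UploadPart means the ID was never created, was already completed, or was aborted, for example by a bucket lifecycle rule that expires incomplete multipart uploads.

package sketch

import (
	"bytes"
	"context"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/aws/aws-sdk-go-v2/service/s3/types"
)

// multipartPut walks the three-step multipart lifecycle. NoSuchUpload
// can only appear in step 2, once the UploadId from step 1 has been
// invalidated by step 3 (or by an external abort).
func multipartPut(ctx context.Context, client *s3.Client, bucket, key string, parts [][]byte) error {
	// Step 1: create the upload; the returned UploadId scopes all later calls.
	create, err := client.CreateMultipartUpload(ctx, &s3.CreateMultipartUploadInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
	})
	if err != nil {
		return err
	}
	var done []types.CompletedPart
	for i, p := range parts {
		num := int32(i + 1) // part numbers are 1-based
		// Step 2: upload one part; a 404 NoSuchUpload here means the
		// UploadId is no longer live.
		up, err := client.UploadPart(ctx, &s3.UploadPartInput{
			Bucket:     aws.String(bucket),
			Key:        aws.String(key),
			UploadId:   create.UploadId,
			PartNumber: aws.Int32(num),
			Body:       bytes.NewReader(p),
		})
		if err != nil {
			// Step 3a: abort invalidates the UploadId for good.
			_, _ = client.AbortMultipartUpload(ctx, &s3.AbortMultipartUploadInput{
				Bucket:   aws.String(bucket),
				Key:      aws.String(key),
				UploadId: create.UploadId,
			})
			return err
		}
		done = append(done, types.CompletedPart{ETag: up.ETag, PartNumber: aws.Int32(num)})
	}
	// Step 3b: complete also invalidates the UploadId.
	_, err = client.CompleteMultipartUpload(ctx, &s3.CompleteMultipartUploadInput{
		Bucket:          aws.String(bucket),
		Key:             aws.String(key),
		UploadId:        create.UploadId,
		MultipartUpload: &types.CompletedMultipartUpload{Parts: done},
	})
	return err
}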
The corresponding record in the database is this:
{
  "kind": "Log",
  "spec": {
    "type": "S3",
    "resource": {
      "uid": "95a29d12-71ca-41ba-a087-d3ab2cfb1794",
      "kind": "TaskRun",
      "name": "redacted-build-yion43-maven-build",
      "namespace": "itop-tekton"
    }
  },
  "status": {
    "path": "itop-tekton/444dc842-09c0-3f12-9386-ad0311d8878d/redacted-build-yion43-maven-build-log",
    "size": 207093760,
    "isStored": false,
    "isRetryableErr": true,
    "errorOnStoreMsg": "operation error S3: UploadPart, failed to rewind transport stream for retry, request stream is not seekable"
  },
  "metadata": {
    "uid": "444dc842-09c0-3f12-9386-ad0311d8878d",
    "name": "redacted-build-yion43-maven-build-log",
    "namespace": "itop-tekton",
    "creationTimestamp": null
  },
  "apiVersion": "results.tekton.dev/v1alpha3"
}
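A note on the errorOnStoreMsg above: it is a second-order failure. After the 404 from UploadPart, the retry layer in aws-sdk-go-v2 tried to resend the part but could not rewind the request body, because the streamed body is not seekable. As a hedged illustration only (this is not the tekton-results code; uploadPartSeekable and its parameters are invented for the example), buffering each part before calling UploadPart gives the SDK a seekable body it can rewind on retry:

package sketch

import (
	"bytes"
	"context"
	"io"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// uploadPartSeekable buffers one part in memory so the SDK's retry
// logic can rewind the body. A raw stream cannot be rewound, which is
// what produces "request stream is not seekable".
func uploadPartSeekable(ctx context.Context, client *s3.Client, bucket, key, uploadID string, partNum int32, part io.Reader) error {
	buf, err := io.ReadAll(part) // read the whole part into memory
	if err != nil {
		return err
	}
	_, err = client.UploadPart(ctx, &s3.UploadPartInput{
		Bucket:     aws.String(bucket),
		Key:        aws.String(key),
		UploadId:   aws.String(uploadID),
		PartNumber: aws.Int32(partNum),
		// bytes.Reader implements io.Seeker, so a retry can seek back to 0.
		Body: bytes.NewReader(buf),
	})
	return err
}

Even with a seekable body, a retried part would presumably still hit NoSuchUpload here, since the 404 says the multipart upload itself is gone; the rewind failure just masks the underlying error.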
An interesting coincidence with the original report:
"size": 207093760
So the upload seems to fail at exactly the same number of bytes.
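For what it's worth, 207093760 bytes is exactly 197.5 MiB (197.5 * 1048576 = 207093760). That it is a clean binary multiple in two unrelated environments hints at a fixed buffer or part-size boundary inside the upload path rather than a workload coincidence, though that is only a guess.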
With this pipeline run I'm able to reproduce the error in our OpenShift environment with tekton-results v0.13.0 configured to save logs to S3:
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  generateName: test-pipeline-1-run-
spec:
  pipelineSpec:
    workspaces:
      - name: workspace
    tasks:
      - name: run-1
        taskSpec:
          steps:
            - name: step-1
              image: registry.access.redhat.com/ubi8/ubi-minimal:8.10
              script: |
                #!/usr/bin/env bash
                echo "Start generating 210MB of logs ..."
                COUNT=0
                while (( COUNT < 220200 )); do
                  echo "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse molestie consequat, vel illum dolore eu fooooo"
                  ((COUNT++))
                done
                echo "Done."
              env:
                - name: HOME
                  value: /tekton/home
  taskRunTemplate:
    serviceAccountName: pipelines
  timeouts:
    pipeline: 30m0s
  workspaces:
    - name: workspace
      emptyDir: {}
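For scale: each echoed line above is roughly 1 kB, so 220,200 iterations produce on the order of 210 to 220 MB of log output, comfortably past the 207093760-byte point at which both reports fail.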
I hope this helps you reproduce and investigate this issue further.
Original issue description:
There is a step in a pipeline run which fails with an error (the application exits with a non-zero exit code). Unfortunately, its log content is not stored.
All other pipelines and steps have their logs stored as expected, so this does not seem to be a general configuration problem.
Expected Behavior
All steps in the pipeline would have their logs stored, even (and especially) if the user application exited with a non-zero exit code. A copy of the pipeline can be found below.
This fails consistently across numerous repeated runs.
Actual Behavior
The Tekton Results API logs the following two errors:
I can see that a corresponding log record has been stored:
For convenience, the decoded base64 content of the above value is:
When inspecting the content of the S3 blob storage directly, I can see that no object has been created at the given path.
Likewise the logs cannot be retrieved via the API, as would be expected given the above symptoms.
Steps to Reproduce the Problem
Unfortunately, I have not yet been able to create a minimal failing example of the user workload that generates the logs. The pipeline is as follows:
Additional Info
Kubernetes version: v1.28.2
Tekton Pipelines version: v0.56.1
Tekton Results version: v0.9.1