mendersoftware / mender-server


Issues with Artifact Upload in Mender 3.7.4 #120

Open mehta-akshay-scanomat opened 1 day ago

mehta-akshay-scanomat commented 1 day ago

Description:

I am currently using the open-source version of Mender (3.7.4) and encountering persistent issues when attempting to upload artifacts. Specifically, I receive a 5xx error during the upload process.

When using the UI to upload an artifact, I see the message: "Artifact couldn't be generated. Request failed with status code 502." The logs for the mender-deployment service show a corresponding status code of 500:

time="2024-10-21T16:55:48Z" level=error msg="azblob PutObject: failed to upload object to blob: context canceled" caller="view.(*RESTView).RenderInternalError@view.go:72" request_id=66772fb8-f862-451a-83b7-046743424cc2 user_id=753afdfb-ee20-4fd3-985e-85c74fe4c56e
time="2024-10-21T16:55:48Z" level=info msg="500 59998118μs POST /api/management/v1/deployments/artifacts/generate HTTP/1.1 - Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Safari/537.36" byteswritten=78 caller="accesslog.(*AccessLogMiddleware).MiddlewareFunc.func1@middleware.go:82" method=POST path=/api/management/v1/deployments/artifacts/generate qs= request_id=66772fb8-f862-451a-83b7-046743424cc2 responsetime=59.998118647 status=500 ts="2024-10-21 16:54:48.518920248 +0000 UTC" type=http user_id=753afdfb-ee20-4fd3-985e-85c74fe4c56e

When using curl, I receive a 502 Bad Gateway error:

curl -X POST ${URL}/api/management/v1/deployments/artifacts \
       -H 'Content-Type: multipart/form-data' \
       -H "Authorization: Bearer ${JWT}" \
       -F "artifact=@${ARTIFACT}"

Using the Mender CLI for artifact uploads results in a status code 409.
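
For reference, the CLI upload command was roughly the following (a sketch; the artifact filename is a placeholder and the exact subcommand/flags may differ slightly from what I ran):

mender-cli artifacts upload my-artifact.mender --server https://mender.scanomat.com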

I have identified a pattern: files that take longer than one minute to upload consistently fail (the failing request in the log above also shows a response time of almost exactly 60 s), whereas smaller files (around 200-300 MB) that upload in under a minute succeed with a status code of 201.

Questions:

  1. Is there a default size limitation or timeout configuration that could be affecting these uploads?
  2. I found some relevant configurations in the Mender Server repository. I attempted to modify these settings by editing the deployment in Kubernetes:
DEPLOYMENTS_STORAGE_DEFAULT:                 azure
DEPLOYMENTS_STORAGE_UPLOAD_EXPIRE_SECONDS:   300
DEPLOYMENTS_STORAGE_MAX_GENERATE_DATA_SIZE:  1073741824
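
For reference, this is roughly how I applied them when editing the deployment (a sketch using kubectl set env, which is equivalent to editing the env section by hand; the deployment name mender-deployments matches the describe output further down):

kubectl set env deployment/mender-deployments \
  DEPLOYMENTS_STORAGE_DEFAULT=azure \
  DEPLOYMENTS_STORAGE_UPLOAD_EXPIRE_SECONDS=300 \
  DEPLOYMENTS_STORAGE_MAX_GENERATE_DATA_SIZE=1073741824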

However, the issue persists. Any guidance on resolving this would be greatly appreciated!

Thank you!

oldgiova commented 1 day ago

Hello @mehta-akshay-scanomat , are you using the official Helm Chart? Did you configure Azure Blob Storage as documented? Which Ingress controller are you using?

mehta-akshay-scanomat commented 1 day ago

> Hello @mehta-akshay-scanomat , are you using the official Helm Chart? Did you configure Azure Blob Storage as documented? Which Ingress controller are you using?

  1. Yes, I'm using the official Mender Helm chart.
  2. I have configured Azure Blob Storage as described in the Mender documentation.
  3. I'm using Traefik as the ingress controller, which I believe is the default.

oldgiova commented 1 day ago

It could be an Ingress controller timeout. In a similar troubleshooting case, the solution was to increase the proxy body size; maybe something similar applies to Traefik. You should see some error logs in the Traefik Ingress controller deployment.
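
For example, a quick way to check those logs (a sketch, assuming Traefik runs as the "traefik" Deployment in the default namespace):

kubectl logs deployment/traefik -n default --tail=100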

mehta-akshay-scanomat commented 9 hours ago

@oldgiova Thanks for the reply. I added the argument below, as given in the Traefik documentation, raising the read timeout from the default 60s to 300s:

--entryPoints.name.transport.respondingTimeouts.readTimeout=300

But I'm still not able to upload the artifact. The error in the mender-deployments logs is the same, but the status code in the UI has changed to 499. It says: Artifact couldn't be generated. Request failed with status code 499.
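
For reference, one way to apply this flag via the Traefik Helm chart would look roughly like this (a sketch, not necessarily exactly what I ran; it assumes the chart was installed from the traefik/traefik repo as release "traefik" in the default namespace, which matches the deployment shown further down):

cat > traefik-values.yaml <<'EOF'
additionalArguments:
  - "--entryPoints.websecure.transport.respondingTimeouts.readTimeout=300"
EOF
helm upgrade traefik traefik/traefik -n default --reuse-values -f traefik-values.yaml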

I tried to upload using mender-cli as well and got the 499 error as shown below:

67.22 MiB / 535.74 MiB [----------------------->__________________________________________________________________________________________________________________________________________________________________] 12.55% 1.11 MiB p/s
VERBOSE response: HTTP/1.1 499 status code 499
Connection: close
Content-Length: 21
Date: Wed, 23 Oct 2024 12:14:13 GMT
Referrer-Policy: no-referrer
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Vary: Accept-Encoding
X-Content-Type-Options: nosniff
X-Xss-Protection: 1; mode=block

Client Closed Request
FAILURE: artifact upload to 'mender.scanomat.com' failed with status 499
ERROR: exit status: 1

When I used curl to upload the artifact, I got the following response:

curl -v -X POST https://mender.scanomat.com/api/management/v1/deployments/artifacts \
  -H "Authorization: Bearer ..." \
  -F "artifact=@boss-imx8mm-var-dart-0.0.0-dev.mender"

Note: Unnecessary use of -X or --request, POST is already inferred.
* Host mender.scanomat.com:443 was resolved.
* IPv6: (none)
* IPv4: 13.69.133.251
*   Trying 13.69.133.251:443...
* Connected to mender.scanomat.com (13.69.133.251) port 443
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-CHACHA20-POLY1305-SHA256 / [blank] / UNDEF
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=mender.scanomat.com
*  start date: May 17 06:19:53 2024 GMT
*  expire date: Jun 18 06:19:53 2025 GMT
*  subjectAltName: host "mender.scanomat.com" matched cert's "mender.scanomat.com"
*  issuer: C=US; ST=Arizona; L=Scottsdale; O=GoDaddy.com, Inc.; OU=http://certs.godaddy.com/repository/; CN=Go Daddy Secure Certificate Authority - G2
*  SSL certificate verify ok.
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://mender.scanomat.com/api/management/v1/deployments/artifacts
* [HTTP/2] [1] [:method: POST]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: mender.scanomat.com]
* [HTTP/2] [1] [:path: /api/management/v1/deployments/artifacts]
* [HTTP/2] [1] [user-agent: curl/8.7.1]
* [HTTP/2] [1] [accept: */*]
* [HTTP/2] [1] [authorization: Bearer ...]
* [HTTP/2] [1] [content-length: 561762549]
* [HTTP/2] [1] [content-type: multipart/form-data; boundary=------------------------qKlT1gdNLonJbvM5ahJJxT]
> POST /api/management/v1/deployments/artifacts HTTP/2
> Host: mender.scanomat.com
> User-Agent: curl/8.7.1
> Accept: */*
> Authorization: Bearer ...
> Content-Length: 561762549
> Content-Type: multipart/form-data; boundary=------------------------qKlT1gdNLonJbvM5ahJJxT
>
< HTTP/2 499
< date: Wed, 23 Oct 2024 12:54:45 GMT
< referrer-policy: no-referrer
< strict-transport-security: max-age=31536000; includeSubDomains; preload
< vary: Accept-Encoding
< x-content-type-options: nosniff
< x-xss-protection: 1; mode=block
< content-length: 21
<
* HTTP error before end of send, stop sending
* abort upload after having sent 479788000 bytes
* Connection #0 to host mender.scanomat.com left intact
Client Closed Request%

This is what my Traefik and mender-deployments Deployments look like:


Name:                   traefik
Namespace:              default
CreationTimestamp:      Wed, 23 Oct 2024 10:26:44 +0000
Labels:                 app.kubernetes.io/instance=traefik-default
                        app.kubernetes.io/managed-by=Helm
                        app.kubernetes.io/name=traefik
                        helm.sh/chart=traefik-32.1.1
Annotations:            deployment.kubernetes.io/revision: 3
                        meta.helm.sh/release-name: traefik
                        meta.helm.sh/release-namespace: default
Selector:               app.kubernetes.io/instance=traefik-default,app.kubernetes.io/name=traefik
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  0 max unavailable, 1 max surge
Pod Template:
  Labels:           app.kubernetes.io/instance=traefik-default
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=traefik
                    helm.sh/chart=traefik-32.1.1
  Annotations:      prometheus.io/path: /metrics
                    prometheus.io/port: 9100
                    prometheus.io/scrape: true
  Service Account:  traefik
  Containers:
   traefik:
    Image:       docker.io/traefik:v3.1.6
    Ports:       9100/TCP, 9000/TCP, 8000/TCP, 8443/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP, 0/TCP
    Args:
      --global.checknewversion
      --global.sendanonymoususage
      --entryPoints.metrics.address=:9100/tcp
      --entryPoints.traefik.address=:9000/tcp
      --entryPoints.web.address=:8000/tcp
      --entryPoints.websecure.address=:8443/tcp
      --api.dashboard=true
      --ping=true
      --metrics.prometheus=true
      --metrics.prometheus.entrypoint=metrics
      --providers.kubernetescrd
      --providers.kubernetescrd.allowEmptyServices=true
      --providers.kubernetesingress
      --providers.kubernetesingress.allowEmptyServices=true
      --entryPoints.websecure.http.tls=true
      --entryPoints.websecure.transport.respondingTimeouts.readTimeout=300
      --log.level=INFO
    Liveness:   http-get http://:9000/ping delay=2s timeout=2s period=10s #success=1 #failure=3
    Readiness:  http-get http://:9000/ping delay=2s timeout=2s period=10s #success=1 #failure=1
    Environment:
      POD_NAME:        (v1:metadata.name)
      POD_NAMESPACE:   (v1:metadata.namespace)
    Mounts:
      /data from data (rw)
      /tmp from tmp (rw)
  Volumes:
   data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
   tmp:
    Type:          EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:        
    SizeLimit:     <unset>
  Node-Selectors:  <none>
  Tolerations:     <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  traefik-6c56784b69 (0/0 replicas created), traefik-7858bc95cb (0/0 replicas created)
NewReplicaSet:   traefik-7d9bf4d54 (1/1 replicas created)
Events:          <none>

Name:                   mender-deployments
Namespace:              default
CreationTimestamp:      Wed, 23 Oct 2024 08:29:05 +0000
Labels:                 app.kubernetes.io/component=deployments
                        app.kubernetes.io/instance=mender
                        app.kubernetes.io/managed-by=Helm
                        app.kubernetes.io/name=mender-deployments
                        app.kubernetes.io/part-of=mender
                        app.kubernetes.io/version=3.7.7
Annotations:            deployment.kubernetes.io/revision: 2
                        meta.helm.sh/release-name: mender
                        meta.helm.sh/release-namespace: default
Selector:               app.kubernetes.io/name=mender-deployments
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  0 max unavailable, 25% max surge
Pod Template:
  Labels:           app.kubernetes.io/component=deployments
                    app.kubernetes.io/instance=mender
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=mender-deployments
                    app.kubernetes.io/part-of=mender
                    app.kubernetes.io/version=3.7.7
  Service Account:  default
  Containers:
   deployments:
    Image:      docker.io/mendersoftware/deployments:mender-3.7
    Port:       <none>
    Host Port:  <none>
    Args:
      server
      --automigrate
    Limits:
      cpu:     300m
      memory:  128Mi
    Requests:
      cpu:      300m
      memory:   64Mi
    Liveness:   http-get http://:8080/api/internal/v1/deployments/alive delay=0s timeout=1s period=5s #success=1 #failure=3
    Readiness:  http-get http://:8080/api/internal/v1/deployments/health delay=0s timeout=1s period=15s #success=1 #failure=3
    Startup:    http-get http://:8080/api/internal/v1/deployments/alive delay=0s timeout=1s period=5s #success=1 #failure=36
    Environment Variables from:
      mongodb-common     Secret with prefix 'DEPLOYMENTS_'  Optional: false
      artifacts-storage  Secret with prefix 'DEPLOYMENTS_'  Optional: false
    Environment:
      DEPLOYMENTS_STORAGE_DEFAULT:               azure
      DEPLOYMENTS_MIDDLEWARE:                    prod
      DEPLOYMENTS_AWS_TAG_ARTIFACT:              
      DEPLOYMENTS_STORAGE_ENABLE_DIRECT_UPLOAD:  true
      DEPLOYMENTS_STORAGE_MAX_IMAGE_SIZE:        1073741824
    Mounts:                                      <none>
  Volumes:                                       <none>
  Node-Selectors:                                <none>
  Tolerations:                                   <none>
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  mender-deployments-965b7b49f (0/0 replicas created)
NewReplicaSet:   mender-deployments-8d9dbd4db (1/1 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  28m   deployment-controller  Scaled up replica set mender-deployments-8d9dbd4db to 1
  Normal  ScalingReplicaSet  28m   deployment-controller  Scaled down replica set mender-deployments-965b7b49f to 0 from 1