zarf-dev / zarf

DevSecOps for Air Gap & Limited-Connection Systems. https://zarf.dev/
Apache License 2.0

Very large docker containers fail with BLOB_UPLOAD_INVALID #2864

Open supcom234 opened 1 month ago

supcom234 commented 1 month ago

Environment

zarf_version: v0.32.4
rke2_version: "v1.28.9+rke2r1"

Steps to reproduce

  1. Use Zarf to package a large container image.
  2. The image must be very large (i.e., 6 GB or larger uncompressed), for example https://hub.docker.com/layers/kasmweb/kali-rolling-desktop/1.15.0-rolling/images/sha256-d870969f00b54dd7e9dc7f4bc3382d979baecb8b1bff9646c4b1c64eab650700?context=explore
  3. Create the Zarf package and deploy it.

Other notes

As a test, I was able to use docker commands to retag the image and push it to the registry, but Zarf failed to push the same image.

Expected result

Zarf should push very large container images without erroring out.

Actual Result

(1/3): PATCH
  https://registry.vp.bigbang.dev/v2/naps-dev/containers/chromium/blobs/uploads/9fca1fa3-8f7c-4f01-a0ba-1818f6fdfdca?_state=REDACTED:
  BLOB_UPLOAD_INVALID: blob upload invalid

WARNING  Retrying (2/3): PATCH
  https://registry.vp.bigbang.dev/v2/naps-dev/containers/kali-rolling-desktop/blobs/uploads/e9bc88fa-4d3c-466c-8075-d534cc0536e2?_state=REDACTED:
  BLOB_UPLOAD_INVALID: blob upload invalid

WARNING  Retrying (3/3): PATCH
  https://registry.vp.bigbang.dev/v2/naps-dev/containers/kali-rolling-desktop/blobs/uploads/9d30444b-ee16-4a95-801f-066c277502c7?_state=REDACTED:
  BLOB_UPLOAD_INVALID: blob upload invalid
ERROR:  Failed to deploy package: unable to deploy component "kasm-registry-presetup": unable to push images
  to the registry: PATCH
  https://registry.vp.bigbang.dev/v2/naps-dev/containers/kali-rolling-desktop/blobs/uploads/9d30444b-ee16-4a95-801f-066c277502c7?_state=REDACTED:
  BLOB_UPLOAD_INVALID: blob upload invalid

Visual Proof (screenshots, videos, text, etc)



supcom234 commented 3 weeks ago

Some more information came to light that I think the Zarf team will appreciate. We did more testing with our external Docker registry and found that when we ran zarf init with --registry-url pointing directly at 127.0.0.1:31999 (i.e., bypassing the nginx HTTPS proxy), zarf package deploy no longer hit the blob error. The error did appear when we ran zarf init with registry.vp.bigbang.dev:443.

We tested this with Zarf v0.36.1 as well as v0.32.4.

That said, I still think there is a bug in Zarf, because we were able to push the same large image to the external Docker registry through the nginx proxy without issue using docker push. Our current workaround is to bypass the HTTPS proxy in our test environment, but that does not fix the issue for a multi-node Kubernetes production cluster.

Some additional context: for testing, we are running the external Docker registry on a single-node Kubernetes cluster. In production we will NOT be able to bypass the nginx proxy, because --registry-url forces us to use HTTPS with an IP or domain. So this still needs to be fixed ASAP.

compose.yaml

version: '3'
services:
  registry:
    image: registry:2
    restart: always
    ports:
      - "31999:5000"
    environment:
      REGISTRY_AUTH: htpasswd
      REGISTRY_AUTH_HTPASSWD_REALM: Registry-Realm
      REGISTRY_AUTH_HTPASSWD_PATH: /auth/registry.passwd
      REGISTRY_STORAGE_FILESYSTEM_ROOTDIRECTORY: /data
    volumes:
      - ./data:/data
      - ./auth:/auth
    networks:
      - mynet

  nginx:
    image: nginx:latest
    restart: always
    ports:
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - ./certs:/etc/nginx/certs
    networks:
      - mynet

networks:
  mynet:
    driver: bridge

volumes:
  registrydata:
    driver: local

nginx.conf

events {}

http {
    client_max_body_size 20G;

    upstream registry {
        server registry:5000;
    }

    server {
        listen 443 ssl;
        server_name your.registry.ip;

        ssl_certificate /etc/nginx/certs/nginx.crt;
        ssl_certificate_key /etc/nginx/certs/nginx.key;

        location / {
            proxy_pass http://registry;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }
    }
}

auth/registry.passwd (NOTE: both entries use the same hash, which is just "password", for testing purposes)

zarf-push:$2y$05$90aFnFpCbCkuuynoq1p70u/5v74W1pbM2doyYdvYnn1EZlQchddcm
zarf-pull:$2y$05$90aFnFpCbCkuuynoq1p70u/5v74W1pbM2doyYdvYnn1EZlQchddcm

certs/nginx.crt: a 90-day Let's Encrypt certificate

certs/nginx.key: the private key associated with the certificate

NOTE: You should be able to replicate the test with self-signed certificates.

KevinBorden commented 3 weeks ago

From the pcap, it appears that Zarf closes the connection without waiting for an HTTP response. It looks like the kind of thing where the client force-closes a connection after an error.