tailscale / github-action

A GitHub Action to connect your workflow to your Tailscale network.

Action does not download package - fails with "gzip: stdin: not in gzip format" error #89

Closed GalOzRlz closed 9 months ago

GalOzRlz commented 10 months ago

For more than 24 hours our action has been failing intermittently with this error; it looks like it can't download the package. Is this a CDN issue on the servers?

Run tailscale/github-action@v1
  with:
    authkey: ***
    version: 1.30.0
  env:
    CODEARTIFACT_URL: ****
    pythonLocation: /opt/hostedtoolcache/Python/3.8.17/x64
    LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.8.17/x64/lib
    AWS_DEFAULT_REGION: us-east-2
    AWS_REGION: us-east-2
    AWS_ACCESS_KEY_ID: ***
    AWS_SECRET_ACCESS_KEY: ***
    AWS_SESSION_TOKEN: ***
    POSTGRES_URL: ***
Run MINOR=$(echo $VERSION | awk -F '.' {'print $2'})
  MINOR=$(echo $VERSION | awk -F '.' {'print $2'})
  if [ $((MINOR % 2)) -eq 0 ]; then
    URL="https://pkgs.tailscale.com/stable/tailscale_${VERSION}_amd64.tgz"
  else
    URL="https://pkgs.tailscale.com/unstable/tailscale_${VERSION}_amd64.tgz"
  fi
  curl -H user-agent:tailscale-github-action-v1 -L "$URL" -o tailscale.tgz
  tar -C ${HOME} -xzf tailscale.tgz
  rm tailscale.tgz
  TSPATH=${HOME}/tailscale_${VERSION}_amd64
  sudo mv "${TSPATH}/tailscale" "${TSPATH}/tailscaled" /usr/bin
  shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
  env:
    CODEARTIFACT_URL: ****
    pythonLocation: /opt/hostedtoolcache/Python/3.8.17/x64
    LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.8.17/x64/lib
    AWS_DEFAULT_REGION: us-east-2
    AWS_REGION: us-east-2
    AWS_ACCESS_KEY_ID: ***
    AWS_SECRET_ACCESS_KEY: ***
    AWS_SESSION_TOKEN: ***
    POSTGRES_URL: ****j.us-east-2.rds.amazonaws.com/postgres
    VERSION: 1.30.0
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100    81  100    81    0     0    699      0 --:--:-- --:--:-- --:--:--   704

100    19  100    19    0     0     37      0 --:--:-- --:--:-- --:--:--    37
100    19  100    19    0     0     37      0 --:--:-- --:--:-- --:--:--     0

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Error: Process completed with exit code 2.
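
A minimal sketch of the guard this thread later converges on: compare the downloaded tarball against the published .sha256 before extracting, so a truncated or error-page response fails with an explicit checksum mismatch instead of a gzip error. This is not the action's actual code; it assumes bash, curl, and sha256sum are available on the runner.

  VERSION="1.30.0"
  URL="https://pkgs.tailscale.com/stable/tailscale_${VERSION}_amd64.tgz"
  # Fetch the tarball and the checksum published alongside it.
  curl -fsSL -H user-agent:tailscale-github-action-v1 "$URL" -o tailscale.tgz
  EXPECTED=$(curl -fsSL -H user-agent:tailscale-github-action-v1 "${URL}.sha256")
  # Refuse to extract if the payload does not match the published checksum.
  echo "${EXPECTED}  tailscale.tgz" | sha256sum -c - || {
    echo "checksum mismatch: bad payload from pkgs.tailscale.com" >&2
    exit 1
  }
  tar -C "${HOME}" -xzf tailscale.tgz
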
zonorti commented 10 months ago

I want to fix this by using the GitHub cache (#87), so after one run it should work. But I am only going to touch v2.

GalOzRlz commented 10 months ago

No problem; we will update to version 2 if needed.

zonorti commented 10 months ago

@GalOzRlz please have a look at my PR. It works for me, but I am new to GH Actions, so YMMV.

creachadair commented 10 months ago

I suspect this may have been related to #88, as the error responses would not have been gzipped.

GalOzRlz commented 9 months ago

This is happening again today!

beaugunderson commented 9 months ago

Yes, I'm also seeing this hundreds of times today, as we use Tailscale in our deployment workflow. We upgraded from v1 to v2 and still have the same issue:

[screenshot of the failing runs attached]
DentonGentry commented 9 months ago

@beaugunderson @GalOzRlz What Tailscale version is it trying to download? The v2 action defaults to 1.42.0; is that it, or do you set version?

GalOzRlz commented 9 months ago

@beaugunderson @GalOzRlz What Tailscale version is it trying to download? The v2 action defaults to 1.42.0; is that it, or do you set version?

We pin the version to 1.30.0:

  - name: Tailscale
    uses: tailscale/github-action@v1
    with:
      authkey: ${{ secrets.TAILSCALE_AUTHKEY }}
      version: 1.30.0
beaugunderson commented 9 months ago

1.24.2 for us (for no special reason, I assume we could upgrade to something more recent without major issues if it would help)

DentonGentry commented 9 months ago

I assume we could upgrade to something more recent without major issues if it would help

At this point I don't think a version change would help. We're focusing on the pkgs server and the CDN in front of it, which are used for all Tailscale versions.

avi-city commented 9 months ago

Thanks @DentonGentry , much appreciated

DentonGentry commented 9 months ago

It isn't failing everywhere:

$ curl -H user-agent:tailscale-github-action -L https://pkgs.tailscale.com/stable/tailscale_1.24.2_amd64.tgz --output /tmp/tailscale_1.24.2_amd64.tgz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    81  100    81    0     0    194      0 --:--:-- --:--:-- --:--:--   195
100 18.2M  100 18.2M    0     0  16.6M      0  0:00:01  0:00:01 --:--:-- 41.9M
$ curl -H user-agent:tailscale-github-action -L https://pkgs.tailscale.com/stable/tailscale_1.24.2_amd64.tgz.sha256
a25bba595af9a67fa2b3ef7df5f92b343179470b14115c76fd2f6e92cc3bc9ac
$ echo "a25bba595af9a67fa2b3ef7df5f92b343179470b14115c76fd2f6e92cc3bc9ac /tmp/tailscale_1.24.2_amd64.tgz" | sha256sum -c
/tmp/tailscale_1.24.2_amd64.tgz: OK
$ curl -H user-agent:tailscale-github-action -L https://pkgs.tailscale.com/stable/tailscale_1.30.0_amd64.tgz --output /tmp/tailscale_1.30.0_amd64.tgz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    81  100    81    0     0    188      0 --:--:-- --:--:-- --:--:--   188
100 19.5M  100 19.5M    0     0  17.0M      0  0:00:01  0:00:01 --:--:-- 17.0M
$ curl -H user-agent:tailscale-github-action -L https://pkgs.tailscale.com/stable/tailscale_1.30.0_amd64.tgz.sha256
8660387a79d539838f032e0cbe620cb241ee3f0636fd8942275d37ca38f456fd
$ echo "8660387a79d539838f032e0cbe620cb241ee3f0636fd8942275d37ca38f456fd /tmp/tailscale_1.30.0_amd64.tgz" | sha256sum -c
/tmp/tailscale_1.30.0_amd64.tgz: OK

A workflow using this GitHub Action passed here a few minutes ago: https://github.com/tailscale/github-action/actions/runs/6403096635/job/17380921729?pr=91


I suppose a problem in the CDN serving pkgs.tailscale.com might only manifest in GitHub datacenters other than the one used for that run. However, it looks like all GitHub Actions runners still run in the US (https://github.com/orgs/community/discussions/11727), and the CDN doesn't have a huge number of regional nodes.

For anyone experiencing trouble today: are you self-hosting runners?
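
For anyone who wants to measure how often their runner gets a bad payload, here is a small repro sketch (assumptions: bash with curl and sha256sum on the runner; the 20 iterations are arbitrary) that repeats the download and checks each attempt against the published checksum:

  VERSION="1.30.0"
  URL="https://pkgs.tailscale.com/stable/tailscale_${VERSION}_amd64.tgz"
  EXPECTED=$(curl -fsSL -H user-agent:tailscale-github-action "${URL}.sha256")
  bad=0
  for i in $(seq 1 20); do
    # Count attempts that either fail to download or fail the checksum comparison.
    curl -fsSL -H user-agent:tailscale-github-action "$URL" -o /tmp/ts.tgz \
      && echo "${EXPECTED}  /tmp/ts.tgz" | sha256sum -c --quiet \
      || bad=$((bad + 1))
  done
  echo "bad downloads: ${bad}/20"
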

beaugunderson commented 9 months ago

At the moment we're seeing the majority pass; maybe 20-30% fail.

GalOzRlz commented 9 months ago

No self-hosted runners in these workflows.

beaugunderson commented 9 months ago

any news here? we're still seeing lots of failures

beaugunderson commented 9 months ago

fwiw 1.48.2 is installing for us with no failures yet!

DentonGentry commented 9 months ago

@beaugunderson 1.24.2 gets a checksum error and 1.48.2 does not?

beaugunderson commented 9 months ago

@DentonGentry zero failures out of ~60 after switching to 1.48.2 👍

DentonGentry commented 9 months ago

If someone who can reproduce the issue is willing to run the action @a5ed86cd4900e943e2aa3f760fd8472f04e46469 instead of v2, it will output the expected and actual checksums to get an idea what is happening.

GalOzRlz commented 9 months ago

@a5ed86cd4900e943e2aa3f760fd8472f04e46469

I'm going to try to make this happen and get back to you.

GalOzRlz commented 9 months ago

@DentonGentry

We got one! [screenshot of the failing run attached]

Still using version: 1.30.0, btw.

DentonGentry commented 9 months ago

It got the redirect to the CDN, then downloaded 19 bytes and stopped. b16e15764b8bc06c5c3f9f19bc8b99fa48e7894aa5a6ccdad65da49bbf564793 is the SHA256 of "404 page not found".
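
That hash is easy to reproduce locally; a one-line sketch, assuming the body was the literal string "404 page not found" followed by a newline (18 characters plus the newline matches the 19 bytes in the log above):

  # Should print b16e15764b8bc06c5c3f9f19bc8b99fa48e7894aa5a6ccdad65da49bbf564793
  printf '404 page not found\n' | sha256sum
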

DentonGentry commented 9 months ago

@GalOzRlz

GalOzRlz commented 9 months ago

@GalOzRlz

  • Could I ask the exact content of the "version" string passed as an input to the GitHub action? Is it "1.30.0" and nothing more?

Yep just 1.30.0

  • Could I ask what continent you are located on? I guess your jobs might run in a different GitHub datacenter, which in turn might be directed to a different CDN node.

We're in Israel.

XciD commented 9 months ago

Getting the same.

Version: 1.36.0, us-east-1 in AWS.

DentonGentry commented 9 months ago

I think we've worked out the series of issues: a data consistency problem in a provider's tools meant updates were not deployed to two of the nodes in our infrastructure, which led to a 404 for anyone unlucky enough to hit one of those nodes.

We think this is resolved now. If you see checksum errors occurring after 5:30pm Pacific time on October 5, please update this issue.

DentonGentry commented 9 months ago

We also added metrics for this specific issue and have seen no occurrences since October 5. I'm going to close this as resolved.

For people reading this in the future, if you see "gzip: stdin: not in gzip format" or a checksum error please do not comment on this issue or reopen it. This one is fixed. If you encounter a new problem, even with the same symptom, please open a new bug.