seanmcne opened this issue 1 month ago
1.62.0 seems to be the latest working version for me 🤔

Later edit: spoke too soon, on the next run 1.62.0 also got the infinite loop 😞
I've noticed it isn't a 100% repro: sometimes it works. For now I've fallen back to the default (which I believe is 1.42.0). On the upside, I'm happy to know it isn't just me experiencing the looped log/action output.
@patrickod - do you know if this looks to be a github action specific issue or something in core tailscale that might need to be addressed there?
@seanmcne it is possible that a recent change in the client might be at issue here, but without more detailed logs it is hard to tell.
Would you be able to share with us the log output from a failed run that you have observed? Separately we recently released 1.66.4 which may address this issue. Is this bug reproducible with this most recent version?
I tried v1.66.4 this weekend and it failed for me. I am happy to re-test or to help debug this, so let me know if you need anything 🙂
Here is the workflow I used (note that I am also trying to use a Tailscale Exit Node):
```yaml
- name: Setup Tailscale
  uses: tailscale/github-action@v2
  with:
    version: 1.66.4
    oauth-client-id: ${{ secrets.TAILSCALE_OAUTH_CLIENTID }}
    oauth-secret: ${{ secrets.TAILSCALE_OAUTH_KEY }}
    tags: tag:gha
    hostname: xxxxxxxxx-initial-gha-${{ github.run_id }}

- name: Get Exit node IP
  run: |
    TAILSCALE_EXIT_NODE="$(dig +short my_fancy_node.xxxxxxxxx.ts.net)"
    echo "TAILSCALE_EXIT_NODE=${TAILSCALE_EXIT_NODE}" >> "$GITHUB_ENV"

- name: Setup Tailscale with an exit node
  uses: tailscale/github-action@v2
  with:
    version: 1.66.4
    oauth-client-id: ${{ secrets.TAILSCALE_OAUTH_CLIENTID }}
    oauth-secret: ${{ secrets.TAILSCALE_OAUTH_KEY }}
    tags: tag:gha
    hostname: xxxxxxxxx-gha-${{ github.run_id }}
    args: "--exit-node=${{ env.TAILSCALE_EXIT_NODE }}"
```
which produced the following logs:
The error is coming from this action trying to download the tailscale package from pkgs.tailscale.com: https://github.com/tailscale/github-action/blob/main/action.yml#L83. So something is blocking the traffic to pkgs.tailscale.com after the initial node was set up.
@Vlaaaaaaad I notice in your tailnet you have a DNS server running on one of the nodes, configured as a nameserver in https://login.tailscale.com/admin/dns. And your ACLs don't seem to allow traffic from `tag:gha` to that node.

I'm not sure it's the culprit, but you might need to update the ACLs to allow access to that DNS server node.
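For reference, such an ACL update would amount to a rule along these lines in the tailnet policy file (the tag name matches the workflow above, but the DNS node's IP here is a placeholder; adjust to your tailnet):

```jsonc
{
  "acls": [
    // Allow CI runners tagged tag:gha to query the tailnet DNS server.
    // 100.64.0.10 is an illustrative stand-in for the node running DNS.
    {
      "action": "accept",
      "src":    ["tag:gha"],
      "dst":    ["100.64.0.10:53"]
    }
  ]
}
```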
Another possibility is if you use a self-hosted actions runner and something is mis-configured there.
I was not able to reproduce this with the same workflow file that you have. Of course, with github-managed runners it's always possible that they had a networking hiccup, or intermittently blocked pkgs.tailscale.com.
@awly yup, editing the ACL to allow `tag:gha` to access the DNS server made it all work: I ran the workflow 3 times and it worked all 3 times! Thank you so much and apologies for the silly oversight on my end!
I am a bit surprised though: when the DNS server is not accessible I would've expected the fallback DNS servers (Google's DNS servers in my case) to be used 🤔
> I am a bit surprised though: when the DNS server is not accessible I would've expected the fallback DNS servers (Google's DNS servers in my case) to be used 🤔
That's a good point, I'm not sure why it doesn't work that way. It's possible that because this is an ACL issue, packets to your private DNS server get silently dropped and the client just sits there waiting for an answer. As opposed to getting an NXDOMAIN or ICMP Destination Unreachable response and falling back to other servers.
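The fast-fail behavior being asked for can be sketched roughly like this; all names, IPs, and the stub query function below are illustrative stand-ins, not Tailscale's actual resolver code:

```python
# Sketch: try each configured nameserver in order with a short per-query
# timeout, rather than hanging when ACLs silently drop packets to the
# first server.

class QueryTimeout(Exception):
    """Raised when a nameserver never answers (e.g. packets ACL-dropped)."""

def resolve(name, servers, query_fn, timeout=2.0):
    """Try each server in order; a timed-out query falls through to the next."""
    for server in servers:
        try:
            return query_fn(server, name, timeout)
        except QueryTimeout:
            continue  # silently-dropped traffic looks like a timeout
    raise QueryTimeout(f"all nameservers failed for {name}")

# Stub query simulating this thread's scenario: the private tailnet DNS
# node is unreachable (ACL drop), while the public fallback answers.
def fake_query(server, name, timeout):
    if server == "100.64.0.10":   # hypothetical private DNS node
        raise QueryTimeout(server)
    return "203.0.113.7"          # placeholder answer from the fallback

print(resolve("pkgs.tailscale.com", ["100.64.0.10", "8.8.8.8"], fake_query))
# prints the placeholder answer from 8.8.8.8 instead of hanging
```

In the reported behavior, the first query effectively never times out, so the loop never reaches the fallback server.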
@Vlaaaaaaad could you file a separate bug in http://github.com/tailscale/tailscale for this DNS fallback issue please?
@seanmcne could you try setting up a new node outside of github actions with `tag:ci` and `--accept-routes`, and see if `curl -v https://pkgs.tailscale.com` works there? It may be a similar issue caused by some ACL rules.
> @Vlaaaaaaad could you file a separate bug in http://github.com/tailscale/tailscale for this DNS fallback issue please?
@awly it seems there's already an issue for "DNS hangs when ACL blocks traffic to DNS servers and when Split DNS is used". Would you rather I comment on that issue something along the lines of "this also happens when Split DNS is not used" or would you rather I create a new "Fast-fail DNS when ACLs block traffic to the DNS server" issue?
@Vlaaaaaaad I think it's a separate issue: when you have multiple global nameservers (not split DNS), the query should time out faster and fall back to other configured nameservers.
@awly I think this is the worst bug report I ever submitted in my life, but I did it: https://github.com/tailscale/tailscale/issues/12403
I tried to find easier ways to reproduce this but I failed to find any easily testable options.
@awly it looks like I'm unable to repro. I now have the tailscale log captured on every single github action run, just in case this happens again. The path around downloading the package sounds like the most likely place to look, since it explains why it's intermittent. I also noticed some DNS issues around the same point in time (which seem to have cleared up without intervention), so that cause makes sense to me.
Thanks!
I'm running the following in a github action. In previous versions it typically connects, but when I call the up command again with an `--exit-node` value, it will sometimes produce this constant repeating log of all 0's for many minutes until it times out. Am I using the action incorrectly given recent updates?