tailscale / github-action

A GitHub Action to connect your workflow to your Tailscale network.
BSD 3-Clause "New" or "Revised" License

Tailscale step runs successfully but subsequent steps to connect to DB fail #130

Open khernandezrt opened 4 months ago

khernandezrt commented 4 months ago

We created the correct tags and set the scope to device. The Tailscale step runs (I don't see any confirmation that we are connected), but the step that runs my tests fails with ERROR tests/mycode/code/test_my_code.py - sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (2003, "Can't connect to MySQL server on 'mysqlserver.us-east-1.rds.amazonaws.com' (timed out)")

We also see the node being created in the Tailscale admin UI, but I keep getting a timeout when I run pytest.

name: Python application

on:
  push:
    branches: [ "feature/github-actions" ]
  pull_request:
    branches: [ "feature/github-actions" ]

env:
  AWS_CONFIG_FILE: .github/workflows/aws_config
  DB_NAME: "mydbname"
  DB_READ_SERVER: "mysqlserver.us-east-1.rds.amazonaws.com"
  DB_USERNAME: "root"
  DB_PASSWORD: ${{secrets.DB_PASSWORD}}

  AWS_PROFILE: "dev"
  API_VERSION: "v1"
  FRONT_END_KEY: ${{secrets.FRONT_END_KEY}}

  LOG_LEVEL: "INFO"
  DB_USER_ID: 32
  SENTRY_SAMPLE_RATE: 1
  NUMEXPR_MAX_THREADS: "8"

  LOG_LEVEL_CONSOLE: True
  LOG_LEVEL_ALGORITHM: "INFO"
  LOG_LEVEL_DB: "WARNING"

permissions:
  contents: read

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Tailscale
        uses: tailscale/github-action@v2
        with:
          oauth-client-id: ${{ secrets.TS_OAUTH_CLIENT_ID }}
          oauth-secret: ${{ secrets.TS_OAUTH_SECRET }}
          tags: tag:cicd
      - uses: actions/checkout@v4
      - name: Set up Python 3.12
        uses: actions/setup-python@v3
        with:
          python-version: "3.12"
      - name: Install dependencies
        run: |
          pip install -r requirements-dev.txt
      - name: Test with pytest
        env: 
          PYTHONPATH: ${{github.workspace}}/src
        run: |
          pytest
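
A diagnostic step like the sketch below, placed between the Tailscale step and pytest, might narrow down where this breaks. It is not part of our workflow today, and the port 3306 probe is an assumption based on the MySQL error above.

      - name: Diagnose tailnet connectivity (sketch)
        run: |
          # Confirm the runner actually joined the tailnet and list its peers
          tailscale status
          # See what the runner's resolver returns for the RDS hostname
          getent hosts "$DB_READ_SERVER" || echo "no DNS answer on the runner"
          # Probe the MySQL port (3306 assumed) over the routed path
          timeout 10 bash -c "</dev/tcp/$DB_READ_SERVER/3306" \
            && echo "port 3306 reachable" || echo "port 3306 not reachable"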
khernandezrt commented 4 months ago

Switching the URL to a direct IP did the trick. Looks like a DNS issue. I will leave this issue open as I'd prefer not to use a direct IP.
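
If the hostname has to stay in the application config, one stopgap is to pin the name on the runner itself. This is only a sketch: DB_HOST_IP would be a new, hypothetical repository variable holding the RDS instance's VPC-private IP.

      - name: Pin DB hostname to its private IP (stopgap sketch)
        run: |
          # DB_HOST_IP is hypothetical: the private IP reachable through the
          # subnet router. This keeps the hostname in app config while
          # sidestepping DNS resolution on the runner.
          echo "${{ vars.DB_HOST_IP }} $DB_READ_SERVER" | sudo tee -a /etc/hosts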

henworth commented 3 months ago

I'm encountering a similar timeout error, although it doesn't seem to be DNS in my case, as the IP is resolved properly:

Error: Error connecting to PostgreSQL server database.us-east-1.rds.amazonaws.com (scheme: awspostgres): dial tcp correct.ip.address:5432: connect: connection timed out
khernandezrt commented 3 months ago

@henworth Have you set up your security policies correctly for your Tailscale instance?

henworth commented 3 months ago

@henworth Have you set up your security policies correctly for your Tailscale instance?

Yep, I've done all this. It was working fine and now I'm not sure what's wrong.

Connectivity to this DB works fine from other, non-GitHub nodes using the hostname or IP.

talha5389-teraception commented 3 months ago

I also started having issues 2 weeks ago. I have also verified that things work fine outside of GitHub Actions using the same configuration.

ebarriosjr commented 3 months ago

I am having the same issue. It had been working perfectly until now, but today I'm getting random i/o timeouts.

ericpollmann commented 3 months ago

Same here! I had random failures, especially on the first connection to our RDS instance (running in AWS) from a GitHub Actions worker (running in Azure). Subsequent connections after the first failure would succeed. I did some debugging and found that the connection was going through DERP despite the inbound WireGuard port being open for IPv4/IPv6 on the AWS side.

I changed our workflow to first run a single ping to the subnet router's DNS hostname after bringing up Tailscale, and that seemed to dramatically improve reliability, though I still had 1 failure in 10 (that time it was the ping itself that failed).

I then set up Split DNS and haven't had a failure since, though I've only had 10 or so runs since then.
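
Roughly what that warm-up looks like in our workflow is sketched below; subnet-router.example.ts.net is a placeholder for the router's MagicDNS name, and the Split DNS piece itself is configured in the Tailscale admin console, not in the workflow.

      - name: Warm up path to the subnet router (sketch)
        run: |
          # tailscale ping keeps probing until it gets a direct path (or gives
          # up), which nudges the connection off DERP before the first DB dial.
          # The hostname below is a placeholder for the subnet router.
          tailscale ping subnet-router.example.ts.net || true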

henworth commented 3 months ago

My issue turned out to be related to the stateful filtering added in v1.66.0. Once I disabled that on my subnet routers the problem disappeared.
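
For anyone else hitting this: the change is made on the subnet routers themselves, not in the GitHub workflow, roughly along these lines (flag name as I recall it from the 1.66 release notes; double-check it against tailscale set --help on your version):

  # Run on each subnet router, then re-test from the GitHub runner
  sudo tailscale set --stateful-filtering=false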