linear-b / gitstream

/:\ gitStream - Workflow automation for your git repo. Use YAML to auto-assign reviewers, auto-merge PRs, automatic PR labeler, and more.
https://gitstream.cm
Apache License 2.0

Main gitStream check hanging and being skipped #608

Open tom-moore opened 5 days ago

tom-moore commented 5 days ago

Describe the bug

We have gitStream configured org-wide in a main cm repo. Occasionally the main gitstream.cm action fails, but on the PR itself the check keeps running until it is skipped with 'Check could not be completed' after 10-11 minutes. We are not sure why it takes that long: timeout-minutes on the main action is set to 5, and the action itself errors out well before that anyway. Even though gitStream.cm is a required check, the skip means the PR is mergeable even though the actions haven't re-run on the latest update.

Looking at the action runs in the main cm repo, every failure occurs in less than a minute with the error The process '/usr/bin/git' failed with exit code 1. The check on the PR does not fail, however. We are using v2 of the GH action.

For example:

  1. Gitstream runs and a required approval check fails
  2. The PR is updated via a commit
  3. Gitstream re-runs with the action failing in the cm repo but the check on the PR is hung and is eventually skipped.

To Reproduce

This is a slightly redacted version of our gitstream file (with the org name prefix removed from the team names and replaced with <org-name>).

# -*- mode: yaml -*-
manifest:
  version: 1.0
automations:
  # Add a label that indicates how many minutes it will take to review the PR.
  estimated_time_to_review: 
    if:
      - true
    run:
      - action: add-label@v1
      # etr is defined in the last section of this example
        args:
          label: "{{ calc.etr }} min review"
          color: {{ 'E94637' if (calc.etr >= 20) else ('FBBD10' if (calc.etr >= 5) else '36A853') }}
  # Triggered for PRs that don't have either a Jira ticket number in the title,
  # or a link to a Jira ticket in the PR description.
  label_missing_jira_info:
    if:
      - {{ not (has.jira_ticket_in_title or has.jira_ticket_in_desc) }}
    run:
      - action: add-label@v1
        args:
          label: "missing-jira"
          color: 'F6443B'
  # Triggered for any changes that only affect formatting, documentation or tests
  safe_changes:
    if:
      - {{ is.formatting or is.docs or is.tests }}
    run:
      - action: add-label@v1
        args:
          label: "safe-changes"
  # Post a comment that lists the best experts for the files that were modified.
  explain_code_experts:
    if:
      - true
    run:
      - action: explain-code-experts@v1 
        args:
          gt: 10
  # Assign data team to review database modifications. 
  review_data_schema_changes:
    if:
      - {{ files | match(regex=r/src\/database\/.*/) | some }}
      - {{ repo.name | match(list=['express']) }}
    run:
      - action: add-reviewers@v1
        args:
          reviewers: [<org-name>/data-team]
      - action: add-comment@v1
        args:
          comment: |
            This PR affects one or more data model files which may impact data pipelines. Adding Data Team to reviewers.
  # Require QA on all PRs not including hotfixes
  qa_required_approval:
    if:
      - {{ not is.hotfix }} 
      - {{ is.qa_required }}
    run:
      - action: add-reviewers@v1
        args:
          reviewers: [<org-name>/qa]
      - action: require-reviewers@v1
        args:
          reviewers: [<org-name>/qa]
          also_assign: false
  # Add QA on all PRs (not required)
  qa_approval:
    if:
      - {{ not is.hotfix }} 
      - {{ not is.qa_required }}
    run:
      - action: add-reviewers@v1
        args:
          reviewers: [<org-name>/qa]
  # Require TL or QA approval on all hotfix PRs and label them
  hotfix_required_approval:
    if:
      - {{ is.hotfix }} 
    run:
      - action: add-reviewers@v1
        args:
          reviewers: [<org-name>/team-leads, <org-name>/qa]
      - action: require-reviewers@v1
        args:
          reviewers: [<org-name>/team-leads, <org-name>/qa]
          also_assign: false
      - action: add-label@v1
        args:
          label: "hotfix"
          color: "#d73a4a"
  platform_team_required_approval:
    if:
      - {{ files | match(list=pipeline_files) | some }}
    run:
      - action: require-reviewers@v1
        args:
          reviewers: [<org-name>/platform]
      - action: add-comment@v1
        args:
          comment: |
            This PR affects a deployment pipeline. Adding Platform Team as a required reviewer.

# The next function calculates the estimated time to review and makes it available in the automation above.
calc:
  etr: {{ branch | estimatedReviewTime }}
has:
  jira_ticket_in_title: {{ pr.title | includes(regex=r/\b[A-Za-z]+-\d+\b/) }}
  jira_ticket_in_desc: {{ pr.description | includes(regex=r/atlassian.net\/browse\/\w{1,}-\d{3,4}/) }}
is:
  formatting: {{ source.diff.files | isFormattingChange }}
  docs: {{ files | allDocs }}
  tests: {{ files | allTests }}
  hotfix: {{ branch.name | match(regex=r/hotfix.*/) }}
  qa_required: {{ repo.name | match(list=['express','api']) }}
pipeline_files:
  - .circleci/
  - .github/

Expected behavior

Ideally the action doesn't fail, but if it does, the failure should also be reflected on the PR check. As it stands, the PR author also cannot re-run the checks (that option doesn't do anything), so there is no way to re-trigger them without a new commit.

Screenshots

The action failing: image

The PR checks: image

Note that our rules should not let me merge (or bypass branch protections) on a gitstream failure, but I am given that option (I'm allowed to bypass GitHub's out-of-date branch check).

image

Additional context

This seems very similar to https://github.com/linear-b/gitstream/issues/317, which may have been caused by GitHub timeouts. That issue was noted as fixed, though.

tom-moore commented 5 days ago

I noticed that in the error message above, the 'Fetching the repository' step fails while looking for the main ref. That repo uses master as the default branch rather than main. The successful runs of the action instead run:

Fetching the repository
  /usr/bin/git -c protocol.version=2 fetch --no-tags --prune --no-recurse-submodules --depth=1 origin +refs/heads/master*:refs/remotes/origin/master* +refs/tags/master*:refs/tags/master*

I'm not sure what would cause the action to use the wrong branch reference within the same PR. It correctly tries to fetch master in most cases, but sometimes it tries main. We have some repos using main and some older ones using master.
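
For anyone trying to confirm which branch a given repo actually reports as its default, here is a minimal shell sketch. It assumes the remote is reachable and relies on git ls-remote --symref, which prints the remote's symbolic HEAD. The function name default_branch is made up for illustration and is not part of gitStream or its GitHub action.

```shell
# Hypothetical helper (not part of gitStream): print the default branch
# of a remote by reading the symbolic ref that its HEAD points to.
# Usage: default_branch origin
#        default_branch <repository-url-or-path>
default_branch() {
  # `git ls-remote --symref <remote> HEAD` emits a line of the form
  #   ref: refs/heads/<branch><TAB>HEAD
  # followed by the commit line; keep only the branch name.
  git ls-remote --symref "$1" HEAD |
    sed -n 's|^ref: refs/heads/\(.*\)[[:space:]]HEAD$|\1|p'
}
```

Running this against both a main-based and a master-based repo would show whether the ref the failing run tries to fetch matches what the remote itself advertises.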

PavelLinearB commented 16 hours ago

Hi @tom-moore, thanks for reporting this issue. I am working with the team to fix this, and we need more troubleshooting information. Please privately share the PR link and the logs over email.

Thanks, pavel.vaks@linearb.io

tom-moore commented 14 hours ago

Thanks, done!