hashgraph / solo

An opinionated CLI tool to deploy and manage standalone test networks.
Apache License 2.0

Add GitHub runner telemetry to aid in diagnosing concurrent E2E test failures #309

Open jeromy-cannon opened 4 months ago

jeromy-cannon commented 4 months ago

Possible Solutions

  1. workflow-telemetry-action
  2. Argo Prometheus Metrics
  3. Enabling GitHub ARC Metrics

1. workflow-telemetry-action

It was discovered that there were two issues with this solution:

2. Argo Prometheus Metrics

This might help us with some metrics such as the following:

3. Enabling GitHub ARC Metrics

This provides some metrics, but no CPU or RAM utilization.

NOTES:

JeffreyDallas commented 3 months ago

I experimented with workflow-telemetry-action, but I am getting this error:

/runner/_work/_actions/catchpoint/workflow-telemetry-action/v1.8.4/dist/webpack:/workflow-telemetry-action/node_modules/@octokit/auth-action/dist-node/index.js:19
    throw new Error("[@octokit/auth-action] The token variable is specified more than once. Use either `with.token`, `with.GITHUB_TOKEN`, or `env.GITHUB_TOKEN`. See https://github.com/octokit/auth-action.js#createactionauth");
^
Error: [@octokit/auth-action] The token variable is specified more than once. Use either `with.token`, `with.GITHUB_TOKEN`, or `env.GITHUB_TOKEN`. See https://github.com/octokit/auth-action.js#createactionauth

I have tried several release versions, and all of them fail with the same error.

Workflow logs

https://github.com/hashgraph/solo/actions/runs/9164986979/job/25197553053

jeromy-cannon commented 3 months ago

@JeffreyDallas, it looks like it is complaining that more than one token is available to it. In the settings we have several that look similar, so we should pick one and pass it in explicitly to avoid the ambiguity.

Since this is a reusable workflow, I think we will need to declare a secret at the top of it and then pass that secret in from wherever the workflow is called. From the caller we would pass in GITHUB_TOKEN.

Search for snyk-token and you can see the pattern used there.
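
A minimal sketch of that pattern, for illustration only. The file name, job names, runner label, and the secret name access-token are hypothetical, and the github_token input name is an assumption that should be checked against the action's action.yml. As far as I know, a workflow_call secret cannot itself be named GITHUB_TOKEN because the name is reserved, which is why a differently named secret is declared and the caller maps GITHUB_TOKEN onto it.

```yaml
# --- Called (reusable) workflow: .github/workflows/zxc-e2e-test.yaml (hypothetical name) ---
on:
  workflow_call:
    secrets:
      access-token:            # hypothetical secret name, declared "at the top"
        description: "Token used by the telemetry step"
        required: true

jobs:
  e2e-test:
    runs-on: ubuntu-latest     # placeholder runner label
    steps:
      - name: Collect Workflow Telemetry
        uses: catchpoint/workflow-telemetry-action@v1.8.4
        with:
          # Assumed input name; the token is supplied in exactly one place.
          github_token: ${{ secrets.access-token }}
        # Note: no env.GITHUB_TOKEN is set for this step, job, or workflow,
        # so the token is not specified a second time.

---
# --- Caller workflow (hypothetical) ---
on:
  pull_request:

jobs:
  e2e:
    uses: ./.github/workflows/zxc-e2e-test.yaml
    secrets:
      access-token: ${{ secrets.GITHUB_TOKEN }}   # pass GITHUB_TOKEN in from the caller
```

The error message lists with.token, with.GITHUB_TOKEN, and env.GITHUB_TOKEN as the competing sources, so whichever one is chosen, the other two should be left unset.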

JeffreyDallas commented 3 months ago

This workflow failed with the following error, even though actions: read is already defined:

Unable to get current workflow job info. Please sure that your workflow have "actions:read" permission!

Workflow logs

https://github.com/hashgraph/solo/actions/runs/9179531736/job/25241904073
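
For reference, a sketch of where the actions: read permission is normally declared when a reusable workflow is involved. Job and file names are hypothetical. A called workflow can only keep or reduce the permissions granted by its caller, so the permission has to be present on the calling side as well. This shows the declaration only and is not a confirmed fix for the error above.

```yaml
# --- Caller workflow (hypothetical) ---
jobs:
  e2e:
    permissions:
      actions: read            # must be granted here; the called workflow cannot elevate it
      contents: read
    uses: ./.github/workflows/zxc-e2e-test.yaml   # hypothetical file name

---
# --- Called (reusable) workflow ---
on:
  workflow_call:

permissions:
  actions: read                # equal to or narrower than what the caller granted
  contents: read
```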

As noticed by Jeromy, it looks like there are two show stoppers:

  1. No support for reusable workflows: https://github.com/catchpoint/workflow-telemetry-action/issues/67
  2. Self-hosted GH Runners require elevated security: https://github.com/catchpoint/workflow-telemetry-action/issues/44