green-coding-solutions / eco-ci-energy-estimation

Eco CI Energy estimation for Github Actions Runner VMs
https://metrics.green-coding.io/ci-index.html
MIT License
55 stars 16 forks source link

Composite GHA error for Cache hit #3

Closed jreimone closed 1 year ago

jreimone commented 1 year ago

I tried your GHA and it fails when executing this step. Here is what I see in the log:

Run if [[  == 'true' ]]; then
  if [[  == 'true' ]]; then
    echo "Cache hit succeeded! πŸ˜€"
  else
    echo "Cache hit failed! ❌"
  fi
  shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
/home/runner/work/_temp/1aa3fbf8-bd06-[44](https://github.com/Staffbase/testio-management/actions/runs/4617917933/jobs/8164729482#step:3:51)9c-933a-ff[50](https://github.com/Staffbase/testio-management/actions/runs/4617917933/jobs/8164729482#step:3:58)8558acf7.sh: line 1: conditional binary operator expected
Error: Process completed with exit code 2.

As you can see the output ${{ steps.cache-pip.outputs.cache-hit }} is empty and wasn't populated by the previous step. In the end my workflow fails in the initialization step with:

jq: error (at <stdin>:4): Cannot iterate over null (null)
Error: Process completed with exit code 5.

This error is thrown in your step provided here.

All this happens in my very first step of initializing the measurement by:

steps:
      - name: Initialize Energy Estimation
        uses: green-coding-berlin/eco-ci-energy-estimation@main
        with:
          task: start-measurement

Can you help fixing it?

ArneTR commented 1 year ago

Hey @jreimone !

thanks for trying out our energy estimator. The bug is new and for debugging I would need to see the integration.

Is your repo and the workflow file public?

jreimone commented 1 year ago

No, it's not public. I tried to test it in an isolated way in a public repo. The run was successful. Will dig deeper

jreimone commented 1 year ago

Ok, I found out that the problem is twofold. See this PR for your reference. The run succeeds but the reported problem with the cache hit occurs there, too. Check this run. The logged error with exit code 2 is visible in the summary as well but the run itself succeeds. Can it be that my other PR run fails due to the fact that it is a private repo?

ArneTR commented 1 year ago

Thanks for putting the work in and isolating this!

  1. That the "cache-hit" output variable does not get set is really suprising. It might actually be a pecularity with the official github action that we are using: https://github.com/actions/cache According to their docs this variable should always exist.

  2. That you have issues running the action at all in private repos we have to investigate.

I am putting out dev on the task who has developed the action and we get back to you.

@dan-mm : I have assigned you here. Can you please have a look at this and prioritize it?

dan-mm commented 1 year ago

Hi!

The caching bug should be fixed now. It was an oddity with the github cache action -> turns out that cache-hit variable is not set if it doesn't find the cache, for example on a first run (the case you were running into) or renaming the caching key (how I reproduced it on my end).

This normally is not a problem if you then use the cache-hit variable during an "if' step, as they show in their documentation... but if you try to access the variable directly via bash during a step as we do, that's when you run into issues :-( So I rewrote the cache-hit informing steps to check for cache-hit in the "if" segment to bypass this issue.

As for the private-repo, there really shouldn't be a difference between using this in a public or private repo as far as I know. I'll investigate this now, but I hope it works now with the cache-hit change

dan-mm commented 1 year ago

Looking into this, there is an issue with running it in private-repos. Specifically we attempt to identify the workflow-id automatically so that the user does not need to pass it in. Turns out our mechanism for doing so only works on public repos. I'll look into this further...

dan-mm commented 1 year ago

Ok, I have figured out how to make this work with private repositories. So the API call we're using to get the workflow-id needs authentication for private repos - so we now use the ${ github.token } for that.

Unfortuantely by default for private repos, your GITHUB_TOKEN does not have read permissions for actions, so that needs to be set in the workflow:

`jobs: test: runs-on: ubuntu-latest permissions: actions: read steps:

This is now documented in the README. Alternatively if we do not make any api calls , we won't need the workflow-id, so I'm currently working on adding a flag to turn this functionality off entirely, if you do not want to set the permissions to read, or otherwise just don't want to send data. That will be coming soon.

jreimone commented 1 year ago

Makes sense to skip public reporting. In that case I would highly appreciate if I had the option to retrieve the data you normally send to the public reporting API as a JSON object. This way, I could send it to our data warehouse and store it there. Would that be possible?

jreimone commented 1 year ago

By the way, with your fix my run in the private repo succeeds because the actions have read access to the token. What I find interesting is that the Sonar analysis consumes about 4x energy as the execution of the tests and about 70% of the total consumed energy of the run 😳

ArneTR commented 1 year ago

That is really interesting, but actually something we see a lot.

For instance running pylint is extremely costly compared to running the python code itself often on our dev machine measurements :(

  1. We currently do not plan to support JSON object retrieval of the data as we try to nudge people to make energy data public to advance the domain of sustainable software transparency.

However if you feel that this is a critical needed feature we are happy to accept a Pull-Request :)

  1. Is your initial issue solved and the request can be closed?
jreimone commented 1 year ago

@ArneTR I completely understand you. We at Staffbase also work on means for making our resource consumptions more tangible publicly. Using your great tool here is only one small piece of the full story. Nevertheless we analyse our data internally, too. Thus, having the option to retrieve the measures in an interchangeable format reduces dependencies to your data visualization as an example. I also see another advantage: instead of directly appending the measures to the $GITHUB_STEP_SUMMARY you could store it as raw data in a variable. After the run anything could be generated from the variable's content, such as the summary of the run, or a push to a separate DWH, or whatever. Thanks for the invitation to file a PR. Will definitely consider it.

More important for this issue: it is closed and can be closed πŸ‘ Thanks @dan-mm and @ArneTR