Closed: siddharth-krishna closed this issue 3 weeks ago
@siddharth-krishna I’m stuck at step 3. I run this:
time snakemake -call all --cores all --printshellcmds --configfile ~/code/solver-benchmark/benchmarks/pypsa/pypsa-eur-sec-2-lv1-3h.yaml ; echo -e '\a'
and it shows this error:
Error in rule retrieve_worldbank_urban_population:
jobid: 41
output: data/worldbank/API_SP.URB.TOTL.IN.ZS_DS2_en_csv_v2_3403768.csv
I think our target is to reduce the time it takes to run CI, so we should address these issues first.
1. Write a script to follow the instructions here and automatically generate the LP files ready for benchmarking.
I encountered an issue during CI with the following error:
System.IO.IOException: No space left on device : '/home/runner/runners/2.319.1/_diag/Worker_20240923-082302-utc.log'
This occurs when running the following command in the GitHub Action:
time snakemake -call results/networks/elec_s_20_ec_lv1_3h_op.nc --cores all --printshellcmds --configfile ~/code/solver-benchmark/benchmarks/pypsa/pypsa-eur-elec-20-lv1-3h-op.yaml ; echo -e '\a'
I believe this happens because we've hit the storage limit. To resolve this, we could use self-hosted runners as described here: https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/about-self-hosted-runners
By running GitHub Actions on our own machine or server, we would have more control over storage and resources, which should prevent these failures caused by the storage limits on GitHub's hosted runners.
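For reference, once a self-hosted runner is registered, pointing a job at it is a one-line change in the workflow. A minimal sketch (the job name and config path are illustrative):

```yaml
jobs:
  benchmark:
    # Target any runner we register ourselves instead of GitHub's hosted pool
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      - name: Run snakemake
        run: |
          snakemake -call all --cores all --printshellcmds \
            --configfile benchmarks/pypsa/pypsa-eur-sec-2-lv1-3h.yaml
```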
2. Cache the PyPSA input data directories so that the CI job runs faster.
We have a lot of large files, so every time we run
time snakemake -call ...,
it needs to download these files, which increases the run time.
Note: I think we could create a small test configuration first. Running snakemake -call ... takes a long time because the input data is very large, and rebuilding is time-consuming.
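To sketch what the caching could look like, an `actions/cache` step keyed on the benchmark configs might work (the `data/` and `cutouts/` paths and the cache key are assumptions about where pypsa-eur stores its downloads):

```yaml
- name: Cache PyPSA input data
  uses: actions/cache@v4
  with:
    path: |
      data/
      cutouts/
    # Re-download only when a benchmark config changes
    key: pypsa-data-${{ hashFiles('benchmarks/pypsa/*.yaml') }}
    restore-keys: pypsa-data-
```

One caveat: GitHub-hosted caches are capped (around 10 GB per repository), which may matter given how large these files are.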
I believe this happens because we've hit the storage limit. To resolve this, we could use self-hosted runners as described here:
Good idea, but I see the following text in that article:
We recommend that you only use self-hosted runners with private repositories. This is because forks of your public repository can potentially run dangerous code on your self-hosted runner machine by creating a pull request that executes the code in a workflow.
Another option might be to use larger runners, but I don't think we currently subscribe to the paid GitHub plans: https://docs.github.com/en/actions/using-github-hosted-runners/using-larger-runners/about-larger-runners
Could you please look into whether there's a way to disable a particular CI job/workflow from being run on forks / making it manually triggered and only by the repo admins?
@siddharth-krishna We can enable a workflow to be triggered manually by using the workflow_dispatch event. More information on how to do this is in the GitHub documentation: Manually Running a Workflow.
Additionally, to control workflows running from forks, we can configure required approvals for workflows originating from public forks. More details can be found here: Configuring Required Approval for Workflows from Public Forks.
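For concreteness, restricting a workflow to manual runs is just a matter of its `on:` trigger. A sketch (the input name is illustrative):

```yaml
on:
  workflow_dispatch:   # adds a manual "Run workflow" button in the Actions tab
    inputs:
      config_file:
        description: "Benchmark config to run (illustrative input)"
        required: false
```

Note this alone does not restrict who can click the button: anyone with write access to the repository can trigger a workflow_dispatch run.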
Thanks but can we limit workflow triggering to admins only?
I have an idea to move this forward for now: let's create a Dockerfile that clones pypsa-eur at a specific commit, installs pinned versions of the dependencies, patches pypsa-eur as required, runs the configs, and uploads the results to a GCP bucket. We can test this manually locally, and for now we can run it manually on a GCP VM when we want to update the benchmark sets.
Once we figure out the security of self hosted runners, we can run the same Dockerfile as a CI workflow on them. Does that make sense?
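A minimal sketch of such a Dockerfile, assuming a conda-based setup; the commit hash, patch directory, and entrypoint script are placeholders:

```dockerfile
FROM condaforge/mambaforge:latest

# Pin pypsa-eur to a known-good commit (placeholder hash)
RUN git clone https://github.com/PyPSA/pypsa-eur.git /pypsa-eur \
    && cd /pypsa-eur && git checkout <commit-sha>

WORKDIR /pypsa-eur
RUN mamba env create -f envs/environment.yaml

# Apply our local patches (e.g. to scripts/solve_network.py)
COPY patches/ /patches/
RUN git apply /patches/*.patch

# Run the benchmark configs and upload the results to the GCP bucket
COPY run_benchmarks.sh /run_benchmarks.sh
ENTRYPOINT ["/run_benchmarks.sh"]
```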
Thanks but can we limit workflow triggering to admins only?
Yes, we can add a condition check in the workflow to ensure only admins can trigger it.
steps:
  - name: Check if user is admin
    run: |
      ADMINS="admin1 admin2 admin3"  # List of admin GitHub usernames
      if [[ " $ADMINS " =~ " $GITHUB_ACTOR " ]]; then
        echo "User $GITHUB_ACTOR is an admin. Proceeding with the workflow..."
      else
        echo "User $GITHUB_ACTOR is not an admin. Exiting workflow."
        exit 1
      fi
I have an idea to move this forward for now: let's create a Dockerfile that clones pypsa-eur at a specific commit, installs pinned versions of the dependencies, patches pypsa-eur as required, runs the configs, and uploads the results to a GCP bucket. We can test this manually locally, and for now we can run it manually on a GCP VM when we want to update the benchmark sets.
Once we figure out the security of self hosted runners, we can run the same Dockerfile as a CI workflow on them. Does that make sense?
Yes, it makes sense.
I would be surprised if GitHub doesn't have an option to allow only admins to trigger certain actions (although I can't find it anywhere explicitly in the docs, or it may be under a paid plan).
As an alternative, I've read that some people use protected branches for this. They create a special target branch for new PRs that only admins can merge into, and set up the CI to run only on that branch. An admin can then review the code before merging it into that branch, which triggers the CI. If the CI passes, the code gets merged into main.
Closing this as completed by #48 and tracking further improvements in #53
Write a script to follow the instructions here and automatically generate the LP files ready for benchmarking.
Is there a limit on the amount of CI minutes for a GitHub public repo?
Some features that are good to have:
1. An only_generate_problem_file config flag (branch: open-energy-transition/linopy.git@only-generate-problem-files) that takes a file path instead of true/false, and uses that path in linopy to dump the file.
2. scripts/solve_network.py needs to be patched as mentioned in the README to support the only_generate_problem_file config flag. Sid will make an upstream PR with this soon; until then the CI needs to patch the Python file.
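For reference, the intended behaviour of the flag could be sketched like this (the names are illustrative; the real change lives in scripts/solve_network.py and the linopy branch above, where `model.to_file` stands in for linopy's LP writer):

```python
def maybe_dump_problem_file(config, model, solve):
    """Sketch of the only_generate_problem_file handling.

    When the config flag holds a file path (instead of true/false),
    dump the LP problem file there and skip solving; otherwise solve
    the model as usual.
    """
    path = config.get("only_generate_problem_file")
    if path:
        model.to_file(path)  # write the LP file for benchmarking
        return None          # solving is skipped entirely
    return solve(model)
```

This keeps the benchmarking CI from paying for a full solve when it only needs the LP file.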