Closed by zowoq 1 year ago
Sorry, we didn't consider the concurrent jobs limit per account. I see the following options to tackle this problem:
@zowoq, what do you think: is the first option manageable?
> is the first option manageable
No, I don't see that we can give one repo exclusive use of the actions for 6-8 hours daily, it isn't considerate to the other projects in the org. Personally I wouldn't say we could even agree to it happening once a week.
I'm happy to try to work something out here, whether it be improving the actions or maybe running them via a different CI system on our own infra but either way we can't have one repo consuming an excess of shared resources.
I can run the actions on my account, and then we can have an action on this repo that pulls from the one on my account. However, it seems a bit hacky...
I also don't think we can improve the GitHub action that much, because the problem is the nature of the task: it's simply huge.
Can you elaborate on what the CI does right now? If it is just a single crawler process fetching stuff from the internet, it could also become a service on one of the nix-community builder machines. Is there maybe some task that you have to re-do in the current CI setup that could be cached if it were a long-running process with a database?
@Mic92 , @zowoq here's the summary (with a little bit outdated naming) about the action.
I see several improvements:
- In nvfetch.py, optimize reading the .toml file, not just a small part from the middle of that file for each block.
- Save the generated.json files to use them as a cache. When running nvfetch.py with a new block, find the extensions from the previous generated.json and merge them into a single .json. This way, we will prevent nvfetcher from fetching the files that didn't change in the current block.
- Store generated.json in another branch of the repo and upload it at some time before the main action runs. This is to improve the flake size on master.
- Fix nvfetcher so that it doesn't lose blocks (https://github.com/berberman/nvfetcher/issues/92).
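The cache-merge idea above can be sketched as a few lines of Python. This is only an illustration: it assumes (hypothetically) that each generated.json maps an extension name to a record with at least a version and a fetched hash, with later files winning on conflicts; the real format produced by nvfetcher may differ.

```python
import json
from pathlib import Path


def merge_generated(paths, out_path):
    """Merge several generated.json files into a single cache file.

    Hypothetical format: each file maps an extension name to a record
    such as {"version": ..., "hash": ...}. Later files override earlier
    ones, so the newest data for each extension survives.
    """
    merged = {}
    for p in paths:
        p = Path(p)
        if p.exists():
            merged.update(json.loads(p.read_text()))
    Path(out_path).write_text(json.dumps(merged, indent=2, sort_keys=True))
    return merged
```

With such a merged file in place, a new nvfetch.py run could look up an extension before fetching and reuse the stored hash when the version is unchanged.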
The action splits the .toml into large chunks, one for each job. Each chunk is split into blocks. Each block is processed separately because nvfetcher sometimes fails and loses the whole progress; this way, it only loses the progress for a block. Such blocks are collected to be retried later (but they're currently just discarded from the repo).

Mhm, it seems to me that using a simple key/value store, so that you don't have to download and recompute hashes of already known releases, could significantly save time. Wouldn't it be easier to just re-implement the download part that nvfetcher does in Python? Then error handling should also be easier and you wouldn't need to commit generated toml files. I implemented a similar updater for vim plugins and it was actually quite fast in the end: https://github.com/NixOS/nixpkgs/blob/42bee50625c60896c62fc90805421a8b5d66a25b/maintainers/scripts/pluginupdate.py#L366
GitHub Actions also has a cache feature where you could store a database to speed things up.
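A workflow step using actions/cache could carry such a database between runs. A hypothetical fragment (the file path and key names are made up for illustration):

```yaml
# Restore a release-hash database from a previous run; save it again
# at the end of the job. The restore-keys prefix lets a new run reuse
# the most recent cache even when the exact key doesn't match.
- uses: actions/cache@v3
  with:
    path: .cache/extensions.json
    key: extensions-cache-${{ github.run_id }}
    restore-keys: |
      extensions-cache-
```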
@Mic92 , thank you for the ideas! Will try
Hi @zowoq, I rewrote the action in Haskell. It now takes a couple of minutes to complete - see https://github.com/deemp/nix-vscode-extensions/actions/runs/4048004025. Please enable GitHub Actions in this repo.
I've re-enabled actions.
Thanks!
Thank you for improving this, appreciated.
cc @nix-community/infra @deemp @AmeerTaweel
I've disabled GitHub Actions on this repo; just now it was using all of the organisation's quota (20 parallel jobs that had already been running for 3 hours).