**Closed** — nleanba closed this 7 months ago
Local cloning and `du -sh` tell me that `treatments-xml` is 16 GB when shallow-cloned (`--depth=2`) and ~19 GB with full history.
That rules out caching the repos.
This issue is obsolete now that gg2rdf runs on our servers instead of as a GitHub Action.
(See https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows for documentation)
Currently, most of the run time is spent fetching, cloning, and pushing the two repositories; the actual transformation in between is usually very short.
If we cached the repos and only pulled to get the current state, instead of re-cloning every time a single file changes, I think we could speed the transformations up by 5 to 10 minutes.
However, a repository may only have up to 10 GB of cached data at a time. I'm not entirely sure whether this rules out caching the repos — how big are they?
If size is a constraint, it might be sensible to cache only one of them (still a speed-up) or, if the problem is the git history, to strip the history (locally) after pulling, before updating the cache.
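As a quick local sketch of why stripping history would help: a shallow clone keeps only the most recent commits, which is exactly what would shrink the cached footprint. This demo uses a throwaway repo (names and commit messages are made up, not from our workflow):

```shell
# Hypothetical demo: compare commit counts of a full vs. shallow clone.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q full
cd full
git config user.email ci@example.org
git config user.name ci
# create a small history of three commits
for i in 1 2 3; do echo "$i" > file.txt; git add file.txt; git commit -qm "commit $i"; done
cd ..
# the file:// URL is required for --depth to take effect on a local clone
git clone -q --depth=1 "file://$tmp/full" shallow
echo "full history:    $(git -C full rev-list --count HEAD) commits"
echo "shallow history: $(git -C shallow rev-list --count HEAD) commits"
```

The same idea applies when refreshing a cached copy: fetching with `--depth=1` keeps the working tree current without accumulating history between runs.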
If all else fails, we could at the very least cache the dependencies (apt packages and Deno)?
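For the Deno part, a fallback step with actions/cache might look roughly like this — a sketch only; the cache path and key are assumptions (Deno's default cache directory on Linux, a `deno.lock` file in the repo), not tested against our workflow:

```yaml
# Hypothetical workflow step; path and key are assumptions.
- name: Cache Deno dependencies
  uses: actions/cache@v4
  with:
    path: ~/.cache/deno
    key: deno-${{ runner.os }}-${{ hashFiles('**/deno.lock') }}
    restore-keys: |
      deno-${{ runner.os }}-
```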
@retog opinions?