Closed mattst88 closed 5 months ago
I know of at least 2 places that aren't safe for parallel runs:
1. --commits
2. git checks (these weren't even safe to run in parallel within the same pkgcheck invocation, which required me to change those checks to run sequentially)

For point 2, I'd prefer not to do anything; I don't think it's critical right now. For point 1, maybe I'll try some kind of file lock in the cache dir, similar to a mutex.
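Until something like that exists inside pkgcheck, a similar effect can be approximated from the calling script. The following is only a sketch of the mutex idea, assuming flock(1) from util-linux is available; the with_cache_lock helper and the LOCK_FILE path are hypothetical names made up for illustration, not anything pkgcheck defines or reads.

```shell
#!/bin/bash
# Assumed lock file location; pkgcheck itself knows nothing about this path.
LOCK_FILE="${LOCK_FILE:-${TMPDIR:-/tmp}/pkgcheck-cache.lock}"

# Hypothetical helper: run the given command while holding an exclusive lock.
with_cache_lock() {
    (
        # fd 9 is opened on the lock file by the redirection below;
        # flock blocks here until no other process holds the lock.
        flock -x 9
        "$@"
    ) 9>"${LOCK_FILE}"
}
```

Calling it as with_cache_lock pkgcheck scan --commits from several shells at once would then run the scans one after another instead of concurrently. That serializes the whole scan rather than only the cache update, so it trades speed for safety.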
On another question, I'm not sure how running multiple pkgcheck instances makes it faster. pkgcheck is parallel by default and should try to use all cores. Maybe you have a config/default for --jobs set somewhere? Please try passing --jobs ${N} and see what it does.
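One way to try that is a small wrapper that makes the job count explicit, so runs with different values can be timed against each other. This is a rough sketch: scan_with_jobs and the PKGCHECK override variable are hypothetical names introduced here, and only the scan, --jobs and -k RedundantVersion arguments come from this thread.

```shell
#!/bin/bash
# PKGCHECK can be overridden (e.g. with echo for a dry run); defaults to the real tool.
PKGCHECK="${PKGCHECK:-pkgcheck}"

# Hypothetical wrapper: run the RedundantVersion scan with an explicit job count.
# First argument is the job count, the rest are scan targets.
scan_with_jobs() {
    local jobs=$1
    shift
    "${PKGCHECK}" scan --jobs "${jobs}" -k RedundantVersion "$@"
}
```

Comparing time scan_with_jobs 1 "${pkgs[@]}" with time scan_with_jobs "$(nproc)" "${pkgs[@]}" should show whether the job count is actually the limiting factor.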
The check itself looks simple, so the CPU-intensive part is mainly parsing and loading the package ebuilds. I think pkgcheck's current scheduler should handle it well?
In the process of trying to use GNU parallel I apparently lost my original script, so I don't know why it was so slow for me...
In any case, I've rewritten it:
#!/bin/bash

set -e

maint=${1}
shift

cd "$(git rev-parse --show-toplevel)"

mapfile -t pkgs < <(git grep -l "${maint}" '*/*/metadata.xml' | cut -d/ -f1-2)
mapfile -t redundant_ebuilds < <(pkgcheck scan -k RedundantVersion -R FormatReporter --format "{category}/{package}/{package}-{version}.ebuild" "${pkgs[@]}")

git rm "${redundant_ebuilds[@]}"

mapfile -t cleaned_pkgs < <(git diff-index --name-only HEAD | cut -d/ -f1-2 | sort -u)

pkgdev manifest "${cleaned_pkgs[@]}"

for pkg in "${cleaned_pkgs[@]}"; do
    pushd "${pkg}" &> /dev/null
    pkgcommit -s -m "Drop old versions" .
    popd &> /dev/null
done

for pkg in "${cleaned_pkgs[@]}"; do
    pushd "${pkg}" &> /dev/null
    if [[ -d files ]]; then
        printf "\nPlease check whether anything in files/ should be removed:\n\n"
        ls -1 files
        printf "\n'git rm' any unused files and run 'git commit -a --fixup \$(git last-commit-to .)'. Press CTRL+D when finished.\n"
        $SHELL
    fi
    popd &> /dev/null
done

and it's plenty fast, so I'm going to close this. Thanks, and sorry for the noise.
I have a script named clean-redundant-versions.sh that uses pkgcheck to find and remove redundant versions of packages.
It works, but it's not fast.
I tried speeding it up by using GNU parallel to find redundant versions in parallel:
When I run this, it generates tracebacks such as:
The tracebacks are generated from the parallel "pkgcheck scan -k RedundantVersion ..." command.

Is it safe to run pkgcheck multiple times on the same repository at the same time? From the traceback it appears it's generating a cache. Is it possible to generate this cache once up front, before the parallel invocation, in order to avoid this problem?

Other suggestions? Thanks!