Open sskras opened 1 year ago
I always disable linting because of this recently... Yeah, might be a good idea to make it opt in.
Excluding the msys2 dir from defender also helps a lot
I've created https://github.com/msys2/msys2-pacman/pull/18
Feedback welcome
@lazka, thanks for the efforts.
Another question goes about the actual parsing implementation.
@MehdiChinoune commented yesterday:
It is expected because It is a split package. msys2/MSYS2-packages#2593
As the original issue is already closed, I quote and comment it here:
@lazka commented on Jul 31, 2021 in msys2/MINGW-packages#2593:
not much we can do I think.. lots of forks..
I struggle at analyzing all the related Bash code at the moment. Could you please summarize:
I struggle at analyzing all the related Bash code at the moment. Could you please summarize:
* Why is such heavy forking needed to parse PKGFILEs ?
because it's fast on Linux, and pacman is primarily developed for Linux.
* Could it be rewritten in another language and so avoid forking more easily ?
Tricky since PKGBUILD files are written in bash. Someone would have to speed things up without complicating the code too much, and propose that to upstream.
I see some potential by evaluating functions once and then re-using that for further checks, but that's no small change.
I see some potential by evaluating functions once and then re-using that for further checks, but that's no small change.
Thanks. And I am yet to see the forking source (no pun intended:)
IOW I believe that eval
works without invoking fork()
. Eg.: What's the difference between eval and exec?
eval
will run the arguments as a command in the current shell. In other wordseval foo bar
is the same as justfoo bar
. But variables will be expanded before executing, so we can execute commands saved in shell variables:$ unset bar $ cmd="bar=foo" $ eval "$cmd" $ echo "$bar" foo
It will not create a child process, so the variable is set in the current shell. (Of course
eval /bin/ls
will create a child process, the same way a plain old/bin/ls
would.)
I guess I will need to dedicate some days into diving into the set of related Bash scripts anyways.
You can run bash -x /usr/bin/makepkg --printsrcinfo
to see what it's doing
just as an example, the following contains some parts of the code involved:
#!/bin/bash
for i in {1..1000}
do
while read -r; do
true
done < <(echo "")
done
This takes:
Thanks. I added some traces to the scripts and found this little guy called extract_function_variable()
invoking subshells:
Here it runs grep_function "$funcname" "$attr_regex"
like this:
grep_function "package_mingw-w64-x86_64-clang" "^[[:space:]]* checkdepends_any\+?=[^(]"
... which in turn does some parsing by executing declare -f
for the funcname
and piping it to grep
:
{ declare -f "package_mingw-w64-x86_64-clang" || declare -f package; } 2>/dev/null | grep -E "^[[:space:]]* checkdepends_any\+?=[^(]"
@lazka, do you know (why) is using subshell absolutely needed here?
Wouldn't the same parsing work by the means of eval
?
Thanks for fixing the title (you beat me to it:)
Maybe it's worth reporting to the upstream?
OK, it seems my thought was wrong. I tried to move the grep_function ...
upper and pipe it to the looped read
directly:
92 grep_function "$funcname" "$attr_regex" | \
93 while read -r; do
This (unexpectedly) increased running time by the 18%, it's 110s => 130s!
Before the change:
real 1m50.572s
user 0m20.097s
sys 0m45.337s
After the change:
real 2m10.283s
user 0m23.358s
sys 0m59.751s
I guess, the fork()
+exec()
combo occurs to launch grep
anyways, still incurring a big timing cost on Windows.
Sorry for the noise.
OK, I tried to use Bash ERE matching instead. The inspiration: https://stackoverflow.com/questions/17420994/how-can-i-match-a-string-with-a-regex-in-bash/51960723#51960723
For that I used operator =~
inside [[ ... ]]
and the BASH_REMATCH
array. The PoC:
This alone gave me ~30% or ~30s decrease (96 => 67) in running time of makepkg --printsrcinfo PKGBUILD
.
The potential issue with the PoC at the moment is that it doesn't handle the next matching lines of the same function.
Though this could be worked around by multiplying REGEX_BASH
into something like this in a loop:
${REGEX_BASH}.*${REGEX_BASH}.*${REGEX_BASH}.*${REGEX_BASH}<...>
Not sure if my PoC is worth of extending. The running time is still quite long: ~1m7s on my machine.
I wonder how long would it take the makepkg
to do the same using Linux (no installation at hand).
Description
I tried building newest msys2/MINGW-packages/mingw-w64-clang.
Every time I run
makepkg-mingw
, things afterBuilding mingw64...
and beforeMaking package: mingw-w64-clang 14.0.6-6
take ~90-120 seconds to complete:My machine has i5-6200U CPU running @ 2.30GHz:
Is this expected? And why?
Verification
Windows Version
Microsoft Windows [Version 10.0.19044.1889]
MINGW environments affected
Expected behavior
Tried tracing the scripts using
set -x
. Looks like it's doing a lot of Bash processing. Big part of which seems to be repetitive.Eg. I noticed the Bash function
lint_pkgbuild()
coming from pacman/scripts/libmakepkg and occurring at the start of my trace.I expect to know the root cause of this step taking so relatively long to complete. And whether is this expected.
Actual behavior
The initial step takes ~90-120 seconds to complete.
Repro steps
1. Download the files:
2. Go into the Clang dir:
3. Start
makepkg
& stop it just before the build: