oesse / python-criticalpath

Find critical path in a Yocto build (written Python)
MIT License
5 stars 0 forks source link

Missing obvious tasks #1

Open rossburton opened 4 years ago

rossburton commented 4 years ago

I just did a build of core-image-sato for intel-corei7-64 which involves building ovmf to build a wic image.

ovmf:do_compile literally takes eight minutes to build, during which everything else sits there until that is done, then core-image-sato:do_image_complete can actually run.

Yet this task is not in the critical path.

Attached is a tarball of buildstats and depends.

stats.tar.gz

oesse commented 4 years ago

I think this is due to the mapping between buildstats folder and package name. The folder is named ovmf-edk2-stable201905-r0 where the package is named ovmf. Am I right to assume that -edk2... is the version identifier equivalent to for example gcc's (gcc-runtime-9.2.0-r0) would be -9.2.0...?

Currently this mapping is done by a regex: re.match(r'(.*)-([0-9]|git).*', directory).

Is this a closed set of version rules? I am not so familiar how the directory names are constructed from PV and the likes.

I created PR #2 to fix this issue, however, the ovmf.do_compile still does not show up on the critical path. I added CLI options to calculate critical paths from a specific node to a specific node for debugging. For example:

$ python3 -m criticalpath -t "ovmf.do_compile" issue01/task-depends.dot issue01/buildstats
core-image-sato.do_build 0
core-image-sato.do_rm_work 0.28
core-image-sato.do_populate_lic_deploy 0.03
core-image-sato.do_image_complete 7.52
ovmf.do_populate_sysroot 0.7
ovmf.do_install 0.42
ovmf.do_compile 585.36

and

$ python3 -m criticalpath -s "ovmf.do_compile" issue01/task-depends.dot issue01/buildstats
ovmf.do_compile 585.36
ovmf.do_configure 0.31
ovmf.do_prepare_recipe_sysroot 0.56
gcc-runtime.do_populate_sysroot 1.67
gcc-runtime.do_install 6.24
gcc-runtime.do_compile 45.59
gcc-runtime.do_configure 51.25
gcc-runtime.do_prepare_recipe_sysroot 0.22
libgcc.do_populate_sysroot 0.61
libgcc.do_extra_symlinks 0.01
libgcc.do_multilib_install 0.01
libgcc.do_install 0.61
libgcc.do_compile 5.42
libgcc.do_configure 3.91
libgcc.do_prepare_recipe_sysroot 0.12
glibc.do_populate_sysroot 1.5
glibc.do_stash_locale 2.53
glibc.do_install 65.52
glibc.do_compile 77.08
glibc.do_configure 4.09
glibc.do_prepare_recipe_sysroot 0.11
libgcc-initial.do_populate_sysroot 0.52
libgcc-initial.do_extra_symlinks 0.01
libgcc-initial.do_multilib_install 0.01
libgcc-initial.do_install 0.38
libgcc-initial.do_compile 0.92
libgcc-initial.do_configure 4.58
libgcc-initial.do_prepare_recipe_sysroot 0.11
gcc-cross-x86_64.do_populate_sysroot 0
gcc-cross-x86_64.do_install 0
gcc-cross-x86_64.do_compile 0
gcc-cross-x86_64.do_configure 0
gcc-cross-x86_64.do_prepare_recipe_sysroot 0
linux-libc-headers.do_populate_sysroot 0.73
linux-libc-headers.do_install 8.82
linux-libc-headers.do_compile 0
linux-libc-headers.do_configure 0
linux-libc-headers.do_prepare_recipe_sysroot 0.05
pseudo-native.do_populate_sysroot 1.49
pseudo-native.do_install 0.17
pseudo-native.do_compile 3.92
pseudo-native.do_configure 0.07
pseudo-native.do_prepare_recipe_sysroot 0.08
sqlite3-native.do_populate_sysroot 1.33
sqlite3-native.do_install 0.31
sqlite3-native.do_compile 27.75
sqlite3-native.do_configure 8.56
sqlite3-native.do_patch 0.08
sqlite3-native.do_unpack 0.13
sqlite3-native.do_fetch 0.02

Other than manually looking at the graph I do not know how to debug this issue. Could you maybe reduce the input to a minimal set that exhibits the issue you describe?

rossburton commented 4 years ago

On 12/11/2019 08:55, Oliver wrote:

I think this is due to the mapping between buildstats folder and package name. The folder is named |ovmf-edk2-stable201905-r0| where the package is named |ovmf|. Am I right to assume that |-edk2...| is the version identifier equivalent to for example gcc's (|gcc-runtime-9.2.0-r0|) would be |-9.2.0...|? Currently this mapping is done by a regex: re.match(r'(.)- ([0-9]|git).', directory).

Correct, the recipe is ovmf_git.bb but that contains PV="edk2-stable201905".

Not sure why it's not just called edk2, but whatever.

Mapping from PN-PV to PN is not trivial and your regex will fail to match several recipes. The proper solution would be to mine the pkgdata to map from package name to recipe name, but a hackier solution might be to just split on hyphen and try matching less and less:

ovmf-edk2-stable201905-r0 <-- no matches ovmf-edk2-stable201905 <-- no matches ovmf-edk2 <-- no matches ovmf <-- found it!

Ross

oesse commented 4 years ago

Thanks for the hint how to match package name and buildstats directory name. I implemented this in #2 now and it appears to work.

However, even with these changes I still do not get ovmf on the critical path. In fact, calculating the critical path to ovmf and from ovmf to the root of the graph gives me a shorter distance than what the tool reports as the critical path. Here is what I did to calculate the lengths (on the branch with the PR):

python -m criticalpath -t "ovmf.do_compile" issue01/task-depends.dot issue01/buildstats > ovmf-target.dat
python -m criticalpath -s "ovmf.do_compile" issue01/task-depends.dot issue01/buildstats > ovmf-source.dat
python -m criticalpath issue01/task-depends.dot issue01/buildstats > all.dat

# remove last line to not duplicate 'ovmf.do_compile'
cat ovmf-source.dat ovmf-target.dat | head -n-1 | cut -d' ' -f2 | paste -s -d+ | bc
# => 866.47
cat all.dat | cut -d' ' -f2 | paste -s -d+ | bc
# => 1835.68

I also checked that all the buildstats files that are there, are actually parsed using:

from criticalpath import load_graph_from_dot, BuildstatsCache
depfile = 'issue01/task-depends.dot'
stats_dirs = ['issue01/buildstats']

def load_graph(filename):
    with open(filename, 'r') as dot_file:
        return load_graph_from_dot(dot_file)

graph = load_graph(depfile)
cache = BuildstatsCache(stats_dirs)
weights = {node: cache.load_elapsed_time(node) for node in list(graph)}
print(len([node for node, w in weights.items() if w != 0]))
# => 4554

compared to the files in the buildstats (minus all "setscene" tasks that are not in the task-depends.dot graph):

ls issue01/buildstats/*/do_* | grep -v setscene | wc -l
# => 4554

On the PR #2 I also added a --missing flag to print for which steps no buildstats could be picked up. Could you please check if some long running task is missing in the buildstats folder?

oesse commented 4 years ago

I have another idea of why your observed bottleneck was not reported on the critical path. I basically just look at the weighted dependency graph but do not consider in the critical path computation the concept of scheduling. That means, the critical path the tool reports assumes that you have infinite parallel processing units. It might be that when scheduling the tasks on a finite number of processors the critical path changes due to tasks that have all dependencies already satisfied but there is no free processor to start it yet.

Taking this into consideration will complicate things a bit and the results will vary depending on how many tasks are assumed to be possible at the same time.

What was the degree of concurrency in this build?

edit: also for a visualization of the reported and the critical path through ovmf, see https://gist.github.com/oesse/0d0a2a37b1ec5b0e2c5c03aee82430a4#file-criticalpath_issue1_path_comparisons-svg

rossburton commented 4 years ago

On 13/11/2019 21:20, Oliver wrote:

What was the degree of concurrency in this build?

BB_NUMBER_THREADS is 20.