Open ANogin opened 1 month ago
I did a timing test of a gather of 2x (create_root_resource_from_file
+ unpack_recursively(do_not_unpack=[Elf])
+ auto_run_recursively(all_identifiers=True)
) on TinyCore iso 14 and 15 respectively. Here are the results
bugfix/no_identifier_rerun 98.44s wall 65.83s CPU 142.9ms async select 32.47s blocked
bugfix/no_identifier_rerun 97.17s wall 64.74s CPU 146.2ms async select 32.29s blocked
bugfix/no_identifier_rerun 101.92s wall 68.11s CPU 151.5ms async select 33.66s blocked
bugfix/no_identifier_rerun 98.60s wall 66.23s CPU 148.5ms async select 32.22s blocked
bugfix/no_identifier_rerun 99.96s wall 66.26s CPU 134.2ms async select 33.56s blocked
master f3e583c 108.79s wall 76.34s CPU 146.6ms async select 32.31s blocked
master f3e583c 111.36s wall 77.90s CPU 151.1ms async select 33.31s blocked
master f3e583c 109.05s wall 76.93s CPU 146.9ms async select 31.98s blocked
master f3e583c 113.03s wall 78.98s CPU 144.8ms async select 33.91s blocked
master f3e583c 112.77s wall 79.39s CPU 130.9ms async select 33.25s blocked
where I used the approach from https://gist.github.com/vxgmichel/620eb3a02d97d3da9dacdc508a5d5321 to break the non-CPU time into “select” vs “blocked IO”.
- Can you fix merge conflicts given recent pushes to master
Done
- With Use GzipFile python unpacker for speed, fall back on pigz if needed #472 now on main, is the performance speedup still 10%?
No, it's ~35-40% now.
master 27.35s wall 26.99s CPU 124.0ms async select 244.7ms blocked
master 26.97s wall 26.55s CPU 125.3ms async select 293.7ms blocked
master 26.90s wall 26.52s CPU 120.5ms async select 265.8ms blocked
master 27.68s wall 27.29s CPU 123.3ms async select 267.2ms blocked
master 26.91s wall 26.51s CPU 131.1ms async select 267.9ms blocked
bugfix/no_identifier_rerun 19.84s wall 19.44s CPU 119.7ms async select 280.6ms blocked
bugfix/no_identifier_rerun 19.44s wall 18.99s CPU 125.5ms async select 327.8ms blocked
bugfix/no_identifier_rerun 19.43s wall 18.99s CPU 124.6ms async select 313.0ms blocked
bugfix/no_identifier_rerun 19.85s wall 19.41s CPU 125.1ms async select 317.7ms blocked
bugfix/no_identifier_rerun 19.66s wall 19.19s CPU 132.9ms async select 338.6ms blocked
One sentence summary of this PR (This should go in the CHANGELOG!) Avoid rerunning components unnecessarily; eliminate an already unlikely possibility that a component that should be rerun (because of difference in configuration) will not be rerun.
Link to Related Issue(s) N/A
Please describe the changes in your request. Previously, when executing a set of components (e.g. identifiers) on a resource, the job service will check whether the tags that the previous run has added means that more components should be executed, and then will run the full set again. This changes it so that only the additional components are executed, but those that were already executed once are not executed again. However if a component is run with a non-null config, it is always run, without regard for caching.
I observed speedups of up to 10% or more on some messy identify/unpack code.
Anyone you think should look at this, specifically? @whyitfor