matthew-brett / delocate

Find and copy needed dynamic libraries into python wheels
BSD 2-Clause "Simplified" License
262 stars 59 forks source link

Speed-up delocate on large wheels. #172

Closed colesbury closed 1 year ago

colesbury commented 1 year ago

Mach-O executables and libraries begin with a magic number. It's much faster to read the first few bytes of a file than to spawn otool.

For example, delocate list-deps currently takes about 12 minutes on the PyTorch macOS wheels (and nearly twice as long on GitHub's CI runners). With this change, delocate list-deps instead takes about 40 seconds.

codecov-commenter commented 1 year ago

Codecov Report

Merging #172 (9816d45) into master (089024f) will decrease coverage by 0.58%. The diff coverage is 87.50%.

:mega: This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff             @@
##           master     #172      +/-   ##
==========================================
- Coverage   95.47%   94.90%   -0.58%     
==========================================
  Files          14       14              
  Lines        1084     1100      +16     
==========================================
+ Hits         1035     1044       +9     
- Misses         49       56       +7     
Impacted Files Coverage Δ
delocate/tools.py 93.79% <87.50%> (-2.33%) :arrow_down:

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

matthew-brett commented 1 year ago

Pending the CI - this looks good to me. As you point out, the check is just a screening test to reject implausible files - we still run otool on everthing that survives this screening test. Our only worry would be if there was some binary out there where the screen would reject it - where the beginning bytes are not in the listed patterns.

@HexDecimal - any further thoughts?

HexDecimal commented 1 year ago

any further thoughts?

With how MACHO_MAGIC is currently used it could be better to store it as a frozenset. This will speed up the header in MACHO_MAGIC check by a small amount.

Since the files are opened for only the first 4 bytes I'd wonder if file buffering should be disabled. I don't know the actual performance implications of doing this.

I can't do profiling so I don't know what's worth focusing on. Performance might be fine as-is.

matthew-brett commented 1 year ago

@HexDecimal - if you can't see a logical error here - let's merge as is - we may be able to improve the performance later, but it's already a huge speedup.

matthew-brett commented 1 year ago

OK - thanks for your patience - in it goes.