serpent-os / boulder-d-legacy

Replaced by Rust tooling
https://serpentos.com

"boulder new" takes long to scan huge tarballs #6

Open livingsilver94 opened 1 year ago

livingsilver94 commented 1 year ago

Consider a Rust tarball. It's a huge archive that contains multiple projects (rustc, cargo, other Rust-related tools, llvm, vendored libraries...). This alone makes for a very long scan step. On top of that, the resulting stone.yml is probably going to be garbage, because multiple projects are vendored in the tarball whilst we are only interested in Rust's own dependencies and build steps.

Heuristics that handle this situation are probably hard to implement; a relatively easy workaround is to press Ctrl+C to abort the scan and let boulder finish without pre-filling stone.yml.
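To make the idea of a heuristic a bit more concrete, here is a minimal sketch in Python of one possible approach: only look for license files near the top of the extracted tree and skip directories that conventionally hold vendored code. This is not how boulder's scanner works (it is written in D); `find_license_files`, `SKIP_DIRS`, `LICENSE_PREFIXES` and the `max_depth` default are all hypothetical names and values chosen for illustration.

```python
import os

# Hypothetical values: directory names that commonly hold vendored code,
# and file name prefixes that commonly mark license files.
SKIP_DIRS = {"vendor", "third_party", "node_modules"}
LICENSE_PREFIXES = ("LICENSE", "LICENCE", "COPYING")


def find_license_files(root, max_depth=1):
    """Return license-like files at most max_depth directory levels below root.

    Directories named in SKIP_DIRS are never entered. Paths are returned
    relative to root.
    """
    root = os.path.abspath(root)
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        if depth >= max_depth:
            # Pruning dirnames in place stops os.walk from descending further.
            dirnames[:] = []
        else:
            dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            if name.upper().startswith(LICENSE_PREFIXES):
                full = os.path.join(dirpath, name)
                found.append(os.path.relpath(full, root))
    return found
```

With the default `max_depth=1`, a top-level file such as `rustc-1.63.0-src/LICENSE-MIT` would be picked up, while deeply nested files such as `rustc-1.63.0-src/src/tools/cargo/LICENSE-THIRD-PARTY` would be ignored. The obvious trade-off is that a project which keeps its real license in a subdirectory would be missed, which is part of why such heuristics are hard to get right.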

Example log (partial):

[22:03:44] INFO      Beginning download
[22:03:57] INFO      Downloaded: /tmp/boulderDrafterURI-t6AsPh
[22:03:57] INFO      Extracting: /tmp/boulderDrafterURI-t6AsPh
[22:03:57] INFO      Computing hash for /tmp/boulderDrafterURI-t6AsPh
[22:04:04] INFO      Scanning sources under /tmp/boulderDrafterExtraction.GpxvBk
[22:04:09] INFO      Analysing source trees
[22:04:44] WARNING   Unknown license for: rustc-1.63.0-src/src/tools/cargo/LICENSE-THIRD-PARTY
[22:05:36] WARNING   Unknown license for: rustc-1.63.0-src/src/llvm-project/lldb/third_party/Python/module/pexpect-4.6/LICENSE
[22:05:38] WARNING   Unknown license for: rustc-1.63.0-src/src/llvm-project/clang-tools-extra/clang-tidy/cert/LICENSE.TXT
[22:06:21] WARNING   Unknown license for: rustc-1.63.0-src/vendor/jemalloc-sys/jemalloc/COPYING
[22:09:35] WARNING   Unknown license for: rustc-1.63.0-src/src/tools/rust-analyzer/editors/code/LICENSE
[22:10:39] WARNING   Unknown license for: rustc-1.63.0-src/library/stdarch/crates/intrinsic-test/acle/LICENSE
[22:11:01] WARNING   Unknown license for: rustc-1.63.0-src/library/stdarch/crates/intrinsic-test/acle/mve_intrinsics/LICENSE
[22:12:00] WARNING   Unknown license for: rustc-1.63.0-src/library/stdarch/crates/intrinsic-test/acle/neon_intrinsics/LICENSE
[22:12:23] WARNING   Unknown license for: rustc-1.63.0-src/library/stdarch/crates/intrinsic-test/acle/main/LICENSE
[22:12:45] WARNING   Unknown license for: rustc-1.63.0-src/library/stdarch/crates/intrinsic-test/acle/morello/LICENSE
[22:13:04] WARNING   Unknown license for: rustc-1.63.0-src/src/doc/rustc-dev-guide/src/licenses.md

livingsilver94 commented 1 year ago

The license scanning algorithm isn't very efficient. However, since we're in the pre-alpha stage, we're building the tooling in debug mode; building in release mode improves scanning time by a factor of 10.
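As a rough back-of-the-envelope check, the timestamps in the partial log above can be used to estimate what that factor means in practice. The sketch below assumes the tenfold speedup applies uniformly to the analysis step, and because the log is partial, the debug figure is only a lower bound.

```python
from datetime import datetime

# Timestamps copied from the partial log in the first comment.
start = datetime.strptime("22:04:09", "%H:%M:%S")  # "Analysing source trees"
last = datetime.strptime("22:13:04", "%H:%M:%S")   # last warning shown

debug_seconds = (last - start).total_seconds()

# Assumption: the reported factor of 10 applies uniformly to this step.
release_estimate = debug_seconds / 10

print(f"debug build, at least: {debug_seconds:.0f} s")      # 535 s
print(f"release build, roughly: {release_estimate:.1f} s")  # 53.5 s
```

So the analysis step shown took at least about nine minutes in debug mode, and under that assumption would take a little under a minute in release mode.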