ni / nilrt

Tools to build NI Linux RT distribution.
MIT License
80 stars 69 forks

Length of time building packages #252

Closed Greg-Freeman closed 1 year ago

Greg-Freeman commented 1 year ago

How long should a normal build of the packages take? I was at this step:

after completing the build setup steps above...

```
bash ../scripts/pipelines/build.core-feeds.sh
```

It was only 50% through after an hour. Then my CPU hit 100%, Ubuntu suddenly logged me out, and it looks like it crashed. Obviously the crashing isn't normal, but regardless, something seems off if building the packages takes that long. Will that have to happen every time?

amstewart commented 1 year ago

Yep. That's about right. The core feed alone is several hundred different package recipes - all being compiled from source.

The good news is that bitbake caches each recipe step and tries not to rerun anything it doesn't have to. The downloads/ directory remembers all the recipe sources you've downloaded, and the sstate-cache/ directory contains archives of each step's outputs. So on your next build, bitbake will attempt to restore all the content it has previously completed, and the build will run much faster.

If you just want to build a single package, you can specify it as your bitbake target instead of running the whole core-feed build. Then you only have to run the steps that are dependencies of your package's recipe.
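As a sketch, that looks like the following (the recipe name `zstd` is just an example; any recipe in the feed works the same way):

```shell
# Run from your build directory with the OE environment already
# initialized. Guarded so it is a no-op when bitbake isn't on PATH.
if command -v bitbake >/dev/null; then
    bitbake zstd    # builds just this recipe plus its dependencies
fi
```

Subsequent single-recipe builds stay fast, since bitbake restores any dependency steps it finds in the sstate cache.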

amstewart commented 1 year ago

FYI; you're going to want to give the VM at least 200 GB of disk (500 GB is better), at least 8 GB of memory (16 is better), and literally all the processor cores you have.

OE builds are incredibly resource-hungry, but also incredibly efficient at using those resources.
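For reference, here is a quick, read-only way to check what the build host actually has, assuming standard GNU/Linux tools:

```shell
nproc              # logical CPU count
free -h | head -2  # total and available memory
df -h .            # free disk on the build volume
```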

Greg-Freeman commented 1 year ago

> FYI; you're going to want to give the VM at least 200 GB of disk (500 GB is better), at least 8 GB of memory (16 is better), and literally all the processor cores you have.

:open_mouth:

Right now I've got a dual boot set up, so I'm running from an actual Ubuntu machine instead of a VM. I have a 1 TB SSD in this laptop, but I don't think I have 200 GB allocated to my Ubuntu primary partition. I have 16 GB of RAM, so I should be good there. Knowing all this, I will probably just let it crank overnight the first time.

I do have my 150 GB source-code drive "D:" available on my Ubuntu machine, so I'm guessing that if I shrink my Ubuntu primary partition and expand D:, I should be able to run off that moving forward.

Thanks for your help.

Greg-Freeman commented 1 year ago

I feel like this isn't normal. It was cruising up to this point but then my CPU dropped super low after being at 100% and it's been running overnight, stuck at 48%. Unsure what's going on. This is around the exact same spot it hung up before.

```
Setscene tasks: 6043 of 6043
Currently 23 running tasks (6024 of 12388)  48% |##############                |
0: rust-llvm-native-1.59.0-r0 do_compile - 9h51m21s (pid 1345573)  33% |##      |
1: linux-nilrt-nohz-5.15+gitAUTOINC+e49d91de3b-r0 do_compile - 9h38m45s (pid 1558772)
2: linux-nilrt-6.1+gitAUTOINC+3494faaf50-r0 do_compile - 9h38m42s (pid 1559618)
3: gtk+3-native-3.24.34-r0 do_compile - 9h31m13s (pid 1746851)
4: apache2-native-2.4.57-r0 do_install - 9h24m19s (pid 1843796)
5: libtool-cross-2.4.7-r0 do_configure - 9h23m54s (pid 1850412)
6: zstd-1.5.2-r0 do_compile - 9h23m53s (pid 1850683)
7: openssl-3.0.9-r0 do_configure - 9h23m37s (pid 1854971)
8: libtool-2.4.7-r0 do_configure - 9h23m36s (pid 1855139)
9: ni-grpc-device-1.1.0-r0 do_compile - 9h23m32s (pid 1856581)  56% |######     |
10: abseil-cpp-20211102.0+gitAUTOINC+7c6608d0db-r0 do_compile - 9h23m15s (pid 1860721)  55% ||
11: googletest-1.11.0+gitAUTOINC+9e71237221-r0 do_compile - 9h23m11s (pid 1861282)  25% ||
12: gcc-runtime-11.3.0-r0 do_package_write_ipk - 9h22m56s (pid 1863512)
13: glibc-locale-2.35-r0 do_package - 9h22m22s (pid 1877005)
14: glibc-locale-2.35-r0 do_populate_sysroot - 9h21m35s (pid 1901305)
15: libsepol-3.3-r0 do_package - 9h21m23s (pid 1906938)
16: libsepol-3.3-r0 do_populate_sysroot - 9h21m18s (pid 1908850)
17: fribidi-1.0.13-r0 do_package - 9h21m5s (pid 1916578)
18: expat-2.5.0-r0 do_package - 9h21m4s (pid 1918025)
19: glibc-locale-tests-1.0-r0 do_packagedata - 9h21m1s (pid 1919121)
20: libjpeg-turbo-1_2.1.5.1-r0 do_package - 9h21m1s (pid 1919601)
21: ni-test-boot-time-1.0-r0 do_create_spdx - 9h21m0s (pid 1919705)
22: glibc-tests-1.0-r0 do_packagedata - 9h21m0s (pid 1920115)
```
amstewart commented 1 year ago

Yeah; that's not normal. rust-llvm-native and the linux-nilrt kernel recipes all take a relatively long time to build, but nothing near 9 hours. A couple of things to try:

  1. Try increasing the system's maximum number of open file descriptors and the inotify watch limit to something arbitrarily high. They default to low values for historical reasons, and many programs have no graceful way to handle hitting the limit.
  2. Try setting the BB_NUMBER_THREADS environment variable to the number of physical cores on the system (or slightly lower). We set it to 2x the core count by default. Many years ago, bitbake would sometimes get stuck due to concurrency issues. I haven't seen it be a problem in over 5 years, but it might unstick something.
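A sketch of both suggestions (the sysctl values are illustrative guesses, not vetted recommendations; the limit changes need root and do not persist across reboots):

```shell
# 1. Raise file-descriptor and inotify limits (root only; skipped otherwise).
if [ "$(id -u)" -eq 0 ]; then
    sysctl -w fs.file-max=2097152
    sysctl -w fs.inotify.max_user_watches=524288
fi
ulimit -n 65536 2>/dev/null || true   # per-shell open-file cap

# 2. Cap bitbake's task parallelism at the CPU count instead of the
#    2x default. `nproc` reports logical CPUs; halve it on SMT machines
#    if you want physical cores.
export BB_NUMBER_THREADS=$(nproc)
echo "BB_NUMBER_THREADS=$BB_NUMBER_THREADS"
```

To make the sysctl values permanent, add the same keys to /etc/sysctl.conf.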
amstewart commented 1 year ago

Based on your description of the system, I would expect the core feed build to take about 90 minutes on the first build. Maybe as long as 2.5 hours, if you get particularly unlucky.

Greg-Freeman commented 1 year ago

Admittedly, I didn't feel like being overly methodical and changing your suggestions one at a time, so I changed them all. I can't say for sure what fixed it, but something did, and the build completed. My gut feeling is that it was BB_NUMBER_THREADS.

I saw my memory spike to 16 GB for some reason, the computer became unresponsive, and then the bitbake process crashed. I realized I didn't have a swap file, so I went ahead and set one up. However, when I restarted the build, it picked up where it left off, which was good.
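Setting up the swap file looked roughly like this (the 8 GB size and the /swapfile path are assumptions; every step needs root, so the sketch is a no-op otherwise):

```shell
SWAPFILE=/swapfile
if [ "$(id -u)" -eq 0 ] && [ ! -e "$SWAPFILE" ]; then
    fallocate -l 8G "$SWAPFILE"   # reserve the space
    chmod 600 "$SWAPFILE"         # swap files must not be world-readable
    mkswap "$SWAPFILE"            # write the swap signature
    swapon "$SWAPFILE"            # enable it immediately
    # make it permanent across reboots:
    echo "$SWAPFILE none swap sw 0 0" >> /etc/fstab
fi
```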

One thing I did notice is I went overkill and reduced the number of threads to 7. If I rerun and bump that to 10, I start getting hash errors. So I switched it back to 7 and it was fine. I'm sure it's caching something I could remove that causes that hash to be wrong when changing the number of threads, but it's not really worth my time to figure it out right now.

amstewart commented 1 year ago

Hmm... I'd be more inclined to blame the memory quota in that case. It's probably just that running such a large number of concurrent recipe build tasks happens to use 16 GB and triggers the OOM killer. Maybe my 16 GB recommendation is a little out of date. But it sounds like you have it building now, at least.

> One thing I did notice is I went overkill and reduced the number of threads to 7. If I rerun and bump that to 10, I start getting hash errors. So I switched it back to 7 and it was fine.

That's a little odd. After you get what you want out of the build, it might be worthwhile to clear the sstate cache (just remove the sstate-cache/ directory) and rebuild with the lower BB_NUMBER_THREADS. It's possible that getting OOM-killed has put some task hashes into a weird state.
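Concretely, that cleanup might look like this (run from the build directory; the script path matches the one used earlier in this thread, and downloads/ is deliberately left alone so sources are not re-fetched):

```shell
rm -rf sstate-cache/    # discard cached task outputs only
# rebuild with the thread count that worked:
if [ -f ../scripts/pipelines/build.core-feeds.sh ]; then
    BB_NUMBER_THREADS=7 bash ../scripts/pipelines/build.core-feeds.sh
fi
```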

I'll go ahead and close this issue, since it sounds like the original problem has been solved.