nvidia-holoscan / holohub

Central repository for applications and operators for Holoscan
Apache License 2.0

`network_radar_pipeline` build failure #381

Closed: JohnMoon-VTS closed this issue 2 weeks ago

JohnMoon-VTS commented 2 months ago

I'm trying to build the network_radar_pipeline application and I'm running into some trouble.

I am using an Ubuntu 22.04 host system with Docker version 24.0.7.

The application has its own Dockerfile (applications/network_radar_pipeline/cpp/Dockerfile), but the build is failing on main.

I'm trying to build with:

./dev_container build --docker_file ./applications/network_radar_pipeline/cpp/Dockerfile

This command returns with exit code 0 even though under the hood it calls ./run build network_radar_pipeline and the build fails.
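A wrapper can report success after an inner failure because a bash script exits with the status of the last command it ran, unless errexit is set or each step's status is propagated explicitly. The sketch below is generic: `failing_build`, `without_errexit`, and `with_errexit` are hypothetical stand-ins, not functions from holohub's `run` or `dev_container` scripts, and it only assumes those scripts follow a similar pattern.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a build step that fails (e.g. a CMake configure error).
failing_build() {
  echo "CMake Error: (simulated)" >&2
  return 1
}

# Without errexit, the status is that of the LAST command run,
# so a failed build followed by a successful echo still yields 0.
without_errexit() {
  failing_build
  echo "build step finished"
}

# With errexit, this subshell aborts at the first failing command
# and passes that command's non-zero status back to the caller.
with_errexit() (
  set -o errexit
  failing_build
  echo "never reached"
)

without_errexit; echo "without errexit: exit $?"   # -> without errexit: exit 0
with_errexit;    echo "with errexit:    exit $?"   # -> with errexit:    exit 1
```

One caveat if errexit is adopted: bash ignores it inside any command used as an `if` condition or as a non-final part of an `&&`/`||` list, so an explicit `|| exit $?` after each CMake call may be the more robust option.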

If I enter the container and run the build, I see this build failure:

CMake Error at build/network_radar_pipeline/_deps/matx-src/CMakeLists.txt:1 (cmake_minimum_required):
  CMake 3.23.1 or higher is required.  You are running version 3.22.2
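To confirm which CMake an image provides before configuring, a small version comparison is enough. This sketch relies on GNU `sort -V` (available on Ubuntu) and takes the 3.23.1 minimum from the MatX error above; `version_ge` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Succeeds if version $1 >= version $2: after a version sort,
# the smaller of the two comes first, so $2 first means $1 >= $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Minimum taken from MatX's cmake_minimum_required in the error above.
required="3.23.1"

if command -v cmake >/dev/null 2>&1; then
  # First line of `cmake --version` looks like: "cmake version 3.22.2"
  installed="$(cmake --version | head -n1 | awk '{print $3}')"
  if version_ge "$installed" "$required"; then
    echo "CMake $installed OK (>= $required)"
  else
    echo "CMake $installed is too old; MatX needs >= $required" >&2
  fi
else
  echo "cmake not found on PATH" >&2
fi
```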

Adding an apt-get upgrade -y to the Dockerfile resolves this error, but the build fails further down the line with:

-- Checking for module 'doca'
--   No package 'doca' found
CMake Error at /usr/share/cmake-3.29/Modules/FindPkgConfig.cmake:634 (message):
  The following required packages were not found:

   - doca
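The failing lookup appears to go through pkg-config (the error is raised from CMake's FindPkgConfig module), so the same check can be reproduced from a shell inside the container. A sketch, with `check_pkg` as a hypothetical helper; if DOCA is installed but still not found, its `.pc` directory presumably needs to be on `PKG_CONFIG_PATH`.

```shell
#!/usr/bin/env bash
# Report whether pkg-config can resolve a module, mirroring the
# lookup CMake's FindPkgConfig performed in the error above.
check_pkg() {
  local module="$1"
  if pkg-config --exists "$module" 2>/dev/null; then
    echo "$module $(pkg-config --modversion "$module") found"
    return 0
  fi
  # Show the extra search path, since a missing .pc dir is a common cause.
  echo "$module not found; PKG_CONFIG_PATH=${PKG_CONFIG_PATH:-<unset>}" >&2
  return 1
}

check_pkg doca || true
```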

So, it appears DOCA was added as a build dependency at some point. If I try to bring in DOCA using syntax similar to operators/advanced_network/Dockerfile, but grabbing the 20.04 package, dpkg fails while building DKMS modules:

106.8 Building initial module for 5.4.0-186-generic
123.3 Error! Bad return status for module build on kernel: 5.4.0-186-generic (x86_64)
123.3 Consult /var/lib/dkms/mlnx-ofed-kernel/24.04.OFED.24.04.0.6.6.1/build/make.log for more information.
123.3 dpkg: error processing package mlnx-ofed-kernel-dkms (--configure):
123.3  installed mlnx-ofed-kernel-dkms package post-installation script subprocess returned error exit status 10
123.3 dpkg: dependency problems prevent configuration of doca-all:
123.3  doca-all depends on doca-ofed (= 2.7.0-0.1.3); however:
123.3   Package doca-ofed is not configured yet.
123.3  doca-all depends on doca-runtime (= 2.7.0-0.1.3); however:
123.3   Package doca-runtime is not configured yet.
<--- snip --->
130.5 Errors were encountered while processing:
130.5  kernel-mft-dkms
130.5  knem-dkms
130.5  doca-ofed
130.5  doca-runtime
130.5  mlnx-ofed-kernel-dkms
130.5  doca-all
130.5  isert-dkms
130.5  iser-dkms
130.5  srp-dkms

I suspect this is because the host system runs Ubuntu 22.04 (kernel 6.2.0) while the container image uses Ubuntu 20.04 (trying to target kernel 5.4.0). If I specify the 22.04 DOCA package to match my kernel, I get other apt errors, likely because the 20.04-based container doesn't provide the same package versions:

27.55 The following packages have unmet dependencies:
27.66  doca-all : Depends: doca-ofed (= 2.7.0-0.1.3) but it is not going to be installed
27.66             Depends: doca-runtime (= 2.7.0-0.1.3) but it is not going to be installed
27.66             Depends: doca-devel (= 2.7.0-0.1.3) but it is not going to be installed
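The mismatch is visible from inside the container: containers share the host's kernel, so `uname -r` reports the host kernel (6.2.0 here), while `/etc/os-release` describes the image (20.04). DKMS builds against `/lib/modules/<kernel>/build` by default, so this diagnostic sketch checks whether the image carries headers for the running kernel. It assumes a Debian/Ubuntu-style layout and is not part of any holohub tooling.

```shell
#!/usr/bin/env bash
# Inside a container, uname reports the HOST kernel, while
# /etc/os-release describes the container image itself.
host_kernel="$(uname -r)"
container_os="$(. /etc/os-release 2>/dev/null && echo "${ID:-unknown} ${VERSION_ID:-unknown}")"

echo "running kernel (host): $host_kernel"
echo "container OS:          ${container_os:-unknown}"

# DKMS looks for kernel headers under /lib/modules/<kernel>/build by default.
if [ -d "/lib/modules/$host_kernel/build" ]; then
  echo "headers for the running kernel are present in this image"
else
  echo "no headers for $host_kernel in this image; DKMS may build against another installed kernel instead" >&2
fi
```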

So, I went down the road of upgrading the network_radar_pipeline Dockerfile to use a newer holoscan base image (holoscan:v2.0.0-dgpu, based on Ubuntu 22.04). This seems to resolve the issues with the DOCA dependency, but now I'm getting errors in the network connectors such as:

/usr/include/linux/ip.h(87): error: invalid redeclaration of type name "iphdr" (declared at line 44 of /usr/include/netinet/ip.h)
  struct iphdr {
         ^

/workspace/holohub/applications/network_radar_pipeline/cpp/advanced_network_connectors/adv_networking_rx.cu(253): error: identifier "adv_net_free_all_burst_pkts_and_burst" is undefined
          adv_net_free_all_burst_pkts_and_burst(first.msg[m]);

It's not the only API with an issue, but looking at adv_net_free_all_burst_pkts_and_burst specifically, it appears the API was renamed to adv_net_free_all_pkts_and_burst in PR #347.
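To find other in-tree users of a renamed or removed operator API, a whole-word grep from the repository root is usually enough. `find_callers` is a hypothetical helper; the `applications/` and `operators/` directories match the paths used in this issue.

```shell
#!/usr/bin/env bash
# List call sites of an API name in C/C++/CUDA sources under the given dirs.
# -r recurse, -n line numbers, -w whole-word match, -s silence missing-path errors.
find_callers() {
  local symbol="$1"; shift
  grep -rnws --include='*.c' --include='*.cpp' --include='*.h' --include='*.hpp' \
       --include='*.cu' --include='*.cuh' -e "$symbol" "$@"
}

find_callers adv_net_free_all_burst_pkts_and_burst applications operators || echo "no callers found"
```

The whole-word match keeps a longer identifier that merely starts with the old name from showing up as a false positive.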

I think I can proceed from here, but I do have a few questions:

  1. It seems like the host OS needs to match the container OS (at least in major version) for kernel module builds to work correctly (makes sense). Should this be called out in the Platform Notes in the Container Build documentation?
  2. Should the CMake invocations in the run script be wrapped in set -o errexit so that a failed build returns a non-zero exit code?
  3. Are updates to operators like ANO expected to be tested for build compatibility with any in-tree users of the operator? In other words, should ANO be allowed to remove an API that network_radar_pipeline uses without also updating network_radar_pipeline to use the replacement?
  4. Assuming the build issues are worked out and I get network_radar_pipeline building against the new ANO, would you like my patch that upgrades the app to the latest Holoscan and ANO? I'm not sure whether the app is deliberately maintaining compatibility with Ubuntu 20.04.

@dylan-eustice, tagging you for your SA, since you appear to be the main developer on this app.

Thanks!

JohnMoon-VTS commented 2 months ago

It looks like @e-ago has a PR to address the final build issues in #353. I'll test out those changes! It doesn't cover the update to 22.04, though, so please let me know if there's interest in that update.

e-ago commented 2 months ago

I should be able to finalize the PR and merge it soon. If you test it in the meantime, please share your feedback.

JohnMoon-VTS commented 2 weeks ago

The network_radar_pipeline build works now when using the image from operators/advanced_network/Dockerfile. Building from applications/network_radar_pipeline/Dockerfile still fails, but that appears to just be a dependency issue. I'll close this issue for now.