rust-netlink / rtnetlink


Manipulate network resources inside a network namespace #68

Open fisiognomico opened 1 month ago

fisiognomico commented 1 month ago

Hi everyone! I will briefly explain my issue: I need to configure the network interfaces inside a network namespace. In this example I will stick to setting lo up, so my goal is the same outcome as ip -n test link set lo up. As NetworkNamespace does not offer a similar interface, I tried to mimic both what unshare_processing does and how iproute2 switches to a target namespace.

Using rtnetlink to set lo up I wrote this simple routine:

async fn set_lo_up() -> Result<(), Error> {
    let (connection, handle, _) = new_connection().unwrap();
    log::debug!("ARE WE STOPPING YET???");
    let veth_idx = handle.link().get().match_name("lo".to_string()).execute().try_next().await?
                .ok_or_else(|| log::error!("Can not find lo interface ")).unwrap()
                .header.index;
    log::debug!("LO INTERFACE INDEX: {}", veth_idx);
    handle.link().set(veth_idx).up().execute().await.unwrap();
    Ok(())
}
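
One thing that may be worth ruling out before suspecting the namespace switch: set_lo_up() binds connection but never polls it. In the rtnetlink examples the connection future is handed to the runtime with tokio::spawn(connection) right after new_connection(), because the handle only queues requests while the connection is what drives the socket. The following is a toy model of that split built on std channels only, not rtnetlink's real types; Request, Handle, Connection and new_pair are hypothetical names made up for illustration.

```rust
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError, Sender};
use std::thread;
use std::time::Duration;

/// A request carries a payload and the channel on which the reply is expected.
struct Request {
    payload: String,
    reply_to: Sender<String>,
}

/// Toy stand-in for the "handle" half: it can only queue requests.
struct Handle {
    requests: Sender<Request>,
}

/// Toy stand-in for the "connection" half: nothing is answered unless it runs.
struct Connection {
    requests: Receiver<Request>,
}

/// Create a linked (connection, handle) pair (hypothetical helper).
fn new_pair() -> (Connection, Handle) {
    let (tx, rx) = channel();
    (Connection { requests: rx }, Handle { requests: tx })
}

impl Connection {
    /// Answer queued requests until every Handle has been dropped.
    fn run(self) {
        for req in self.requests {
            // Ignore the error if the requester already gave up waiting.
            let _ = req.reply_to.send(format!("reply to {}", req.payload));
        }
    }
}

impl Handle {
    /// Queue a request and wait up to `timeout` for the reply.
    fn request(&self, payload: &str, timeout: Duration) -> Result<String, RecvTimeoutError> {
        let (reply_to, reply) = channel();
        self.requests
            .send(Request { payload: payload.to_string(), reply_to })
            .expect("connection half was dropped");
        reply.recv_timeout(timeout)
    }
}
```

With a pair whose connection is kept alive but never run, handle.request("get lo", Duration::from_millis(50)) times out. Once the connection is driven with thread::spawn(move || connection.run()), the same call returns the reply. Whether this is what happens in the report is an assumption; the model only shows why a request can be forwarded and still never complete.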

The main logic of the application, the switch into the target namespace, is implemented in this function:

async fn split_namespace(ns_name: &String) -> Result<(), ()> {
    // First create the network namespace
    NetworkNamespace::add(ns_name.to_string()).await.map_err(|e| {
        log::error!("Can not create namespace {}", e);
    }).unwrap();

    // Open NS path
    let ns_path = format!("{}{}", NETNS, ns_name);

    let mut open_flags = OFlag::empty();
    open_flags.insert(OFlag::O_RDONLY);
    open_flags.insert(OFlag::O_CLOEXEC);

    let fd = match open(Path::new(&ns_path), open_flags, Mode::empty()) {
        Ok(raw_fd) => unsafe { 
            File::from_raw_fd(raw_fd)
        }
        Err(e) => {
            log::error!("Can not open network namespace: {}", e);
            return Err(());
        }
    };
    // Switch to network namespace with CLONE_NEWNET
    if let Err(e) = setns(fd, CloneFlags::CLONE_NEWNET) {
        log::error!("Can not set namespace to target {}: {}", ns_name, e);
        return Err(());
    }
    // unshare with CLONE_NEWNS
    if let Err(e) = unshare(CloneFlags::CLONE_NEWNS) {
        log::error!("Can not unshare: {}", e);
        return Err(());
    }
    // mount blind the fs
    // let's avoid that any mount propagates to the parent process
    // mount_directory(None, &PathBuf::from("/"), vec![MsFlags::MS_REC, MsFlags::MS_PRIVATE])?;
    let mut mount_flags = MsFlags::empty();
    mount_flags.insert(MsFlags::MS_REC);
    mount_flags.insert(MsFlags::MS_PRIVATE);
    if let Err(e) = mount::<PathBuf, PathBuf, str, PathBuf>(None, &PathBuf::from("/"), None, mount_flags, None) {
        log::error!("Can not remount root directory: {}", e);
        ()
    }

    // Now unmount /sys
    let sys_path = PathBuf::from("/sys");
    mount_flags = MsFlags::empty();
    // Needed to respect the trait for NixPath
    let ns_name_path = PathBuf::from(ns_name);

    // TODO do not exit for EINVAL error
    // unmount_path(&sys_path)?;
    // consider the case that a sysfs is not present
    let stat_sys = statvfs(&sys_path)
        .map_err(|e| {
            log::error!("Can not stat sys: {}", e);
    }).unwrap();
    if stat_sys.flags().contains(FsFlags::ST_RDONLY) {
        mount_flags.insert(MsFlags::MS_RDONLY);
    }

    // and remount a version of /sys that describes the network namespace
    if let Err(e) = mount::<PathBuf, PathBuf, str, PathBuf>(Some(&ns_name_path), &sys_path, Some("sysfs"), mount_flags, None) {
        log::error!("Can not remount /sys to namespace: {}", e);
        ()
    }

    set_lo_up().await.unwrap();

    Ok(())
}

I would expect the lo interface inside the namespace to be brought up and the function to return normally. Instead, it hangs indefinitely while retrieving the interface index:

[2024-05-21T13:54:16Z DEBUG run_in_ns] Net configuration PID: 321881
[2024-05-21T13:54:16Z DEBUG run_in_ns] ARE WE STOPPING YET???
[2024-05-21T13:54:16Z DEBUG netlink_proto::handle] handle: forwarding new request to connection

Also, if instead of calling set_lo_up() I simply execute ip link set lo up, for example via std::process::Command, it works as expected. If anyone wants to debug this or get a better idea of it, I created a minimal repository that reproduces it.

I had a look with strace, but apart from some esoteric buffers returned from recv() I cannot see anything that differs much from the behavior of iproute2, and unfortunately I do not have enough experience with the Netlink protocol to get much out of it. If anybody has resources or tips on how to debug this further, I would gladly invest some effort and propose an improvement to this project in the future!

Thanks.

fisiognomico commented 3 weeks ago

Quick update: I continued to investigate this issue on my own, and it seems that it is not related to netlink at all, since the same code works as expected when it talks directly to a netlink socket created with the NETLINK_ROUTE protocol. It seems more related to how tokio spawns threads through the clone3 syscall: at some point they all appear to hang in a deadlock. Honestly, this is just an assumption based on the output of strace, where every spawned thread is waiting on a futex. Given these premises, it might be more appropriate to complete the NetworkNamespace interface without using tokio at all, as the existing code already does, and extend its capabilities to manipulate network resources. I will work on this in a side project, and when I have a complete solution I might propose a PR. If you feel that this discussion is orthogonal to your project, feel free to close this issue. Thanks for your patience!
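
If the thread hypothesis is on the right track, two facts may be relevant: setns(2) changes the namespace of the calling thread only, and a socket stays in the network namespace that its creating thread was in at the time. One way to keep both steps together is to do the namespace switch and the netlink work on a single dedicated OS thread. The sketch below shows only that threading skeleton with std; run_on_dedicated_thread is a hypothetical helper, and what goes inside the closure (setns, socket creation, the netlink requests) is left to the caller.

```rust
use std::thread;

/// Run `work` on a freshly spawned OS thread and hand back its result.
///
/// Everything inside `work` executes on that one thread, so per-thread
/// state changed there (such as the network namespace after setns) does
/// not leak into the caller's thread.
fn run_on_dedicated_thread<T, F>(work: F) -> thread::Result<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .name("netns-worker".to_string()) // name chosen for illustration
        .spawn(work)
        .expect("failed to spawn worker thread")
        .join() // Err(..) here means the closure panicked
}
```

A call such as run_on_dedicated_thread(|| configure_namespace()) would block the caller until the worker finishes, where configure_namespace is a placeholder for the real work. This is a sketch of one possible structure, not a confirmed fix for the hang described above.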