rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.44k stars 12.73k forks source link

Accepting Incoming TCP Connections fails on Android #82400

Closed wngr closed 3 years ago

wngr commented 3 years ago

Starting with Android Oreo (8), Android started using a seccomp based filter approach to syscalls, explicitly allowing syscalls, see https://android-developers.googleblog.com/2017/07/seccomp-filter-in-android-o.html. https://android.googlesource.com/platform/bionic.git/+/master/libc/SYSCALLS.TXT enumerates the allowed syscalls. This means, that dispatching a generic syscall mechanism as introduced with this PR #78572 will result in a panic.

On top of that, I found that older versions of Android, such as Android 6, will return Function not implemented (os error 38) for this syscall. My tests showed that this only happens on x86, although I can't explain why.

As there's no way to detect the target android API level, the safest way would probably be to always use the accept syscall, and remove the special handling of the accept4 syscall. This could also be just guarded for x86 -- but would be happy to get an explanation on why it's only on that architecture (see https://github.com/tokio-rs/mio/pull/1446).

I have tested this with both real devices as well as Android emulators.

Code

I tried this code:

use std::net::TcpListener;

let listener = TcpListener::bind("0.0.0.0:8080").unwrap();
match listener.accept() {
    Ok((_socket, addr)) => println!("new client: {:?}", addr),
    Err(e) => println!("couldn't get client: {:?}", e),
}

and poked it with telnet <android_host> 8080.

I expected to see this happen: accept returns Ok(_)

Instead, this happened:

* Android < 8.0: strace output

[pid 10918] syscall_364(0x34, 0x9d5c9cf8, 0x9d5c9ca0, 0x80800, 0x9fda9dc8, 0x9fda9dc8 <unfinished ...> [pid 10918] <... syscall_364 resumed> ) = -1 (errno 38)

Which translate to _Function not implemented_.

### Version it worked on

<!--
Provide the most recent version this worked on, for example:

It most recently worked on: Rust 1.47
-->

It most recently worked on: Rust 1.48

### Version with regression

<!--
Provide the version you are using that has the regression.
-->

`rustc --version --verbose`:

rustc --version --verbose rustc 1.49.0 (e1884a8e3 2020-12-29) binary: rustc commit-hash: e1884a8e3c3e813aada8254edfa120e85bf5ffca commit-date: 2020-12-29 host: x86_64-unknown-linux-gnu release: 1.49.0


<!--
Did the compiler crash? If so, please provide a backtrace.
-->

### Backtrace
<!--
Include a backtrace in the code block by setting `RUST_BACKTRACE=1` in your
environment. E.g. `RUST_BACKTRACE=1 cargo build`.
-->
<details><summary>Backtrace</summary>
<p>
```

nagisa commented 3 years ago

Is the x86 thing an emulator? From the strace output it would seem that an entirely wrong syscall is being invoked.

Can this issue reproduce if you call accept4 from C? If so, consider reporting a bug against Android as well.

wngr commented 3 years ago

Is the x86 thing an emulator?

Happens both on an Emulator and a real x86 device (Zebra ET50).

I will try to whip up a minimal reproducer using the socket4 API.

Edit:

use std::mem;
use std::net::SocketAddr;

fn main() {
    unsafe {
        let bind_to: SocketAddr = "0.0.0.0:8080".parse().unwrap();
        println!("Trying to bind to {}", bind_to);
        let sock_addr =
            nix::sys::socket::SockAddr::new_inet(nix::sys::socket::InetAddr::from_std(&bind_to));

        let fd = libc::socket(libc::AF_INET, libc::SOCK_STREAM | libc::SOCK_CLOEXEC, 0);
        let (addrp, len) = sock_addr.as_ffi_pair();
        assert_eq!(libc::bind(fd, addrp, len as _), 0);
        assert_eq!(libc::listen(fd, 128), 0);

        let mut storage: libc::sockaddr_storage = mem::zeroed();
        let mut len = mem::size_of_val(&storage) as libc::socklen_t;

        if libc::accept4(
            fd,
            &mut storage as *mut _ as *mut _,
            &mut len,
            libc::SOCK_CLOEXEC,
        ) < 0
        {
            println!("error on accept: {}", nix::errno::errno())
        } else {
            println!("read {} from socket", len);
        }
    }
}

This works fine on my linux machine, but running it on an Android API 23 x86 emulator yields:

root@generic_x86:/data/local # strace ./android-socket 
[..]
write(1, "Trying to bind to 0.0.0.0:8080\n", 31Trying to bind to 0.0.0.0:8080
) = 31
socket(PF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_IP) = 3
bind(3, {sa_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
listen(3, 128)                          = 0
syscall_364(0x3, 0xbf85b6c0, 0xbf85b698, 0x80000, 0x80, 0x10) = -1 (errno 38)
write(1, "error on accept: 38\n", 20error on accept: 38
)   = 20
futex(0xb771722c, FUTEX_WAKE_PRIVATE, 2147483647) = 0
mprotect(0xb771a000, 4096, PROT_READ|PROT_WRITE) = 0
mprotect(0xb771a000, 4096, PROT_READ)   = 0
close(0)                                = 0
close(1)                                = 0
close(2)                                = 0
futex(0xb770f544, FUTEX_WAKE_PRIVATE, 2147483647) = 0
mprotect(0xb771a000, 4096, PROT_READ|PROT_WRITE) = 0
mprotect(0xb771a000, 4096, PROT_READ)   = 0
mprotect(0xb771a000, 4096, PROT_READ|PROT_WRITE) = 0
mprotect(0xb771a000, 4096, PROT_READ)   = 0
munmap(0xb771a000, 4096)                = 0
exit_group(0)                           = ?
+++ exited with 0 +++

This also fails when using the generic syscall with errno 38:

        libc::syscall(
            libc::SYS_accept4,
            fd,
            &mut storage as *mut _ as *mut _,
            &mut len,
            libc::SOCK_CLOEXEC,
         )

Seems my guess on why that started happening is wrong; now I wonder why that ever worked .. :confused:

Just for reference:

cross build --target i686-linux-android --release && adb push target/i686-linux-android/release/android-socket /data/local
apiraino commented 3 years ago

@de-vri-es and @rustbot ping libs

any insights to share about this?

rustbot commented 3 years ago

Error: This team (libs) cannot be pinged via this command; it may need to be added to triagebot.toml on the master branch.

Please let @rust-lang/release know if you're having trouble with this bot.

de-vri-es commented 3 years ago

Hmm, maybe this has something to do with it: https://android.googlesource.com/platform/bionic.git/+/master/libc/SYSCALLS.TXT#264

# sockets for x86. These are done as an "indexed" call to socketcall syscall.
<snip>
int           __accept4:socketcall:18(int, struct sockaddr*, socklen_t*, int)  x86
<snip>

I've never heard of the "socketcall syscall" before, but apparently, on x86 Linux (before 4.3), you have to call socketcall(SYS_ACCEPT4, &args).

A bit annoying, because there is no documentation on what args should be. It "points to a block containing the actual arguments". The Linux man page specifically says this:

User programs should call the appropriate functions by their usual names. Only standard library implementors and kernel hackers need to know about socketcall().

Linux 4.3 also added the regular syscalls for x86, but that's not something we can rely on for the android target :(

So I think there are two options: 1) Drop accept4 on Android and accept the race condition on the CLOEXEC flag. 2) Figure out how to use socketcall correctly for x86 Android in libc.

de-vri-es commented 3 years ago

Linux 4.3 also added the regular syscalls for x86, but that's not something we can rely on for the android target :(

Note: although the same weird syscall is required for Linux < 4.3 (which is still a tier 1 platform), it shouldn't affect the Linux target because there rust libc is just calling accept4. from the real libc.

joshtriplett commented 3 years ago

It would be nice if Android's seccomp filter allowed the direct syscalls on x86 too, not just on arm; that'd be worth trying to get added to Android.

But in the meantime, it looks like Android targets on 32-bit x86 will need to invoke socketcall for all socket syscalls, rather than making the syscalls directly.

de-vri-es commented 3 years ago

It would be nice if Android's seccomp filter allowed the direct syscalls on x86 too, not just on arm; that'd be worth trying to get added to Android.

It may not be the seccomp filter though. Older x86 kernels really do not have the accept4 syscall.

Anyway, I'm working on a PR for libc to switch to socketcall on x86 android.

de-vri-es commented 3 years ago

PR opened: https://github.com/rust-lang/libc/pull/2079

apiraino commented 3 years ago

thank you @de-vri-es and @joshtriplett

apiraino commented 3 years ago

@rustbot label -I-prioritize -E-needs-bisection

de-vri-es commented 3 years ago

Note: back when accept4 was used on more platforms, std couldn't update to the latest libc. So it currently has the same workaround. I submitted https://github.com/rust-lang/rust/pull/82473 to switch to libc::accept4 so std would also receive the fix (after a libc bump).

de-vri-es commented 3 years ago

This isn't actually fixed yet, my bad for accidentally linking the issue. Might be good to re-open until it really is solved in std too.

apiraino commented 3 years ago

@de-vri-es no worries, I'll reopen it

tesuji commented 3 years ago

Lmao, github needs to come up with better algorithm to decide when to close issues with keywords. @JohnTitor could you reopen this issue ?

de-vri-es commented 3 years ago

Since the libc PR is merged, and std now uses libc::accept4, all that is left to fix this on the main branch is to have a libc version bump (and a matching bump of the libc dependency for std).

Is that going to be on time for 1.49.0, or should someone be pinged to ask for a libc release?

JohnTitor commented 3 years ago

@de-vri-es Released libc 0.2.87, you can now update the libc dependency :)

de-vri-es commented 3 years ago

Thanks! Opened https://github.com/rust-lang/rust/pull/82731 :)