immunant / c2rust

Migrate C code to Rust
https://c2rust.com/

(`c2rust-analyze`) Handle `Callee::UnknownDef`s calls pessimistically #843

Open kkysen opened 1 year ago

kkysen commented 1 year ago

We need to handle Callee::UnknownDef calls, as currently we just panic on them (or with RUST_LOG_PANIC=off, simply skip them, which is incorrect). As the definition of the function is unknown (it's either in another crate or an extern call), we don't know anything about what it can do, so we have to assume the worst. That is, we have to assume that the function will interact with all local pointers (any labeled with PointerIds) that are accessible from that function in every way possible.

For every way possible, this will lead it to have the pointer PermissionSet of READ | WRITE | OFFSET_ADD | OFFSET_SUB | FREE.
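As a rough illustration of what "all permissions" means here, the following is a minimal, self-contained sketch of a permission bit set. The flag names come from the description above; the `PermissionSet` struct shown here, its bit values, and the `pessimistic` helper are assumptions made for the example and are not the analysis tool's actual definitions.

```rust
/// Hypothetical stand-in for the analysis's `PermissionSet`.
/// The bit values are assumptions chosen for this sketch.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct PermissionSet(u8);

impl PermissionSet {
    pub const READ: Self = Self(1 << 0);
    pub const WRITE: Self = Self(1 << 1);
    pub const OFFSET_ADD: Self = Self(1 << 2);
    pub const OFFSET_SUB: Self = Self(1 << 3);
    pub const FREE: Self = Self(1 << 4);

    /// Set union of two permission sets.
    pub const fn union(self, other: Self) -> Self {
        Self(self.0 | other.0)
    }

    /// True if every permission in `other` is also in `self`.
    pub const fn contains(self, other: Self) -> bool {
        (self.0 & other.0) == other.0
    }

    /// Hypothetical helper: the worst-case set assumed for any pointer
    /// that an unknown callee can reach.
    pub const fn pessimistic() -> Self {
        Self::READ
            .union(Self::WRITE)
            .union(Self::OFFSET_ADD)
            .union(Self::OFFSET_SUB)
            .union(Self::FREE)
    }
}
```

With this model, `PermissionSet::pessimistic().contains(PermissionSet::FREE)` holds, while `PermissionSet::READ.contains(PermissionSet::WRITE)` does not.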

For interacting with all accessible local pointers, this is a bit trickier. For Callee::Trivial calls, those will already be marked as such, so we know that all Callee::UnknownDefs are non-trivial. This means (based on the latest definition discussed in #855) that either the call:

For the latter, the only accessible pointers will be direct arguments and return types that are pointers, so those are simple (though I'm still unsure about this; see https://github.com/immunant/c2rust/pull/855#issuecomment-1451283638).

For the former two, this means that the function can access practically anything, any pointer that is contained somewhere within the argument and return types, as well as any global pointers or pointers to globals, as these can be accessed through trait methods on generic types (for generic functions from another crate) or through FFI (for extern functions).
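To make "any pointer that is contained somewhere within the argument and return types" concrete, here is a small sketch over a toy type model. The `Ty` enum, `collect_pointers`, `mark_unknown_call`, and the `ALL_PERMS` bit pattern are all hypothetical names invented for this example; they only illustrate walking nested types and giving every pointer found the worst-case permissions, and they do not cover globals.

```rust
use std::collections::BTreeMap;

/// Hypothetical pointer label, standing in for the analysis's `PointerId`.
type PointerId = u32;

/// Toy type model (an assumption for this sketch): integers, labeled
/// pointers, and structs containing further types.
enum Ty {
    Int,
    Ptr(PointerId, Box<Ty>),
    Struct(Vec<Ty>),
}

/// Assumed bit pattern for READ | WRITE | OFFSET_ADD | OFFSET_SUB | FREE.
const ALL_PERMS: u8 = 0b1_1111;

/// Collect every pointer label nested anywhere inside `ty`.
fn collect_pointers(ty: &Ty, out: &mut Vec<PointerId>) {
    match ty {
        Ty::Int => {}
        Ty::Ptr(id, pointee) => {
            out.push(*id);
            collect_pointers(pointee, out);
        }
        Ty::Struct(fields) => {
            for field in fields {
                collect_pointers(field, out);
            }
        }
    }
}

/// Give all permissions to every pointer reachable from the argument
/// and return types of a call whose definition is unknown.
fn mark_unknown_call(args: &[Ty], ret: &Ty, perms: &mut BTreeMap<PointerId, u8>) {
    let mut ids = Vec::new();
    for arg in args {
        collect_pointers(arg, &mut ids);
    }
    collect_pointers(ret, &mut ids);
    for id in ids {
        *perms.entry(id).or_insert(0) |= ALL_PERMS;
    }
}
```

For a call taking a pointer (label 1) to a struct that itself holds a pointer (label 2), and returning a pointer (label 3), all three labels end up with `ALL_PERMS`.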

For globals, I'm not sure to what extent we currently support pointer permissions for them, but if and when we do, we'll need to mark them as fully accessible by these UnknownDef functions that can do anything to them.

kkysen commented 1 year ago

Could the same be done for function pointers? A function pointer could have multiple definitions, just like an extern function (which could have multiple definitions via dynamic linking), so we can assume the worst for both of them. As an optimization later, we may be able to get more information on function pointers (restrict them to a few definitions), but this could be a way to get initial fallback support for them, and there are a lot of function pointers in lighttpd.

kkysen commented 1 year ago

Previous Description

Building on #842, after we stop panicking on calls to extern functions (which is simply incorrect), we should default to pessimistic (but correct) pointer permissions for these extern function calls, assuming the worst of the function definitions based purely on their signature. The next step would be to properly link the extern functions as use imports, but that is far more complex, and won't even work for truly extern functions whose definitions are not available at all.

In visit_call_other we might at least like to iterate over the function signature to relate pointer permissions.

That is, when there is a function call, there is a terminator of the form _dest = call(_arg1, ..., _argN), and the static analysis calls do_assign for those arguments and for the output of the call. In particular, that adds a dataflow edge and an equivalence constraint between arguments/parameters and result/output. As things currently stand, we miss out on those relationships if the call doesn't have a function body to traverse.
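A simplified sketch of relating a call's operands to a callee signature, without looking at any function body, might look like the following. The `Signature` struct, the `Constraint::Flow` variant, and `relate_call` are hypothetical names for this example; the real do_assign also handles equivalence constraints and nested pointer types, which are omitted here.

```rust
/// Hypothetical pointer label, standing in for the analysis's `PointerId`.
type PointerId = u32;

/// Simplified constraint (an assumption for this sketch): a dataflow edge
/// saying a value flows from one labeled pointer to another.
#[derive(Debug, PartialEq, Eq)]
enum Constraint {
    Flow { from: PointerId, to: PointerId },
}

/// Pointer labels taken from a callee's signature alone. `None` marks a
/// non-pointer parameter or return type.
struct Signature {
    params: Vec<Option<PointerId>>,
    ret: Option<PointerId>,
}

/// For `_dest = call(_arg1, ..., _argN)`, relate each argument to the
/// matching parameter and the return value to the destination.
fn relate_call(
    sig: &Signature,
    args: &[Option<PointerId>],
    dest: Option<PointerId>,
    out: &mut Vec<Constraint>,
) {
    for (arg, param) in args.iter().zip(&sig.params) {
        if let (Some(a), Some(p)) = (arg, param) {
            out.push(Constraint::Flow { from: *a, to: *p });
        }
    }
    if let (Some(r), Some(d)) = (sig.ret, dest) {
        out.push(Constraint::Flow { from: r, to: d });
    }
}
```

Because only the signature is consulted, the same routine could in principle be applied whether or not a body is available for the callee.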

spernsteiner commented 1 year ago

> this will lead it to have the pointer PermissionSet of READ | WRITE | OFFSET_ADD | OFFSET_SUB | FREE.

> this means that the function can access practically anything, any pointer that is contained somewhere within the argument and return types, as well as any global pointers or pointers to globals, as these can be accessed through trait methods on generic types (for generic functions from another crate) or through FFI (for extern functions).

If I'm reading this right, then having a single UnknownDef call anywhere in the program causes the return types of all pointer-returning extern "C" functions (which is essentially all functions produced by c2rust) to get all permissions. Then I'd expect those permissions to propagate a bunch of other places throughout the program, so that a large fraction of the pointers in the program end up with extra permissions.

This approach is properly conservative, but I think will end up with so many unneeded permissions throughout the program that the results will be essentially unusable, to the point I'm not sure it would actually be an improvement over having the analysis error out.
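The propagation concern can be illustrated with a toy model in which pointers that are assigned to one another share one equivalence class, and permissions are stored per class. This is a deliberately simplified sketch: the `Classes` type and its methods are hypothetical, and the real analysis also propagates permissions along dataflow edges rather than only through equivalence.

```rust
/// Toy union-find over pointer indices, with one permission bit set per
/// equivalence class. All names here are assumptions for this sketch.
struct Classes {
    parent: Vec<usize>,
    perms: Vec<u8>,
}

impl Classes {
    fn new(n: usize) -> Self {
        Self { parent: (0..n).collect(), perms: vec![0; n] }
    }

    /// Find the class representative, compressing the path on the way.
    fn find(&mut self, x: usize) -> usize {
        let mut root = x;
        while self.parent[root] != root {
            root = self.parent[root];
        }
        let mut cur = x;
        while self.parent[cur] != root {
            let next = self.parent[cur];
            self.parent[cur] = root;
            cur = next;
        }
        root
    }

    /// Merge two classes, e.g. because one pointer is assigned to the other.
    fn unify(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra != rb {
            self.parent[ra] = rb;
            self.perms[rb] |= self.perms[ra];
        }
    }

    /// Add permissions to the whole class containing `x`.
    fn add_perms(&mut self, x: usize, p: u8) {
        let r = self.find(x);
        self.perms[r] |= p;
    }

    /// Permissions of the class containing `x`.
    fn perms_of(&mut self, x: usize) -> u8 {
        let r = self.find(x);
        self.perms[r]
    }
}
```

If pointers 0, 1, and 2 are unified and pointer 2 is then passed to an unknown callee, pointers 0 and 1 pick up the same worst-case permissions, while an unrelated class is unaffected.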

Do we have an immediate need for allowing UnknownDef calls? My guess is that the main source of these at the moment is cross-compilation-unit calls, which should be handled by turning them into direct Rust calls (skipping the extern "C" indirection) instead. If we don't need this at the moment, I think we should put it off until we either find an important use case that requires it or collect some evidence that it will be useful (which currently I think is blocked on panic recovery in the analysis tool).

kkysen commented 1 year ago

> > this will lead it to have the pointer PermissionSet of READ | WRITE | OFFSET_ADD | OFFSET_SUB | FREE.
> >
> > this means that the function can access practically anything, any pointer that is contained somewhere within the argument and return types, as well as any global pointers or pointers to globals, as these can be accessed through trait methods on generic types (for generic functions from another crate) or through FFI (for extern functions).
>
> If I'm reading this right, then having a single UnknownDef call anywhere in the program causes the return types of all pointer-returning extern "C" functions (which is essentially all functions produced by c2rust) to get all permissions. Then I'd expect those permissions to propagate a bunch of other places throughout the program, so that a large fraction of the pointers in the program end up with extra permissions.
>
> This approach is properly conservative, but I think will end up with so many unneeded permissions throughout the program that the results will be essentially unusable, to the point I'm not sure it would actually be an improvement over having the analysis error out.

This doesn't apply to LocalDef functions, only UnknownDef ones, but yeah, I'd guess that it could easily infect most of the other code and make it unreasonably conservative.

Alternatively, we could assume that

That seems reasonably likely. We could also have some sort of flag to opt-in or opt-out of this (perhaps dynamic analysis could also help with confidence in this).

The fake extern functions that should be properly linked are probably much more likely to call each other and touch globals and stuff, but those we should eventually link properly and then they won't be a concern. True extern calls to other libraries, mostly libc, should be fine for the most part.

What do you think about this?

> Do we have an immediate need for allowing UnknownDef calls? My guess is that the main source of these at the moment is cross-compilation-unit calls, which should be handled by turning them into direct Rust calls (skipping the extern "C" indirection) instead. If we don't need this at the moment, I think we should put it off until we either find an important use case that requires it or collect some evidence that it will be useful (which currently I think is blocked on panic recovery in the analysis tool).

extern calls at the moment are cross-compilation-unit calls (most of them) and then true extern calls, which seem to be mostly libc functions at this point, such as (in lighttpd-minimal):

and then roughly these in all of lighttpd:

```rust
fn __assert_fail(__assertion: *const c_char, __file: *const c_char, __line: c_uint, __function: *const c_char) -> !;
fn __ctype_b_loc() -> *mut *const c_ushort;
fn __ctype_tolower_loc() -> *mut *const __int32_t;
fn __errno_location() -> *mut c_int;
fn _exit(_: c_int) -> !;
fn abort() -> !;
fn accept(__fd: c_int, __addr: __SOCKADDR_ARG, __addr_len: *mut socklen_t) -> c_int;
fn accept4(__fd: c_int, __addr: __SOCKADDR_ARG, __addr_len: *mut socklen_t, __flags: c_int) -> c_int;
fn access(__name: *const c_char, __type: c_int) -> c_int;
fn alarm(__seconds: c_uint) -> c_uint;
fn bind(__fd: c_int, __addr: __CONST_SOCKADDR_ARG, __len: socklen_t) -> c_int;
fn calloc(_: c_ulong, _: c_ulong) -> *mut c_void;
fn chdir(__path: *const c_char) -> c_int;
fn chmod(__file: *const c_char, __mode: __mode_t) -> c_int;
fn chroot(__path: *const c_char) -> c_int;
fn clock_gettime(__clock_id: clockid_t, __tp: *mut timespec) -> c_int;
fn close(__fd: c_int) -> c_int;
fn closedir(__dirp: *mut DIR) -> c_int;
fn closelog();
fn connect(__fd: c_int, __addr: __CONST_SOCKADDR_ARG, __len: socklen_t) -> c_int;
fn copy_file_range(__infd: c_int, __pinoff: *mut __off64_t, __outfd: c_int, __poutoff: *mut __off64_t, __length: size_t, __flags: c_uint) -> ssize_t;
fn crypt(__phrase: *const c_char, __setting: *const c_char) -> *mut c_char;
fn data_config_init() -> *mut data_config;
fn data_config_pcre_compile(dc: *mut data_config, pcre_jit: c_int, errh: *mut log_error_st) -> c_int;
fn dup(__fd: c_int) -> c_int;
fn dup2(__fd: c_int, __fd2: c_int) -> c_int;
fn epoll_create1(__flags: c_int) -> c_int;
fn epoll_ctl(__epfd: c_int, __op: c_int, __fd: c_int, __event: *mut epoll_event) -> c_int;
fn epoll_wait(__epfd: c_int, __events: *mut epoll_event, __maxevents: c_int, __timeout: c_int) -> c_int;
fn execv(__path: *const c_char, __argv: *const *mut c_char) -> c_int;
fn execve(__path: *const c_char, __argv: *const *mut c_char, __envp: *const *mut c_char) -> c_int;
fn exit(_: c_int) -> !;
fn explicit_bzero(__s: *mut c_void, __n: size_t);
fn fchdir(__fd: c_int) -> c_int;
fn fclose(__stream: *mut FILE) -> c_int;
fn fcntl(__fd: c_int, __cmd: c_int, _: ...) -> c_int;
fn fdopendir(__fd: c_int) -> *mut DIR;
fn fflush(__stream: *mut FILE) -> c_int;
fn fgets(__s: *mut c_char, __n: c_int, __stream: *mut FILE) -> *mut c_char;
fn fopen(_: *const c_char, _: *const c_char) -> *mut FILE;
fn fork() -> __pid_t;
fn fprintf(_: *mut FILE, _: *const c_char, _: ...) -> c_int;
fn fputc(__c: c_int, __stream: *mut FILE) -> c_int;
fn fputs(__s: *const c_char, __stream: *mut FILE) -> c_int;
fn fread(_: *mut c_void, _: c_ulong, _: c_ulong, _: *mut FILE) -> c_ulong;
fn free(_: *mut c_void);
fn freeaddrinfo(__ai: *mut addrinfo);
fn fseek(__stream: *mut FILE, __off: c_long, __whence: c_int) -> c_int;
fn fstat(__fd: c_int, __buf: *mut stat) -> c_int;
fn fstatat(__fd: c_int, __file: *const c_char, __buf: *mut stat, __flag: c_int) -> c_int;
fn ftell(__stream: *mut FILE) -> c_long;
fn ftruncate(__fd: c_int, __length: __off64_t) -> c_int;
fn gai_strerror(__ecode: c_int) -> *const c_char;
fn getaddrinfo(__name: *const c_char, __service: *const c_char, __req: *const addrinfo, __pai: *mut *mut addrinfo) -> c_int;
fn getcwd(__buf: *mut c_char, __size: size_t) -> *mut c_char;
fn getegid() -> __gid_t;
fn getenv(__name: *const c_char) -> *mut c_char;
fn geteuid() -> __uid_t;
fn getgid() -> __gid_t;
fn getgrgid(__gid: __gid_t) -> *mut group;
fn getgrnam(__name: *const c_char) -> *mut group;
fn getloadavg(__loadavg: *mut c_double, __nelem: c_int) -> c_int;
fn getnameinfo(__sa: *const sockaddr, __salen: socklen_t, __host: *mut c_char, __hostlen: socklen_t, __serv: *mut c_char, __servlen: socklen_t, __flags: c_int) -> c_int;
fn getopt(___argc: c_int, ___argv: *const *mut c_char, __shortopts: *const c_char) -> c_int;
fn getpeername(__fd: c_int, __addr: __SOCKADDR_ARG, __len: *mut socklen_t) -> c_int;
fn getpid() -> __pid_t;
fn getppid() -> __pid_t;
fn getpwnam(__name: *const c_char) -> *mut passwd;
fn getpwuid(__uid: __uid_t) -> *mut passwd;
fn getrlimit(__resource: __rlimit_resource_t, __rlimits: *mut rlimit) -> c_int;
fn getsockname(__fd: c_int, __addr: __SOCKADDR_ARG, __len: *mut socklen_t) -> c_int;
fn getsockopt(__fd: c_int, __level: c_int, __optname: c_int, __optval: *mut c_void, __optlen: *mut socklen_t) -> c_int;
fn getuid() -> __uid_t;
fn glob(__pattern: *const c_char, __flags: c_int, __errfunc: Option<unsafe extern "C" fn(*const c_char, c_int) -> c_int>, __pglob: *mut glob_t) -> c_int;
fn globfree(__pglob: *mut glob_t);
fn gmtime_r(__timer: *const time_t, __tp: *mut tm) -> *mut tm;
fn inet_ntop(__af: c_int, __cp: *const c_void, __buf: *mut c_char, __len: socklen_t) -> *const c_char;
fn inet_pton(__af: c_int, __cp: *const c_char, __buf: *mut c_void) -> c_int;
fn initgroups(__user: *const c_char, __group: __gid_t) -> c_int;
fn inotify_add_watch(__fd: c_int, __name: *const c_char, __mask: uint32_t) -> c_int;
fn inotify_init1(__flags: c_int) -> c_int;
fn inotify_rm_watch(__fd: c_int, __wd: c_int) -> c_int;
fn ioctl(__fd: c_int, __request: c_ulong, _: ...) -> c_int;
fn kill(__pid: __pid_t, __sig: c_int) -> c_int;
fn linkat(__fromfd: c_int, __from: *const c_char, __tofd: c_int, __to: *const c_char, __flags: c_int) -> c_int;
fn listen(__fd: c_int, __n: c_int) -> c_int;
fn localtime_r(__timer: *const time_t, __tp: *mut tm) -> *mut tm;
fn lseek(__fd: c_int, __offset: __off64_t, __whence: c_int) -> __off64_t;
fn lstat(__file: *const c_char, __buf: *mut stat) -> c_int;
fn malloc(_: c_ulong) -> *mut c_void;
fn malloc_trim(__pad: size_t) -> c_int;
fn mallopt(__param: c_int, __val: c_int) -> c_int;
fn memchr(_: *const c_void, _: c_int, _: c_ulong) -> *mut c_void;
fn memcmp(_: *const c_void, _: *const c_void, _: c_ulong) -> c_int;
fn memcpy(_: *mut c_void, _: *const c_void, _: c_ulong) -> *mut c_void;
fn memmove(_: *mut c_void, _: *const c_void, _: c_ulong) -> *mut c_void;
fn mempcpy(_: *mut c_void, _: *const c_void, _: c_ulong) -> *mut c_void;
fn memrchr(__s: *const c_void, __c: c_int, __n: size_t) -> *mut c_void;
fn memset(_: *mut c_void, _: c_int, _: c_ulong) -> *mut c_void;
fn mkdir(__path: *const c_char, __mode: __mode_t) -> c_int;
fn mkostemp(__template: *mut c_char, __flags: c_int) -> c_int;
fn mmap(__addr: *mut c_void, __len: size_t, __prot: c_int, __flags: c_int, __fd: c_int, __offset: __off64_t) -> *mut c_void;
fn munmap(__addr: *mut c_void, __len: size_t) -> c_int;
fn open(__file: *const c_char, __oflag: c_int, _: ...) -> c_int;
fn openlog(__ident: *const c_char, __option: c_int, __facility: c_int);
fn perror(__s: *const c_char);
fn pipe(__pipedes: *mut c_int) -> c_int;
fn pipe2(__pipedes: *mut c_int, __flags: c_int) -> c_int;
fn poll(__fds: *mut pollfd, __nfds: nfds_t, __timeout: c_int) -> c_int;
fn prctl(__option: c_int, _: ...) -> c_int;
fn pread(__fd: c_int, __buf: *mut c_void, __nbytes: size_t, __offset: __off64_t) -> ssize_t;
fn printf(_: *const c_char, _: ...) -> c_int;
fn putc(__c: c_int, __stream: *mut FILE) -> c_int;
fn puts(__s: *const c_char) -> c_int;
fn pwrite(__fd: c_int, __buf: *const c_void, __nbytes: size_t, __offset: __off64_t) -> ssize_t;
fn pwritev(__fd: c_int, __iovec: *const iovec, __count: c_int, __offset: __off64_t) -> ssize_t;
fn qsort(__base: *mut c_void, __nmemb: size_t, __size: size_t, __compar: __compar_fn_t);
fn raise(__sig: c_int) -> c_int;
fn rand() -> c_int;
fn random() -> c_long;
fn read(__fd: c_int, __buf: *mut c_void, __nbytes: size_t) -> ssize_t;
fn readdir(__dirp: *mut DIR) -> *mut dirent;
fn realloc(_: *mut c_void, _: c_ulong) -> *mut c_void;
fn recv(__fd: c_int, __buf: *mut c_void, __n: size_t, __flags: c_int) -> ssize_t;
fn rename(__old: *const c_char, __new: *const c_char) -> c_int;
fn renameat2(__oldfd: c_int, __old: *const c_char, __newfd: c_int, __new: *const c_char, __flags: c_uint) -> c_int;
fn rewind(__stream: *mut FILE);
fn rmdir(__path: *const c_char) -> c_int;
fn select(__nfds: c_int, __readfds: *mut fd_set, __writefds: *mut fd_set, __exceptfds: *mut fd_set, __timeout: *mut timeval) -> c_int;
fn sendfile(__out_fd: c_int, __in_fd: c_int, __offset: *mut __off64_t, __count: size_t) -> ssize_t;
fn setenv(__name: *const c_char, __value: *const c_char, __replace: c_int) -> c_int;
fn setgid(__gid: __gid_t) -> c_int;
fn setgroups(__n: size_t, __groups: *const __gid_t) -> c_int;
fn setlocale(__category: c_int, __locale: *const c_char) -> *mut c_char;
fn setrlimit(__resource: __rlimit_resource_t, __rlimits: *const rlimit) -> c_int;
fn setsid() -> __pid_t;
fn setsockopt(__fd: c_int, __level: c_int, __optname: c_int, __optval: *const c_void, __optlen: socklen_t) -> c_int;
fn setuid(__uid: __uid_t) -> c_int;
fn shutdown(__fd: c_int, __how: c_int) -> c_int;
fn sigaction(__sig: c_int, __act: *const sigaction, __oact: *mut sigaction) -> c_int;
fn sigemptyset(__set: *mut sigset_t) -> c_int;
fn signal(__sig: c_int, __handler: __sighandler_t) -> __sighandler_t;
fn snprintf(_: *mut c_char, _: c_ulong, _: *const c_char, _: ...) -> c_int;
fn socket(__domain: c_int, __type: c_int, __protocol: c_int) -> c_int;
fn splice(__fdin: c_int, __offin: *mut __off64_t, __fdout: c_int, __offout: *mut __off64_t, __len: size_t, __flags: c_uint) -> __ssize_t;
fn sprintf(_: *mut c_char, _: *const c_char, _: ...) -> c_int;
fn srand(__seed: c_uint);
fn srandom(__seed: c_uint);
fn stat(__file: *const c_char, __buf: *mut stat) -> c_int;
fn strcat(_: *mut c_char, _: *const c_char) -> *mut c_char;
fn strchr(_: *const c_char, _: c_int) -> *mut c_char;
fn strcmp(_: *const c_char, _: *const c_char) -> c_int;
fn strcpy(_: *mut c_char, _: *const c_char) -> *mut c_char;
fn strcspn(_: *const c_char, _: *const c_char) -> c_ulong;
fn strdup(_: *const c_char) -> *mut c_char;
fn strerror_r(__errnum: c_int, __buf: *mut c_char, __buflen: size_t) -> *mut c_char;
fn strftime(__s: *mut c_char, __maxsize: size_t, __format: *const c_char, __tp: *const tm) -> size_t;
fn strftime_cache_reset();
fn strlen(_: *const c_char) -> c_ulong;
fn strncasecmp(_: *const c_char, _: *const c_char, _: c_ulong) -> c_int;
fn strncmp(_: *const c_char, _: *const c_char, _: c_ulong) -> c_int;
fn strrchr(_: *const c_char, _: c_int) -> *mut c_char;
fn strstr(_: *const c_char, _: *const c_char) -> *mut c_char;
fn strtod(_: *const c_char, _: *mut *mut c_char) -> c_double;
fn strtol(_: *const c_char, _: *mut *mut c_char, _: c_int) -> c_long;
fn strtoll(_: *const c_char, _: *mut *mut c_char, _: c_int) -> c_longlong;
fn strtoul(_: *const c_char, _: *mut *mut c_char, _: c_int) -> c_ulong;
fn syscall(__sysno: c_long, _: ...) -> c_long;
fn sysconf(__name: c_int) -> c_long;
fn syslog(__pri: c_int, __fmt: *const c_char, _: ...);
fn time(__timer: *mut time_t) -> time_t;
fn timegm(__tp: *mut tm) -> time_t;
fn tzset();
fn unlink(__name: *const c_char) -> c_int;
fn unlinkat(__fd: c_int, __name: *const c_char, __flag: c_int) -> c_int;
fn unsetenv(__name: *const c_char) -> c_int;
fn vsnprintf(_: *mut c_char, _: c_ulong, _: *const c_char, _: ::std::ffi::VaList) -> c_int;
fn vsprintf(_: *mut c_char, _: *const c_char, _: ::std::ffi::VaList) -> c_int;
fn waitpid(__pid: __pid_t, __stat_loc: *mut c_int, __options: c_int) -> __pid_t;
fn write(__fd: c_int, __buf: *const c_void, __n: size_t) -> ssize_t;
fn write_all(fd: c_int, buf: *const c_void, count: size_t) -> ssize_t;
fn writev(__fd: c_int, __iovec: *const iovec, __count: c_int) -> ssize_t;
```

These seem to be all, or nearly all, libc functions. There are far fewer of these than the fake extern calls that can be removed, but there are still a relatively large number of them.