Open kkysen opened 1 year ago
Could the same be done for function pointers? They could have multiple definitions, just like an extern function, which could have multiple definitions via dynamic linking, so we can assume the worst for both of them. As an optimization later, we may be able to get more information on function pointers (restrict them to a few definitions), but this could be a way to get initial fallback support for them, and there are a lot of function pointers in lighttpd.
Building on #842, after we stop panicking on calls to extern functions (which is simply incorrect), we should default to pessimistic (but correct) pointer permissions for these extern function calls, assuming the worst of the function definitions based purely on their signature. The next step would be to properly link the extern functions as use imports, but that is far more complex, and won't even work for truly extern functions whose definitions are not available at all.
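A minimal sketch of what "pessimistic permissions based purely on the signature" could look like. The PermissionSet type here is a stand-in written for this example, with only the five flags named in this thread and made-up bit positions; it is not the definition used in the analysis, and mark_unknown_call is a hypothetical helper.

```rust
/// Stand-in for the analysis's permission set (assumption: bit positions are arbitrary).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct PermissionSet(u8);

#[allow(dead_code)]
impl PermissionSet {
    const NONE: Self = Self(0);
    const READ: Self = Self(1 << 0);
    const WRITE: Self = Self(1 << 1);
    const OFFSET_ADD: Self = Self(1 << 2);
    const OFFSET_SUB: Self = Self(1 << 3);
    const FREE: Self = Self(1 << 4);

    /// Everything an unknown callee might do with a pointer.
    const PESSIMISTIC: Self = Self(
        Self::READ.0 | Self::WRITE.0 | Self::OFFSET_ADD.0 | Self::OFFSET_SUB.0 | Self::FREE.0,
    );

    fn union(self, other: Self) -> Self {
        Self(self.0 | other.0)
    }

    fn contains(self, other: Self) -> bool {
        self.0 & other.0 == other.0
    }
}

/// Hypothetical helper: `sig_pointers` holds the indices of the pointers that
/// appear in the argument and return types of a call with no known body.
fn mark_unknown_call(perms: &mut [PermissionSet], sig_pointers: &[usize]) {
    for &p in sig_pointers {
        perms[p] = perms[p].union(PermissionSet::PESSIMISTIC);
    }
}
```

Pointers that do not appear in the signature keep whatever permissions they already had; only the ones listed in sig_pointers are widened.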
In visit_call_other we might at least like to iterate over the function signature to relate pointer permissions.
That is, when there is a function call, there is a terminator of the form _dest = call(_arg1, ..., _argN), and static analysis calls do_assign for those arguments and for the output of the call. In particular, that adds a dataflow edge and equivalence constraint between arguments/parameters and result/output. As things currently stand, we miss out on those relationships if the call doesn't have a function body to traverse.
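As a rough illustration of relating a call's operands through the signature alone, here is a sketch. All types below (PointerId as a plain wrapper, Constraint, SigPointers) and the body of do_assign are assumptions made up for this example; the real do_assign in the analysis has a different signature and operates on labeled types.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct PointerId(u32);

#[derive(Debug, PartialEq, Eq)]
enum Constraint {
    /// Data flows from `src` into `dst`.
    Flow { src: PointerId, dst: PointerId },
    /// The two pointers must be treated as equivalent.
    Equiv(PointerId, PointerId),
}

/// Pointer labels taken from a function signature (`None` = not a pointer).
struct SigPointers {
    params: Vec<Option<PointerId>>,
    ret: Option<PointerId>,
}

/// Simplified stand-in for `do_assign`: records one dataflow edge and one
/// equivalence constraint when both sides are pointers.
fn do_assign(out: &mut Vec<Constraint>, dst: Option<PointerId>, src: Option<PointerId>) {
    if let (Some(dst), Some(src)) = (dst, src) {
        out.push(Constraint::Flow { src, dst });
        out.push(Constraint::Equiv(dst, src));
    }
}

/// For `_dest = call(_arg1, ..., _argN)`: assign each argument to its
/// parameter, then assign the signature's return value to `_dest`.
fn relate_call(
    sig: &SigPointers,
    args: &[Option<PointerId>],
    dest: Option<PointerId>,
) -> Vec<Constraint> {
    let mut out = Vec::new();
    for (&param, &arg) in sig.params.iter().zip(args) {
        do_assign(&mut out, param, arg);
    }
    do_assign(&mut out, dest, sig.ret);
    out
}
```

None of this needs a function body, which is why it could apply to calls whose definition is unavailable.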
> this will lead it to have the pointer PermissionSet of READ | WRITE | OFFSET_ADD | OFFSET_SUB | FREE.
> this means that the function can access practically anything, any pointer that is contained somewhere within the argument and return types, as well as any global pointers or pointers to globals, as these can be accessed through trait methods on generic types (for generic functions from another crate) or through FFI (for extern functions).
If I'm reading this right, then having a single UnknownDef call anywhere in the program causes the return types of all pointer-returning extern "C" functions (which is essentially all functions produced by c2rust) to get all permissions. Then I'd expect those permissions to propagate to a bunch of other places throughout the program, so that a large fraction of the pointers in the program end up with extra permissions.
This approach is properly conservative, but I think it will end up with so many unneeded permissions throughout the program that the results will be essentially unusable, to the point that I'm not sure it would actually be an improvement over having the analysis error out.
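The propagation concern can be illustrated with a toy fixpoint. This is a sketch under strong assumptions: pointers are plain indices, permissions are a u8 bitmask, and every edge spreads permissions in both directions, which is cruder than the real dataflow and equivalence constraints in the analysis. The propagate function and its inputs are hypothetical.

```rust
/// Spread permission bits across `edges` until a fixpoint is reached.
/// Returns how many pointers ended up with more permissions than they started with.
fn propagate(perms: &mut [u8], edges: &[(usize, usize)]) -> usize {
    let before = perms.to_vec();
    let mut changed = true;
    while changed {
        changed = false;
        for &(a, b) in edges {
            // Assumption: an edge makes both of its ends share permissions.
            let merged = perms[a] | perms[b];
            if perms[a] != merged || perms[b] != merged {
                perms[a] = merged;
                perms[b] = merged;
                changed = true;
            }
        }
    }
    perms
        .iter()
        .zip(&before)
        .filter(|(now, old)| now != old)
        .count()
}
```

With one fully permissive pointer at the head of a chain of edges, every pointer connected to it picks up the extra bits, while unconnected pointers are unaffected.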
Do we have an immediate need for allowing UnknownDef calls? My guess is that the main source of these at the moment is cross-compilation-unit calls, which should be handled by turning them into direct Rust calls (skipping the extern "C" indirection) instead. If we don't need this at the moment, I think we should put it off until we either find an important use case that requires it or collect some evidence that it will be useful (which currently I think is blocked on panic recovery in the analysis tool).
> this will lead it to have the pointer PermissionSet of READ | WRITE | OFFSET_ADD | OFFSET_SUB | FREE. this means that the function can access practically anything, any pointer that is contained somewhere within the argument and return types, as well as any global pointers or pointers to globals, as these can be accessed through trait methods on generic types (for generic functions from another crate) or through FFI (for extern functions).
> If I'm reading this right, then having a single UnknownDef call anywhere in the program causes the return types of all pointer-returning extern "C" functions (which is essentially all functions produced by c2rust) to get all permissions. Then I'd expect those permissions to propagate to a bunch of other places throughout the program, so that a large fraction of the pointers in the program end up with extra permissions.
> This approach is properly conservative, but I think it will end up with so many unneeded permissions throughout the program that the results will be essentially unusable, to the point that I'm not sure it would actually be an improvement over having the analysis error out.
This doesn't apply to LocalDef functions, only UnknownDef ones, but yeah, I'd guess that it could easily infect most of the other code and make it unreasonably conservative.
> Alternatively, we could assume that extern functions (real ones) don't call other functions and use globals through FFI
That seems reasonably likely. We could also have some sort of flag to opt in to or out of this (perhaps dynamic analysis could also help with confidence in this).
The fake extern functions that should be properly linked are probably much more likely to call each other and touch globals and stuff, but those we should eventually link properly, and then they won't be a concern. True extern calls to other libraries, mostly libc, should be fine for the most part.
What do you think about this?
> Do we have an immediate need for allowing UnknownDef calls? My guess is that the main source of these at the moment is cross-compilation-unit calls, which should be handled by turning them into direct Rust calls (skipping the extern "C" indirection) instead. If we don't need this at the moment, I think we should put it off until we either find an important use case that requires it or collect some evidence that it will be useful (which currently I think is blocked on panic recovery in the analysis tool).
extern calls at the moment are cross-compilation-unit calls (most of them) and then true extern calls, which seem to be mostly libc functions at this point, such as (in lighttpd-minimal):
clock_gettime
setsockopt
shutdown
strchr
strstr
close
read
__errno_location
and then roughly these in all of lighttpd:
fn __assert_fail(__assertion: *const c_char, __file: *const c_char, __line: c_uint, __function: *const c_char) -> !;
fn __ctype_b_loc() -> *mut *const c_ushort;
fn __ctype_tolower_loc() -> *mut *const __int32_t;
fn __errno_location() -> *mut c_int;
fn _exit(_: c_int) -> !;
fn abort() -> !;
fn accept(__fd: c_int, __addr: __SOCKADDR_ARG, __addr_len: *mut socklen_t) -> c_int;
fn accept4(__fd: c_int, __addr: __SOCKADDR_ARG, __addr_len: *mut socklen_t, __flags: c_int) -> c_int;
fn access(__name: *const c_char, __type: c_int) -> c_int;
fn alarm(__seconds: c_uint) -> c_uint;
fn bind(__fd: c_int, __addr: __CONST_SOCKADDR_ARG, __len: socklen_t) -> c_int;
fn calloc(_: c_ulong, _: c_ulong) -> *mut c_void;
fn chdir(__path: *const c_char) -> c_int;
fn chmod(__file: *const c_char, __mode: __mode_t) -> c_int;
fn chroot(__path: *const c_char) -> c_int;
fn clock_gettime(__clock_id: clockid_t, __tp: *mut timespec) -> c_int;
fn close(__fd: c_int) -> c_int;
fn closedir(__dirp: *mut DIR) -> c_int;
fn closelog();
fn connect(__fd: c_int, __addr: __CONST_SOCKADDR_ARG, __len: socklen_t) -> c_int;
fn copy_file_range(__infd: c_int, __pinoff: *mut __off64_t, __outfd: c_int, __poutoff: *mut __off64_t, __length: size_t, __flags: c_uint) -> ssize_t;
fn crypt(__phrase: *const c_char, __setting: *const c_char) -> *mut c_char;
fn data_config_init() -> *mut data_config;
fn data_config_pcre_compile(dc: *mut data_config, pcre_jit: c_int, errh: *mut log_error_st) -> c_int;
fn dup(__fd: c_int) -> c_int;
fn dup2(__fd: c_int, __fd2: c_int) -> c_int;
fn epoll_create1(__flags: c_int) -> c_int;
fn epoll_ctl(__epfd: c_int, __op: c_int, __fd: c_int, __event: *mut epoll_event) -> c_int;
fn epoll_wait(__epfd: c_int, __events: *mut epoll_event, __maxevents: c_int, __timeout: c_int) -> c_int;
fn execv(__path: *const c_char, __argv: *const *mut c_char) -> c_int;
fn execve(__path: *const c_char, __argv: *const *mut c_char, __envp: *const *mut c_char) -> c_int;
fn exit(_: c_int) -> !;
fn explicit_bzero(__s: *mut c_void, __n: size_t);
fn fchdir(__fd: c_int) -> c_int;
fn fclose(__stream: *mut FILE) -> c_int;
fn fcntl(__fd: c_int, __cmd: c_int, _: ...) -> c_int;
fn fdopendir(__fd: c_int) -> *mut DIR;
fn fflush(__stream: *mut FILE) -> c_int;
fn fgets(__s: *mut c_char, __n: c_int, __stream: *mut FILE) -> *mut c_char;
fn fopen(_: *const c_char, _: *const c_char) -> *mut FILE;
fn fork() -> __pid_t;
fn fprintf(_: *mut FILE, _: *const c_char, _: ...) -> c_int;
fn fputc(__c: c_int, __stream: *mut FILE) -> c_int;
fn fputs(__s: *const c_char, __stream: *mut FILE) -> c_int;
fn fread(_: *mut c_void, _: c_ulong, _: c_ulong, _: *mut FILE) -> c_ulong;
fn free(_: *mut c_void);
fn freeaddrinfo(__ai: *mut addrinfo);
fn fseek(__stream: *mut FILE, __off: c_long, __whence: c_int) -> c_int;
fn fstat(__fd: c_int, __buf: *mut stat) -> c_int;
fn fstatat(__fd: c_int, __file: *const c_char, __buf: *mut stat, __flag: c_int) -> c_int;
fn ftell(__stream: *mut FILE) -> c_long;
fn ftruncate(__fd: c_int, __length: __off64_t) -> c_int;
fn gai_strerror(__ecode: c_int) -> *const c_char;
fn getaddrinfo(__name: *const c_char, __service: *const c_char, __req: *const addrinfo, __pai: *mut *mut addrinfo) -> c_int;
fn getcwd(__buf: *mut c_char, __size: size_t) -> *mut c_char;
fn getegid() -> __gid_t;
fn getenv(__name: *const c_char) -> *mut c_char;
fn geteuid() -> __uid_t;
fn getgid() -> __gid_t;
fn getgrgid(__gid: __gid_t) -> *mut group;
fn getgrnam(__name: *const c_char) -> *mut group;
fn getloadavg(__loadavg: *mut c_double, __nelem: c_int) -> c_int;
fn getnameinfo(__sa: *const sockaddr, __salen: socklen_t, __host: *mut c_char, __hostlen: socklen_t, __serv: *mut c_char, __servlen: socklen_t, __flags: c_int) -> c_int;
fn getopt(___argc: c_int, ___argv: *const *mut c_char, __shortopts: *const c_char) -> c_int;
fn getpeername(__fd: c_int, __addr: __SOCKADDR_ARG, __len: *mut socklen_t) -> c_int;
fn getpid() -> __pid_t;
fn getppid() -> __pid_t;
fn getpwnam(__name: *const c_char) -> *mut passwd;
fn getpwuid(__uid: __uid_t) -> *mut passwd;
fn getrlimit(__resource: __rlimit_resource_t, __rlimits: *mut rlimit) -> c_int;
fn getsockname(__fd: c_int, __addr: __SOCKADDR_ARG, __len: *mut socklen_t) -> c_int;
fn getsockopt(__fd: c_int, __level: c_int, __optname: c_int, __optval: *mut c_void, __optlen: *mut socklen_t) -> c_int;
fn getuid() -> __uid_t;
fn glob(__pattern: *const c_char, __flags: c_int, __errfunc: Option<unsafe extern "C" fn(*const c_char, c_int) -> c_int>, __pglob: *mut glob_t) -> c_int;
fn globfree(__pglob: *mut glob_t);
fn gmtime_r(__timer: *const time_t, __tp: *mut tm) -> *mut tm;
fn inet_ntop(__af: c_int, __cp: *const c_void, __buf: *mut c_char, __len: socklen_t) -> *const c_char;
fn inet_pton(__af: c_int, __cp: *const c_char, __buf: *mut c_void) -> c_int;
fn initgroups(__user: *const c_char, __group: __gid_t) -> c_int;
fn inotify_add_watch(__fd: c_int, __name: *const c_char, __mask: uint32_t) -> c_int;
fn inotify_init1(__flags: c_int) -> c_int;
fn inotify_rm_watch(__fd: c_int, __wd: c_int) -> c_int;
fn ioctl(__fd: c_int, __request: c_ulong, _: ...) -> c_int;
fn kill(__pid: __pid_t, __sig: c_int) -> c_int;
fn linkat(__fromfd: c_int, __from: *const c_char, __tofd: c_int, __to: *const c_char, __flags: c_int) -> c_int;
fn listen(__fd: c_int, __n: c_int) -> c_int;
fn localtime_r(__timer: *const time_t, __tp: *mut tm) -> *mut tm;
fn lseek(__fd: c_int, __offset: __off64_t, __whence: c_int) -> __off64_t;
fn lstat(__file: *const c_char, __buf: *mut stat) -> c_int;
fn malloc(_: c_ulong) -> *mut c_void;
fn malloc_trim(__pad: size_t) -> c_int;
fn mallopt(__param: c_int, __val: c_int) -> c_int;
fn memchr(_: *const c_void, _: c_int, _: c_ulong) -> *mut c_void;
fn memcmp(_: *const c_void, _: *const c_void, _: c_ulong) -> c_int;
fn memcpy(_: *mut c_void, _: *const c_void, _: c_ulong) -> *mut c_void;
fn memmove(_: *mut c_void, _: *const c_void, _: c_ulong) -> *mut c_void;
fn mempcpy(_: *mut c_void, _: *const c_void, _: c_ulong) -> *mut c_void;
fn memrchr(__s: *const c_void, __c: c_int, __n: size_t) -> *mut c_void;
fn memset(_: *mut c_void, _: c_int, _: c_ulong) -> *mut c_void;
fn mkdir(__path: *const c_char, __mode: __mode_t) -> c_int;
fn mkostemp(__template: *mut c_char, __flags: c_int) -> c_int;
fn mmap(__addr: *mut c_void, __len: size_t, __prot: c_int, __flags: c_int, __fd: c_int, __offset: __off64_t) -> *mut c_void;
fn munmap(__addr: *mut c_void, __len: size_t) -> c_int;
fn open(__file: *const c_char, __oflag: c_int, _: ...) -> c_int;
fn openlog(__ident: *const c_char, __option: c_int, __facility: c_int);
fn perror(__s: *const c_char);
fn pipe(__pipedes: *mut c_int) -> c_int;
fn pipe2(__pipedes: *mut c_int, __flags: c_int) -> c_int;
fn poll(__fds: *mut pollfd, __nfds: nfds_t, __timeout: c_int) -> c_int;
fn prctl(__option: c_int, _: ...) -> c_int;
fn pread(__fd: c_int, __buf: *mut c_void, __nbytes: size_t, __offset: __off64_t) -> ssize_t;
fn printf(_: *const c_char, _: ...) -> c_int;
fn putc(__c: c_int, __stream: *mut FILE) -> c_int;
fn puts(__s: *const c_char) -> c_int;
fn pwrite(__fd: c_int, __buf: *const c_void, __nbytes: size_t, __offset: __off64_t) -> ssize_t;
fn pwritev(__fd: c_int, __iovec: *const iovec, __count: c_int, __offset: __off64_t) -> ssize_t;
fn qsort(__base: *mut c_void, __nmemb: size_t, __size: size_t, __compar: __compar_fn_t);
fn raise(__sig: c_int) -> c_int;
fn rand() -> c_int;
fn random() -> c_long;
fn read(__fd: c_int, __buf: *mut c_void, __nbytes: size_t) -> ssize_t;
fn readdir(__dirp: *mut DIR) -> *mut dirent;
fn realloc(_: *mut c_void, _: c_ulong) -> *mut c_void;
fn recv(__fd: c_int, __buf: *mut c_void, __n: size_t, __flags: c_int) -> ssize_t;
fn rename(__old: *const c_char, __new: *const c_char) -> c_int;
fn renameat2(__oldfd: c_int, __old: *const c_char, __newfd: c_int, __new: *const c_char, __flags: c_uint) -> c_int;
fn rewind(__stream: *mut FILE);
fn rmdir(__path: *const c_char) -> c_int;
fn select(__nfds: c_int, __readfds: *mut fd_set, __writefds: *mut fd_set, __exceptfds: *mut fd_set, __timeout: *mut timeval) -> c_int;
fn sendfile(__out_fd: c_int, __in_fd: c_int, __offset: *mut __off64_t, __count: size_t) -> ssize_t;
fn setenv(__name: *const c_char, __value: *const c_char, __replace: c_int) -> c_int;
fn setgid(__gid: __gid_t) -> c_int;
fn setgroups(__n: size_t, __groups: *const __gid_t) -> c_int;
fn setlocale(__category: c_int, __locale: *const c_char) -> *mut c_char;
fn setrlimit(__resource: __rlimit_resource_t, __rlimits: *const rlimit) -> c_int;
fn setsid() -> __pid_t;
fn setsockopt(__fd: c_int, __level: c_int, __optname: c_int, __optval: *const c_void, __optlen: socklen_t) -> c_int;
fn setuid(__uid: __uid_t) -> c_int;
fn shutdown(__fd: c_int, __how: c_int) -> c_int;
fn sigaction(__sig: c_int, __act: *const sigaction, __oact: *mut sigaction) -> c_int;
fn sigemptyset(__set: *mut sigset_t) -> c_int;
fn signal(__sig: c_int, __handler: __sighandler_t) -> __sighandler_t;
fn snprintf(_: *mut c_char, _: c_ulong, _: *const c_char, _: ...) -> c_int;
fn socket(__domain: c_int, __type: c_int, __protocol: c_int) -> c_int;
fn splice(__fdin: c_int, __offin: *mut __off64_t, __fdout: c_int, __offout: *mut __off64_t, __len: size_t, __flags: c_uint) -> __ssize_t;
fn sprintf(_: *mut c_char, _: *const c_char, _: ...) -> c_int;
fn srand(__seed: c_uint);
fn srandom(__seed: c_uint);
fn stat(__file: *const c_char, __buf: *mut stat) -> c_int;
fn strcat(_: *mut c_char, _: *const c_char) -> *mut c_char;
fn strchr(_: *const c_char, _: c_int) -> *mut c_char;
fn strcmp(_: *const c_char, _: *const c_char) -> c_int;
fn strcpy(_: *mut c_char, _: *const c_char) -> *mut c_char;
fn strcspn(_: *const c_char, _: *const c_char) -> c_ulong;
fn strdup(_: *const c_char) -> *mut c_char;
fn strerror_r(__errnum: c_int, __buf: *mut c_char, __buflen: size_t) -> *mut c_char;
fn strftime(__s: *mut c_char, __maxsize: size_t, __format: *const c_char, __tp: *const tm) -> size_t;
fn strftime_cache_reset();
fn strlen(_: *const c_char) -> c_ulong;
fn strncasecmp(_: *const c_char, _: *const c_char, _: c_ulong) -> c_int;
fn strncmp(_: *const c_char, _: *const c_char, _: c_ulong) -> c_int;
fn strrchr(_: *const c_char, _: c_int) -> *mut c_char;
fn strstr(_: *const c_char, _: *const c_char) -> *mut c_char;
fn strtod(_: *const c_char, _: *mut *mut c_char) -> c_double;
fn strtol(_: *const c_char, _: *mut *mut c_char, _: c_int) -> c_long;
fn strtoll(_: *const c_char, _: *mut *mut c_char, _: c_int) -> c_longlong;
fn strtoul(_: *const c_char, _: *mut *mut c_char, _: c_int) -> c_ulong;
fn syscall(__sysno: c_long, _: ...) -> c_long;
fn sysconf(__name: c_int) -> c_long;
fn syslog(__pri: c_int, __fmt: *const c_char, _: ...);
fn time(__timer: *mut time_t) -> time_t;
fn timegm(__tp: *mut tm) -> time_t;
fn tzset();
fn unlink(__name: *const c_char) -> c_int;
fn unlinkat(__fd: c_int, __name: *const c_char, __flag: c_int) -> c_int;
fn unsetenv(__name: *const c_char) -> c_int;
fn vsnprintf(_: *mut c_char, _: c_ulong, _: *const c_char, _: ::std::ffi::VaList) -> c_int;
fn vsprintf(_: *mut c_char, _: *const c_char, _: ::std::ffi::VaList) -> c_int;
fn waitpid(__pid: __pid_t, __stat_loc: *mut c_int, __options: c_int) -> __pid_t;
fn write(__fd: c_int, __buf: *const c_void, __n: size_t) -> ssize_t;
fn write_all(fd: c_int, buf: *const c_void, count: size_t) -> ssize_t;
fn writev(__fd: c_int, __iovec: *const iovec, __count: c_int) -> ssize_t;
These seem to be all, or nearly all, libc functions. There are far fewer of these than the fake extern calls that can be removed, but there is still a relatively large number of them.
We need to handle Callee::UnknownDef calls, as currently we just panic on them (or with RUST_LOG_PANIC=off, simply skip them, which is incorrect). As the definition of the function is unknown (it's either in another crate or an extern call), we don't know anything about what it can do, so we have to assume the worst. That is, we have to assume that the function will interact with all local pointers (any labeled with PointerIds) that are accessible from that function in every way possible.
For every way possible, this will lead it to have the pointer PermissionSet of READ | WRITE | OFFSET_ADD | OFFSET_SUB | FREE.
For interacting with all accessible local pointers, this is a bit trickier. Callee::Trivial calls will already be marked as such, so we know that all Callee::UnknownDefs are non-trivial. This means (based on the latest definition discussed in #855) that either the call:
- is unsafe (including extern)
- has a Ty in its generics
- [...] Tys
For the latter, the only accessible pointers will be direct arguments and return types that are pointers, so those are simple (though I'm still unsure about this; see https://github.com/immunant/c2rust/pull/855#issuecomment-1451283638).
For the former two, this means that the function can access practically anything, any pointer that is contained somewhere within the argument and return types, as well as any global pointers or pointers to globals, as these can be accessed through trait methods on generic types (for generic functions from another crate) or through FFI (for extern functions).
For globals, I'm not sure to what extent we currently support pointer permissions for them, but if and when we do, we'll need to mark them as fully accessible by these UnknownDef functions that can do anything to them.
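To make "any pointer that is contained somewhere within the argument and return types" concrete, here is a small sketch of walking a signature's types and collecting every pointer label. LabeledTy is a simplified type tree invented for this example, and PointerId is modelled as a plain wrapper; neither matches the real definitions in the analysis.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct PointerId(u32);

/// Simplified labeled type (assumption): each pointer layer carries a `PointerId`.
enum LabeledTy {
    Int,
    Ptr(PointerId, Box<LabeledTy>),
    Struct(Vec<LabeledTy>),
}

/// Collect every pointer nested anywhere inside `ty`, outermost first.
fn collect_pointers(ty: &LabeledTy, out: &mut Vec<PointerId>) {
    match ty {
        LabeledTy::Int => {}
        LabeledTy::Ptr(id, pointee) => {
            out.push(*id);
            collect_pointers(pointee, out);
        }
        LabeledTy::Struct(fields) => {
            for field in fields {
                collect_pointers(field, out);
            }
        }
    }
}

/// All pointers an unknown callee could reach through its own signature.
fn reachable_from_signature(inputs: &[LabeledTy], output: &LabeledTy) -> Vec<PointerId> {
    let mut out = Vec::new();
    for ty in inputs {
        collect_pointers(ty, &mut out);
    }
    collect_pointers(output, &mut out);
    out
}
```

For a signature shaped like strtol's, this yields the string pointer plus both layers of the *mut *mut c_char argument. Globals are not covered by this walk and would need separate handling.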