Closed reberhardt7 closed 4 years ago
This page provides some very useful information about the kernel data structures used to maintain the fd table and open file table: https://www.tldp.org/LDP/lki/lki-3.html
In particular, every process (i.e. "task") has a `struct files_struct`, which makes up the file descriptor table (section 3.3 in the above link). That struct has a pointer to an array of `struct file` pointers, indexed by file descriptor (`struct file` is described in section 3.4). These `struct file`s make up the entries in the open file table, and when a file descriptor is duplicated, the kernel simply duplicates the pointer to the `struct file`.
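The effect of sharing a `struct file` is visible from user space, because the file offset lives in the open file table entry rather than in the descriptor. A minimal sketch, assuming a POSIX system (the helper name `offsets_after_read` is made up for illustration): reading through one fd moves the offset seen through its `dup`ed alias, but not the offset of an fd that opened the same path independently.

```python
import os
import tempfile


def offsets_after_read(nbytes=4):
    """Read nbytes via one fd, then report the offset seen through each fd."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"0123456789")
        tmp.flush()

        fd = os.open(tmp.name, os.O_RDONLY)
        alias = os.dup(fd)                            # shares fd's open file entry
        independent = os.open(tmp.name, os.O_RDONLY)  # gets its own entry
        try:
            os.read(fd, nbytes)
            return {
                "original": os.lseek(fd, 0, os.SEEK_CUR),
                "dup": os.lseek(alias, 0, os.SEEK_CUR),
                "independent": os.lseek(independent, 0, os.SEEK_CUR),
            }
        finally:
            for f in (fd, alias, independent):
                os.close(f)


if __name__ == "__main__":
    print(offsets_after_read())
```

The `original` and `dup` offsets advance together, while `independent` stays at 0.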
Notably, there is no unique ID in `struct file`; the only uniquely identifying information is the memory address at which the struct resides. This makes me think that searching for a way to identify aliased file descriptors is not going to turn anything up, and we're going to have to come up with something more creative. The two solutions I've thought of so far are:
`struct file` memory address). I don't actually think this would be that hard (the data is easy to access, and we could create another set of files in `/proc` exposing the data), and it would be better for avoiding weird bugs caused by not properly emulating syscall behavior. However, this seems bad for testing and deployment reasons. Docker containers just use the host kernel, so we would need to modify the host kernel, which is a big pain for making this easily deployable...

Writing a kernel module looks pretty straightforward. We would register a callback for when processes are created/terminated, and create a `/proc` file for each process: http://pointer-overloading.blogspot.com/2013/09/linux-creating-entry-in-proc-file.html

When the file is read, we can query the fd table for that process and generate a "file" with a list of IDs for the open files backing each fd. I need to look into how reasonable it is to load a kernel module for Docker on Mac, but this seems reasonable enough that I'm not going to rule it out just yet.
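To make the consumer side concrete, here is a rough sketch of how user space might rebuild the open file table from such per-process files. Everything about the format is an assumption made for illustration: no such `/proc` file exists yet, and the one-line-per-fd layout (`<fd> <id>`), the helper names, and the sample IDs are all hypothetical.

```python
from collections import defaultdict


def parse_fd_ids(text):
    """Parse lines of the assumed form '<fd> <id>' into {fd: id}."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fd_str, file_id = line.split()
        result[int(fd_str)] = file_id
    return result


def group_aliases(per_process):
    """Map each open-file ID to the sorted (pid, fd) pairs that refer to it.

    per_process is {pid: text}, where text is the content of the
    hypothetical per-process /proc file.
    """
    table = defaultdict(list)
    for pid, text in per_process.items():
        for fd, file_id in parse_fd_ids(text).items():
            table[file_id].append((pid, fd))
    return {file_id: sorted(users) for file_id, users in table.items()}


if __name__ == "__main__":
    # Made-up sample: pid 200 was forked from pid 100, then opened one more file.
    sample = {100: "0 a1\n3 b2\n", 200: "0 a1\n3 b2\n4 c3\n"}
    for file_id, users in group_aliases(sample).items():
        print(file_id, users)
```

Any ID that maps to more than one `(pid, fd)` pair is a shared open file table entry, which is exactly the aliasing information we are currently missing.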
This link has some helpful explanations of the fields in `struct file`: https://www.star.bnl.gov/~liuzx/lki/lki-3.html#ss3.4
Likewise, this page has more detailed explanations, but on fewer of the fields: https://www.oreilly.com/library/view/linux-device-drivers/0596000081/ch03s04.html
This page has helpful documentation on how to interact with the structures (especially how to acquire locks): https://www.kernel.org/doc/Documentation/filesystems/files.txt
Right now, we can get a fair amount of useful information from `/proc/*/fd/` and from `lsof`. However, if process A and process B each have a file descriptor pointing to file X, we can't tell if that's because they are sharing a file table entry (e.g. process B forked from process A after A opened the file) or because they opened the same file independently. We can't reconstruct the open file table unless we can figure out when two fds are aliases of each other.
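The ambiguity can be reproduced inside a single process. The sketch below assumes Linux with `/proc` mounted, and the helper name `proc_fd_targets` is made up for illustration. It opens a file, `dup`s the descriptor, opens the same path a second time, and then reads the symlink under `/proc/self/fd/` for each descriptor.

```python
import os
import tempfile


def proc_fd_targets():
    """Return the /proc/self/fd symlink target for three related fds."""
    with tempfile.NamedTemporaryFile() as tmp:
        fd = os.open(tmp.name, os.O_RDONLY)
        alias = os.dup(fd)                            # same file table entry
        independent = os.open(tmp.name, os.O_RDONLY)  # separate file table entry
        fds = {"original": fd, "dup": alias, "independent": independent}
        try:
            return {
                name: os.readlink(f"/proc/self/fd/{n}")
                for name, n in fds.items()
            }
        finally:
            for n in fds.values():
                os.close(n)


if __name__ == "__main__":
    for name, target in proc_fd_targets().items():
        print(name, "->", target)
```

All three links resolve to the same path, so the symlink target alone cannot distinguish the aliased descriptor from the independently opened one.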