reberhardt7 / cplayground

Figure out how to determine when two fds reference the same open file table entry #4

Closed: reberhardt7 closed this issue 4 years ago

reberhardt7 commented 4 years ago

Right now, we can get a fair amount of useful information from /proc/*/fd/ and from lsof. However, if process A and process B have a file descriptor pointing to file X, we can't tell if that's because they are sharing a file table entry (e.g. process B forked from process A after A opened the file) or because they opened the same file independently. We can't reconstruct the open file table unless we can figure out when two fds are aliases of each other.
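
To make the ambiguity concrete, here is a minimal Python sketch (not cplayground code; `shares_offset` and `demo` are hypothetical names). It opens one file twice and duplicates one of the descriptors. All three fds report the same device and inode, which is the same path-level view that /proc/*/fd and lsof give, yet only the dup'd pair shares a file offset, i.e. an open file table entry. The offset probe only works from inside the process that owns the fds and briefly disturbs the offset, so it illustrates the distinction rather than solving the problem for arbitrary processes.

```python
import os
import tempfile


def shares_offset(fd_a, fd_b):
    """Heuristic probe: move fd_a's offset and check whether fd_b follows."""
    saved_a = os.lseek(fd_a, 0, os.SEEK_CUR)
    saved_b = os.lseek(fd_b, 0, os.SEEK_CUR)
    # Move fd_a to a position that fd_b is currently not at.
    os.lseek(fd_a, saved_b + 1, os.SEEK_SET)
    moved = os.lseek(fd_b, 0, os.SEEK_CUR) == saved_b + 1
    # Put both offsets back where they were.
    os.lseek(fd_b, saved_b, os.SEEK_SET)
    os.lseek(fd_a, saved_a, os.SEEK_SET)
    return moved


def demo():
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"hello world")
        tmp.flush()
        fd_open = os.open(tmp.name, os.O_RDONLY)   # first open() call
        fd_dup = os.dup(fd_open)                   # alias of fd_open
        fd_again = os.open(tmp.name, os.O_RDONLY)  # independent open() call
        fds = (fd_open, fd_dup, fd_again)
        try:
            # Path-level identity: the same file for all three descriptors.
            inodes = {(os.fstat(fd).st_dev, os.fstat(fd).st_ino) for fd in fds}
            return {
                "distinct_inodes": len(inodes),
                "dup_shares_entry": shares_offset(fd_open, fd_dup),
                "reopen_shares_entry": shares_offset(fd_open, fd_again),
            }
        finally:
            for fd in fds:
                os.close(fd)
```

On Linux, `demo()` should report a single distinct inode, `True` for the dup'd pair, and `False` for the independently opened descriptor.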

reberhardt7 commented 4 years ago

This page provides some very useful information about the kernel data structures used to maintain the fd table and open file table: https://www.tldp.org/LDP/lki/lki-3.html

In particular, every process (i.e. "task") has a struct files_struct, which holds its file descriptor table (section 3.3 in the above link). That struct points to an array of struct file pointers indexed by file descriptor (struct file is described in section 3.4). The struct file objects are the entries of the open file table, and when a file descriptor is duplicated, the kernel simply copies the pointer to the same struct file.
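
The relationship can be modelled with a toy Python sketch. This is not kernel code: `OpenFile` and `Process` are hypothetical stand-ins for struct file and for a task with its struct files_struct, and a dict stands in for the fd table. It only shows that duplicating a descriptor (or forking) copies a reference to the same entry, while a second open creates a new one.

```python
class OpenFile:
    """Stand-in for struct file: no ID field, only object identity."""

    def __init__(self, path):
        self.path = path
        self.offset = 0


class Process:
    """Stand-in for a task; fd_table plays the role of struct files_struct."""

    def __init__(self, fd_table=None):
        # Copy the table itself, but keep references to the same entries.
        self.fd_table = dict(fd_table or {})

    def _lowest_free_fd(self):
        fd = 0
        while fd in self.fd_table:
            fd += 1
        return fd

    def open(self, path):
        fd = self._lowest_free_fd()
        self.fd_table[fd] = OpenFile(path)  # brand new open file entry
        return fd

    def dup(self, fd):
        new_fd = self._lowest_free_fd()
        self.fd_table[new_fd] = self.fd_table[fd]  # same entry, new fd
        return new_fd

    def fork(self):
        # The child gets its own fd table pointing at the parent's entries.
        return Process(self.fd_table)
```

In this model, `parent.fd_table[fd] is child.fd_table[fd]` is the question we want answered for real processes, and nothing in `OpenFile` besides its identity can answer it.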

Notably, there is no unique ID in struct file; the only uniquely identifying information is the memory address at which the struct resides. This makes me think that searching for an existing way to identify aliased file descriptors is not going to turn anything up, and we're going to have to come up with something more creative. The two solutions I've thought of so far are:

reberhardt7 commented 4 years ago

Writing a kernel module looks pretty straightforward. We would register a callback for when processes are created/terminated and create a /proc file for each process (see http://pointer-overloading.blogspot.com/2013/09/linux-creating-entry-in-proc-file.html). When that file is read, we can walk the fd table for that process and generate a "file" listing an ID for the open file backing each fd. I still need to look into how feasible it is to load a kernel module for Docker on Mac, but the approach seems promising enough that I'm not going to rule it out just yet.
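
As a rough illustration of what such a per-process listing could contain, here is a userspace Python simulation, not kernel module code. `alias_report` is a hypothetical helper, the pids and fds are made up, and Python object identity stands in for the address of a struct file. It assigns one small ID per distinct open file entry, so aliased fds end up with the same ID.

```python
def alias_report(fd_tables):
    """Map {pid: {fd: entry}} to {pid: {fd: entry_id}}.

    Entries are compared by identity only, mirroring the fact that the
    entry's address is the one thing that distinguishes it.
    """
    ids = {}     # id(entry) -> small integer ID
    report = {}
    for pid in sorted(fd_tables):
        report[pid] = {}
        for fd in sorted(fd_tables[pid]):
            entry = fd_tables[pid][fd]
            # Reuse the ID if this entry was seen before, else allocate one.
            report[pid][fd] = ids.setdefault(id(entry), len(ids) + 1)
    return report


# Hypothetical example: pid 101 inherited fd 3 from pid 100,
# then opened the same file again as fd 4.
shared, independent = object(), object()
tables = {100: {3: shared}, 101: {3: shared, 4: independent}}
print(alias_report(tables))  # {100: {3: 1}, 101: {3: 1, 4: 2}}
```

Whether a real module would expose sequential IDs like these or something derived from the struct address is an open design choice; the sketch only shows the shape of the output.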

reberhardt7 commented 4 years ago

This link has some helpful explanations of the fields in struct file: https://www.star.bnl.gov/~liuzx/lki/lki-3.html#ss3.4

Likewise, this page has more detailed explanations, but covers fewer of the fields: https://www.oreilly.com/library/view/linux-device-drivers/0596000081/ch03s04.html

This page has helpful documentation on how to interact with the structures (especially how to acquire locks): https://www.kernel.org/doc/Documentation/filesystems/files.txt