lsof-org / lsof

LiSt Open Files
https://lsof.readthedocs.io

[FEATURE] Expose functionality as a re-usable library #268

Open BenBE opened 1 year ago

BenBE commented 1 year ago

Is your feature request related to a problem? Please describe.

I'm always frustrated that the only way to extract information about open file descriptors in a system is to run lsof as a sub-process, then parse and filter its output. While this may be fine for a single snapshot, it becomes quite wasteful when integrating such functionality into a system monitoring tool, which would benefit from this information being presented through a sane API.

Describe the solution you'd like

The collected information about open file descriptors and processes should be made available as a linkable shared library that other programs can use. This library should expose information about processes (plus stats) and their open file descriptors. It should provide functions to create snapshots of the whole system or of single processes, and to collect information on a single file, on all file descriptors of one process, or on all file descriptors in the system. It should also support a cache for this information so that updates can be done incrementally (similar to the existing monitoring mode).

Describe alternatives you've considered

The lsof binary already has a continuous mode. While this mode generally allows receiving updates for open file descriptors at fixed intervals, it is not suitable when the scope of the needed information changes over time (e.g. a system monitor that normally shows only the number of open files per process and needs actual file names and types only once a user requests details for a particular process). With a library, the caller could specify which information to collect and request missing information on demand.

There is also a re-implementation of lsof in util-linux with JSON as its interchange format (which is at times easier to integrate with applications), but it suffers from similar flexibility issues at runtime and is neither widely available nor backwards compatible with lsof. Thus, while JSON is a nice interchange format for data (unless you have to parse it), a binary, in-process API would be much easier to work with.

Additional context

The machine-readable output format of lsof is hardly documented. Some parts of it are overly complicated (numbers have to be converted back and forth between representations, where a proper library API would be unambiguous). Also, depending on the platform, certain attributes are not exposed consistently, so that sometimes only one of two attributes (e.g. size or offset) is available.
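To illustrate the sub-process approach, here is a minimal sketch of what a consumer has to do today with field output such as that of `lsof -F pcfn`, where each line starts with a one-character field identifier (`p` for PID, `c` for command, `f` for file descriptor, `n` for name). The function name `demo_count_fds` and the sample input in the usage note are made up for illustration; they are not part of lsof.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the file records ('f' lines) that follow
 * the process record ('p' line) of the given PID in lsof -F output.
 * PIDs arrive as text and have to be converted back to numbers. */
static inline unsigned demo_count_fds(const char *out, long pid) {
    assert(out != NULL);
    long current = -1;            /* PID of the process set being read */
    unsigned count = 0;
    const char *line = out;
    while (*line != '\0') {
        if (line[0] == 'p') {
            current = strtol(line + 1, NULL, 10);
        } else if (line[0] == 'f' && current == pid) {
            count++;
        }
        const char *nl = strchr(line, '\n');
        if (nl == NULL) {
            break;                /* last line without trailing newline */
        }
        line = nl + 1;
    }
    return count;
}
```

Fed a buffer like `"p100\nccat\nf0\nn/dev/null\nf3\nn/etc/passwd\n"`, the helper counts two file records for PID 100. With an in-process API, both the text round trip and the line scanning would go away.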

jiegec commented 1 year ago

Reading the code, the dialect-specific code and the user-interfacing code are tightly coupled via global variables, and there is extensive use of static variables. It will be a very big project ;) Maybe suitable for a GSoC?

jiegec commented 1 year ago

I made some progress at https://github.com/jiegec/lsof/blob/library/include/lsof.h, you can see API definitions there, and here is a list of DONEs and TODOs:

BenBE commented 1 year ago

Had a quick look at the API: Looks good so far.

Probably some suggestions:

From the PoV of an implementer of a system monitoring tool that tracks process information itself: How would I go about receiving updated information about open files for one process at regular intervals (polling is totally fine and expected; the updating part is the focus here)? As I read the API now, I'd have to create a completely new context each time, causing quite a lot of information (in particular process information) to be collected each time only to be thrown away again; or am I missing something here?

Regarding platform support: How about extending support to FreeBSD and Darwin next?

jiegec commented 1 year ago

  • lsof_file_access_mode: Be explicit about the value of the modes. Maybe _NONE = 0, _READ = 4, _WRITE = 2, _READ_WRITE = _READ | _WRITE, …

DONE
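As a sketch of what the explicit values suggested above could look like: the `demo_` names below are placeholders and are not claimed to match the identifiers in lsof.h; only the numeric values are taken from the suggestion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; only the numeric values come from the suggestion. */
enum demo_file_access_mode {
    DEMO_ACCESS_NONE = 0,
    DEMO_ACCESS_WRITE = 2,
    DEMO_ACCESS_READ = 4,
    DEMO_ACCESS_READ_WRITE = DEMO_ACCESS_READ | DEMO_ACCESS_WRITE
};

/* True if the mode includes read access. */
static inline bool demo_can_read(enum demo_file_access_mode mode) {
    return (mode & DEMO_ACCESS_READ) != 0;
}

/* True if the mode includes write access. */
static inline bool demo_can_write(enum demo_file_access_mode mode) {
    return (mode & DEMO_ACCESS_WRITE) != 0;
}

/* Combines two modes; this works because READ and WRITE are distinct bits. */
static inline enum demo_file_access_mode
demo_access_union(enum demo_file_access_mode a, enum demo_file_access_mode b) {
    enum demo_file_access_mode both = (enum demo_file_access_mode)(a | b);
    assert(both <= DEMO_ACCESS_READ_WRITE);
    return both;
}
```

With explicit bit values, callers can test for read or write access with a mask instead of comparing against every enumerator.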

  • Do the lsof_fd_type values align with what the OS usually uses? (e.g. git shows 100664 for a regular file, the first two digits being the file type)

Actually, lsof_fd_type does not correspond to struct stat.st_mode, but rather to how the file relates to the process, e.g. whether it is an open fd, the cwd, the root directory, a memory-mapped file, etc. The regular file/directory distinction is given in the TYPE column, saved as a string in struct lfile.type. The format is very casual, as can be seen from the man page:

       TYPE       is  the  type  of  the node associated with the file - e.g.,
                  GDIR, GREG, VDIR, VREG, etc.

                  or ``IPv4'' for an IPv4 socket;

                  or ``IPv6'' for an open IPv6 network file - even if its address
                  is IPv4, mapped in an IPv6 address;

                  or ``ax25'' for a Linux AX.25 socket;

                  or ``inet'' for an Internet domain socket;

                  or ``lla'' for a HP-UX link level access file;

                  or ``rte'' for an AF_ROUTE socket;

                  or ``sock'' for a socket of unknown domain;

This is a big mess: even a regular file is represented variously as REG, VREG, etc., and upper and lower case are mixed. I want to unify them, but that may break downstream users. Lf->ntype is better, but it is mainly for internal use; for example, N_NFS can override N_REGLR.

  • lsof_protocol_type is missing UDP and Unix Domain sockets

Yes, working on it. But the Unix domain socket is special: it is currently reported in the TYPE column, at the same level as IPv4/IPv6, because a Unix domain socket can be datagram- or stream-based.

  • lsof_file should probably group the valid fields into a valid mask with uint32_t or uint64_t

Yes, WIP.
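One way such a valid mask could look is sketched below. The struct layout, field selection, and all `demo_`/`DEMO_` names are assumptions made for illustration; the actual lsof_file definition may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field bits: one bit per optional field. */
#define DEMO_VALID_SIZE   (UINT32_C(1) << 0)
#define DEMO_VALID_OFFSET (UINT32_C(1) << 1)
#define DEMO_VALID_INODE  (UINT32_C(1) << 2)

struct demo_file {
    uint32_t valid;   /* bitwise OR of DEMO_VALID_* for the fields that are set */
    uint64_t size;
    uint64_t offset;
    uint64_t inode;
};

/* True if every field named in `fields` has been filled in. */
static inline bool demo_file_has(const struct demo_file *f, uint32_t fields) {
    assert(f != NULL);
    return (f->valid & fields) == fields;
}

/* Stores the size and marks it as valid in one step. */
static inline void demo_file_set_size(struct demo_file *f, uint64_t size) {
    assert(f != NULL);
    f->size = size;
    f->valid |= DEMO_VALID_SIZE;
}
```

A single mask replaces a set of per-field flags and makes the "only size or only offset is available" case explicit to the caller.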

  • lsof_file might include a linked list with additional attributes (e.g. socket addresses, paths), thus avoiding regular redesigns when new columns/information have to be added (each attr containing an attr_type and a data section (union?))

Good idea. I can also use this to report selection results; the lsof CLI needs this information for reporting and for setting the exit code.
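The attribute-list idea could be sketched roughly as follows: a singly linked list of nodes, each tagged with an attr type and carrying its payload in a union. The attribute types, union members, and function names here are hypothetical and only illustrate the shape of the design.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute kinds; new kinds can be appended without
 * changing the layout of the file struct that owns the list. */
enum demo_attr_type { DEMO_ATTR_PATH, DEMO_ATTR_PORT };

struct demo_attr {
    enum demo_attr_type type;   /* selects the active union member */
    union {
        const char *path;       /* valid when type == DEMO_ATTR_PATH */
        uint16_t port;          /* valid when type == DEMO_ATTR_PORT */
    } data;
    struct demo_attr *next;     /* NULL terminates the list */
};

/* Returns the first attribute of the given type, or NULL if absent. */
static inline const struct demo_attr *
demo_attr_find(const struct demo_attr *head, enum demo_attr_type type) {
    for (const struct demo_attr *a = head; a != NULL; a = a->next) {
        if (a->type == type) {
            return a;
        }
    }
    return NULL;
}

/* Type-checked accessor for the port payload. */
static inline uint16_t demo_attr_port(const struct demo_attr *a) {
    assert(a != NULL && a->type == DEMO_ATTR_PORT);
    return a->data.port;
}
```

The trade-off is a lookup per attribute instead of a direct field access, in exchange for being able to add new information without redesigning the struct.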

  • DiD: It's usually better to place the count of elements before the array (pointer) to those elements (cf. lsof_process->files)

Thanks for the suggestions, I will work on it after finishing FreeBSD support.

As I read the API now, I'd have to create a completely new context each time, causing quite a lot of information (in particular process information) to be collected each time only to be thrown away again; or am I missing something here?

No, you can call lsof_gather() multiple times with the same context; only the options cannot be changed. This is because liblsof requires some preprocessing steps for the options.
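The usage pattern described here (configure once, then gather repeatedly with the same context) can be modelled with a small self-contained mock. This is not liblsof code: apart from the idea of a gather call on a reusable context, every name and field below is a made-up stand-in, and the real signatures are those in lsof.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock context, NOT the liblsof type. */
struct demo_context {
    int selected_pid;   /* option: which process to inspect */
    bool frozen;        /* set once the options have been preprocessed */
    unsigned gathers;   /* number of snapshots taken so far */
};

/* Options may only change before the first gather.
 * Returns 0 on success, -1 if the context is already frozen. */
static inline int demo_select_pid(struct demo_context *ctx, int pid) {
    assert(ctx != NULL);
    if (ctx->frozen) {
        return -1;
    }
    ctx->selected_pid = pid;
    return 0;
}

/* Takes one snapshot. The first call freezes the options, standing in
 * for the preprocessing step; later calls reuse that work. */
static inline int demo_gather(struct demo_context *ctx) {
    assert(ctx != NULL);
    ctx->frozen = true;
    ctx->gathers++;
    return 0;
}
```

A polling monitor would call the select function once up front and then call the gather function on each tick, which avoids rebuilding the context for every update.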

Regarding platform support: How about extending support to FreeBSD and Darwin next?

Darwin done, FreeBSD/NetBSD/OpenBSD WIP.

jiegec commented 1 year ago

Progress update:

I will focus on the liblsof user interface next.