onsi / gomega

Ginkgo's Preferred Matcher Library
http://onsi.github.io/gomega/
MIT License
2.16k stars 282 forks source link

RFC: file descriptor leak detection for Gomega? #545

Open thediveo opened 2 years ago

thediveo commented 2 years ago

After the experimental new gleak package I took on file descriptor leak checking. I would like to ask for feedback (and maybe interest) in my fdooze module ... like with noleak/gleak I'm open to integrating it into Gomega, if requested.

Usually, it should boil down to this:

BeforeEach(func() {
    goodfds := Filedescriptors()
    DeferCleanup(func() {
        Expect(Filedescriptors()).NotTo(HaveLeakedFds(goodfds))
    })
})

File descriptors are very OS specific and fdooze is Linux-only, because that is what according to the most recent Go survey more than 90% of Go developers use. There are other inconveniences when it comes to fd's compared with goroutines: no real identity and fd number get recycled as soon as possible. Also, you never know who was opening the fd (but then, this might be deep inside some 3rd party package anyway).

To help finding the source of a leaked fd, fdooze dumps well-known flags and options with their symbolic constant names, wherever possible. Sockets are dumped with their addresses (hopefully) more usefully formatted, etc.

As I said: feedback and suggestions highly welcome!

Oh, this time the mascot is "Goigi", the (young) file descriptor plumber.

And about the naming: maybe gooze instead, but that's a real brain twister?

thediveo commented 2 years ago

Nota bene: found two file descriptor leaks in my lxkns production code using fdooze, for corner cases that laid low for a long time ... in a place where defer cannot be used as it spans across potentially long code sections in the caller. 😁

Oh, and defer'ing multiple Close()s using the same variable passed via the closure instead of via the deferred functions parameter is always a bad idea ... not that I've been warned about that before.

onsi commented 2 years ago

oh snap this is super impressive! I'm on a Mac so haven't been able to play with it but would love to pul lit in/include it (or, at minimum, update the README to point people at it if it would be easier to maintain as a standalone library). this sure would have saved a ton of time in an earlier life of mine.

one suggestion - what might it look like to inspect the fds for a process launched by a test suite. one context I've seen problems with leaky fds come up is when some process under (integration) test misbehaves. it could be valuable to do something like:

session, err := gexec.Start(cmd, GinkgoWriter, GinkgoWriter)
Expect(err).NotTo(HaveOccurred())

goodfds := FiledescriptorsFor(session)

client.DoSomeAPIThing()
Expect(session).Should(gbytes.Say("I did the thing"))
...

Eventually(FiledescriptorsFor(session)).ShouldNot(HaveLeakedFds(goodfds))
Eventually(session.Interrupt()).Should(gexec.Exit(0))
thediveo commented 2 years ago

@onsi now that's a can of worms of its own and a fun to ask here with you and others carrying the initial idea much farther!

Since you started the new process under the same UID/GID and didn't something special the file descriptor information of that started process would be accessible. Somewhere the session should give me the PID and then it's the same (on Linux) regardless of looking at my own process using the /proc/self symlink or some other process /proc/$PID. Looks to be a straightforward extension without much effort. The most difficult part will be to write the new unit tests to simulate a leaky started process.

I'm open to also integrating as we did with gleak. The only thing at the moment is a slight gut feeling that fdooze might need to undergo a few iterations first before integrating it into Gomega.

I have no macos at hand but I've seen proc_pidinfo being mentioned for macos. However, this requires some refactoring of the existing code where macos and linux could maybe share a few things, such as util functions. But the macos implementation would probably differ in following proc_pidinfo's design of how to represent data about individual fds.

onsi commented 2 years ago

Since you started the new process under the same UID/GID and didn't something special the file descriptor information of that started process would be accessible. Somewhere the session should give me the PID and then it's the same (on Linux) regardless of looking at my own process using the /proc/self symlink or some other process /proc/$PID. Looks to be a straightforward extension without much effort. The most difficult part will be to write the new unit tests to simulate a leaky started process.

make sense - sounds like something that could happen in a future iteration if there's demand for it. The session wraps the exec.Command and from there one can get the PID without too much difficulty I believe.

I'm open to also integrating as we did with gleak. The only thing at the moment is a slight gut feeling that fdooze might need to undergo a few iterations first before integrating it into Gomega.

Sounds good. How about a PR to update the README and the index.md doc file to advertise fdooze?

I have no macos at hand but I've seen proc_pidinfo being mentioned for macos. However, this requires some refactoring of the existing code where macos and linux could maybe share a few things, such as util functions. But the macos implementation would probably differ in following proc_pidinfo's design of how to represent data about individual fds.

I think lsof would get one most of the way there. But I'm no expert and the feature would probably benefit from sone with deeper macOS knowledge working on it. As it stands we can allow demand (and an enterprising developer's PR) to drive development ;)

thediveo commented 2 years ago

Sounds good. How about a PR to update the README and the index.md doc file to advertise fdooze?

As for the README how do you would like it? Very short new ## heading about experimental new leakage test packages and asking for feedback?

thediveo commented 2 years ago

@onsi what would you prefer for FiledescriptorsFor() to do when there is a problem with the session, such as the process isn't valid anymore or isn't accessible from the calling test...

  1. to panic
  2. return (nil, error)
  3. be silent and simply return nil? That's probably the variant we never want because it would fail silently ... which is especially bad in tests.
onsi commented 2 years ago

IMP it should return nil, error and everything downstream (and/or the user) should check for an error. Any downsides to that?

thediveo commented 2 years ago

No, just wanted to make sure that I'm getting this right. Thank you very much! One thing that I noticed already when leak-testing another go-based target: make sure to get the "good fd set" only after Go's stdlib netpoll has created the eventpoll fd as well as an internal pipe to break out of epoll_waits...

Unfortunately, there's no reliable way to automatically filter these out as we don't have the information which code created a particular fd. That would require hardcore tools, such as kernel eBPF (sich!) hooks into certain syscalls...

thediveo commented 2 years ago

Now for the hard question: which package name to use when integrating the fd leak detection into Gomega? Based on urban dict results I doubt gooze would be a good idea. gspill maybe? I'm open to suggestions!

thediveo commented 2 years ago

@onsi ping ... what would your suggestion as to the package name for the fd leak testing package?

onsi commented 2 years ago

sorry I've been so behind! A boring and unpronounceable idea is gfdleak

I kinda like gspill though - we could go with that?

thediveo commented 2 years ago

gspill is perfectly fine with me!