schell / steeloverseer

A file watcher and development tool.
BSD 3-Clause "New" or "Revised" License

sos: addWatch: resource exhausted (No space left on device) #34

Open stites opened 6 years ago

stites commented 6 years ago

When running sos on Ubuntu 17.10 (x64), I occasionally get "sos: addWatch: resource exhausted (No space left on device)". At first I thought this happened only when I was in tmux, but it looks like it happens non-deterministically. Memory usage is below 25% and disk usage is about 66%, so I'm not sure what kind of space this error refers to. I've never had this problem on Arch Linux or Debian (so far).

If someone could point me in a direction, I can try to fix this myself (I imagine it is a bit difficult to simulate).

stites commented 6 years ago

It looks like rebooting will clear up this issue -- is it possible that steeloverseer is missing a clean-up step?

schell commented 6 years ago

In my experience, when Linux says "no space left on device", "device" means "disk". I am by no means a Linux expert, though. Are you running sos on a separate filesystem? Maybe in a Dropbox folder or on a mounted drive? Just spitballing. :)

aboutthomas commented 6 years ago

Possibly relevant: https://github.com/google/cadvisor/issues/1581#issuecomment-367616070.
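For context: on Linux, inotify_add_watch fails with ENOSPC ("No space left on device") when the per-user limit on inotify watches is reached, so "space" here need not mean disk or memory. A minimal sketch for checking those limits, assuming the standard /proc/sys/fs/inotify location (the function name read_inotify_limits is just illustrative):

```python
from pathlib import Path

# Standard location of the inotify limits on Linux.
INOTIFY_DIR = Path("/proc/sys/fs/inotify")


def read_inotify_limits(base=INOTIFY_DIR):
    """Return the per-user inotify limits as a dict.

    Entries that cannot be read (e.g. on a non-Linux system) map to None.
    """
    limits = {}
    for name in ("max_user_watches", "max_user_instances", "max_queued_events"):
        try:
            limits[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            limits[name] = None
    return limits


if __name__ == "__main__":
    for name, value in read_inotify_limits().items():
        print(f"{name}: {value}")
```

If max_user_watches turns out to be low on the affected machine, it can be raised through the fs.inotify.max_user_watches sysctl. Whether that is the actual cause here is still an assumption.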

stites commented 6 years ago

@sjakobi-as My internal simulator wouldn't be surprised if this was the culprit: perhaps when you run steeloverseer in tmux, watches aren't evicted as you would expect because of the top-level process.

I've been avoiding using sos because of this bug. I'll switch back and report what I find.
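One way to gather evidence while testing: a rough sketch that counts the inotify watches each process currently holds, by scanning /proc/<pid>/fdinfo for lines that start with "inotify wd:". The function name count_inotify_watches is made up for illustration, and without root it only sees processes the current user is allowed to inspect.

```python
import os
from collections import Counter


def count_inotify_watches(proc_root="/proc"):
    """Map PID -> number of inotify watches held, for inspectable processes."""
    counts = Counter()
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        fdinfo_dir = os.path.join(proc_root, pid, "fdinfo")
        try:
            fds = os.listdir(fdinfo_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                with open(os.path.join(fdinfo_dir, fd)) as f:
                    # Each watch on an inotify fd appears as one such line.
                    n = sum(1 for line in f if line.startswith("inotify wd:"))
            except OSError:
                continue
            if n:
                counts[int(pid)] += n
    return counts


if __name__ == "__main__":
    counts = count_inotify_watches()
    for pid, n in counts.most_common(10):
        print(f"pid {pid}: {n} watches")
    print(f"total: {sum(counts.values())}")
```

Running this before starting sos, while it runs, and after stopping it should show whether some process keeps holding watches after sos exits.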

stites commented 5 years ago

Ah! I think I found the bug. I was using sos and, upon termination (via ctrl-c), saw this:

^CError removing watch: <wd=132>
Error removing watch: <wd=131>
Error removing watch: <wd=130>
Error removing watch: <wd=129>
Error removing watch: <wd=128>
Error removing watch: <wd=127>
Error removing watch: <wd=126>
Error removing watch: <wd=125>
Error removing watch: <wd=124>
Error removing watch: <wd=123>
Error removing watch: <wd=122>
Error removing watch: <wd=121>
Error removing watch: <wd=120>
Error removing watch: <wd=34>

I don't have time to make the fix at the moment, but I think this behavior confirms @sjakobi-as' observation. I'll try to lock down some test cases.

schell commented 5 years ago

Thanks for your diligence @stites :)

schell commented 5 years ago

Try providing the directory of the sources (code) you'd like to watch; this will limit sos's watching behavior. Seems to work for me.

stites commented 5 years ago

No need! This is reproducible on two of my machines, but the bug is that fsnotify is swallowing errors thrown by hinotify. See https://github.com/haskell-fswatch/hfsnotify/issues/85