vrtmrz / livesync-bridge

Too many open files (os error 24) #14

Open Laharah opened 7 months ago

Laharah commented 7 months ago

Livesync-bridge crashes once the storage directory contains roughly 89-90 or more files.

This is the error I'm getting:

LiveSync Bridge is now starting...
LiveSync Bridge is now started!
4/19/2024, 12:06:56 AM  1       Cache initialized 300 / 10000000000000
4/19/2024, 12:06:56 AM  1       Cache initialized 300 / 50000000
4/19/2024, 12:06:56 AM  1       Cache initialized 300 / 10000000000000
4/19/2024, 12:06:56 AM  -1      Requesting ... get http://localhost:5984/obsdb/_local%2Fobsydian_livesync_milestone
4/19/2024, 12:06:56 AM  10      [server-local-storage] Scan offline changes: Disabled
error: Uncaught Error: Too many open files (os error 24)
    at new FsWatcher (ext:runtime/40_fs_events.js:31:17)
    at Object.watchFs (ext:runtime/40_fs_events.js:93:10)
    at ext:deno_node/_fs/_fs_watch.ts:51:21
    at eventLoopTick (ext:core/01_core.js:203:13)

I wrote a script that deletes files until the program stops crashing (a rough sketch is below), and about 90 files seems to be the number at which the program falls over. If there are fewer than 90 files in the storage baseDir, it downloads files until it reaches 90 and then crashes.
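
For reference, here's roughly what that script did (a sketch from memory rather than the exact script; baseDir is the storage dir from my config below, and this version only touches the top level of the directory):

// Delete top-level files from the storage dir until only `target` remain,
// to bisect the crash threshold.
// Run with: deno run --allow-read --allow-write trim_files.ts 89
const baseDir = "/home/laharah/test_obs";
const target = Number(Deno.args[0] ?? "89");

const files: string[] = [];
for await (const entry of Deno.readDir(baseDir)) {
  if (entry.isFile) files.push(`${baseDir}/${entry.name}`);
}

for (const path of files.slice(target)) {
  await Deno.remove(path);
}
console.log(`kept ${Math.min(target, files.length)} of ${files.length} files`);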

My current setup is a Linux machine, replicating to a folder on the same server. I'm not running in a Docker container, just directly from my terminal.

I did the obvious things first. I've set my file handle limit all the way up:

> ulimit -n
65536

And just to double check:

> lsof -u laharah | wc -l
4795

Here's the current config I'm using:

{
    "peers": [
        {
            "type": "couchdb",
            "name": "server-database",
            "database": "obsdb",
            "url": "http://localhost:5984",
            "username": "obsidian",
            "password": "[REDACTED]",
            "baseDir": ""
        },
        {
            "type": "storage",
            "name": "server-local-storage",
            "scanOfflineChanges": false,
            "baseDir": "/home/laharah/test_obs"
        }
    ]
}

Here is the relevant portion of the strace output where the error originates.

15370 <... sched_yield resumed> )       = 0
15521 sigaltstack({ss_sp=0x7f49a00d7000, ss_flags=0, ss_size=8192},  <unfinished ...>
15370 sched_yield( <unfinished ...>
15521 <... sigaltstack resumed> NULL)   = 0
15370 <... sched_yield resumed> )       = 0
15370 sched_yield( <unfinished ...>
15521 prctl(PR_SET_NAME, "notify-rs inoti"... <unfinished ...>
15370 <... sched_yield resumed> )       = 0
15521 <... prctl resumed> )             = 0
15370 futex(0x55c6e6745998, FUTEX_WAIT_BITSET_PRIVATE, 4294967295, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
15521 mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f4750000000
15521 munmap(0x7f4754000000, 67108864)  = 0
15521 mprotect(0x7f4750000000, 135168, PROT_READ|PROT_WRITE) = 0
15521 sched_getaffinity(15521, 32, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]) = 32
15521 epoll_wait(326, [{EPOLLIN, {u32=1, u64=1}}], 16, -1) = 1
15521 inotify_add_watch(325, "/home/laharah/test_obs/.blog/_assets/js", IN_MODIFY|IN_ATTRIB|IN_CLOSE_WRITE|IN_MOVED_FROM|IN_MOVED_TO|IN_CREATE|IN_DELETE|IN_DELETE_SELF|IN_MOVE_SELF) = 1
15521 futex(0x55c6e6745998, FUTEX_WAKE_PRIVATE, 1) = 1
15370 <... futex resumed> )             = 0
15521 epoll_wait(326,  <unfinished ...>
15370 inotify_init()                    = -1 EMFILE (Too many open files)
15370 brk(0x55c6e6bdc000)               = 0x55c6e6bdc000
15370 gettid()                          = 15370
15370 mmap(0x14cc10200000, 528384, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x14cc10200000
15370 munmap(0x14cc10242000, 258048)    = 0
15370 mprotect(0x14cc10200000, 270336, PROT_READ|PROT_WRITE) = 0
15370 epoll_ctl(5, EPOLL_CTL_DEL, 321, NULL) = 0
15370 close(321)                        = 0
15370 stat("/home/laharah/.cache/deno/location_data/398eebd359bc9e171ffbcfd87c28f16aefe4ea41490b8d61bf1b53cb2cf1c064/local_storage", {st_mode=S_IFREG|0644, st_size=12288, ...}) = 0
15370 fcntl(14, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=1073741826, l_len=510}) = 0 
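
One thing that stands out in the trace: the call that actually fails is inotify_init(), and per the inotify man pages, EMFILE there can mean the per-user limit on inotify instances (fs.inotify.max_user_instances, which is separate from ulimit -n) has been hit. Here's a quick snippet to dump those knobs on Linux, in case it helps anyone else (it just reads straight from procfs):

// Print the Linux inotify limits straight from procfs.
// Run with: deno run --allow-read check_inotify.ts
for (const knob of ["max_user_instances", "max_user_watches", "max_queued_events"]) {
  const value = await Deno.readTextFile(`/proc/sys/fs/inotify/${knob}`);
  console.log(`fs.inotify.${knob} = ${value.trim()}`);
}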

My current best guess is that chokidar is misbehaving somehow. Reading through related issues, chokidar is supposed to consolidate inotify watches to reduce the number of open handles, but that doesn't seem to be happening here (see the polling sketch below). My TS isn't good enough for me to implement a workaround, and skimming the code, nothing jumps out at me as the cause.
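
For what it's worth, chokidar does document a usePolling option that avoids inotify entirely. A minimal standalone sketch of what that looks like (option names are from chokidar's docs; the npm: specifier is just how Deno would load it):

// Minimal chokidar watcher in polling mode: it stat()s files on an
// interval instead of registering inotify watches.
import { watch } from "npm:chokidar";

const watcher = watch("/home/laharah/test_obs", {
  usePolling: true, // poll instead of using native inotify
  interval: 1000,   // check for changes once per second
});

watcher.on("all", (event, path) => {
  console.log(event, path);
});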

I suspect there's something wrong with my server's configuration, since I'd expect someone else to have hit this by now.

Please let me know if there's anything you can think of. I'm planning to try running livesync-bridge in a Docker container just to check, but I don't expect it to help, since the underlying volume is still served from my filesystem; if the error originates below the container, containerizing shouldn't matter. Still, I'll give it a try in the morning just in case.

ippaveln commented 6 months ago

I have the same problem, but in Docker on Linux.

wujiyu115 commented 6 months ago

I have the same problem in Docker too.

Download https://registry.npmjs.org/is-number/-/is-number-7.0.0.tgz
error: Uncaught Error: Too many open files (os error 24)
    at new FsWatcher (ext:runtime/40_fs_events.js:23:17)
    at Object.watchFs (ext:runtime/40_fs_events.js:76:10)
    at ext:deno_node/_fs/_fs_watch.ts:58:21
    at Object.action (ext:deno_web/02_timers.js:154:11)
    at handleTimerMacrotask (ext:deno_web/02_timers.js:68:10)
    at eventLoopTick (ext:core/01_core.js:160:21)

Laharah commented 5 months ago

Managed to implement a workaround for this. While we wait for the PR to be accepted, you can clone the branch I made and then add this option to your storage peer in config.json:

{
    "type": "storage",
    ...
    "usePolling": true
}
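
For the curious: polling trades some CPU and detection latency (it stat()s files on an interval) for not holding any inotify instances or watch descriptors open, so the file-count ceiling goes away.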

If you try it out and run into any issues, let me know.