quambene / bogrep

Grep your bookmarks
Apache License 2.0

Fetch error: Too many open files #59

Open mattmartini opened 9 months ago

mattmartini commented 9 months ago

When trying to do a bogrep fetch I am getting the error below.

```
$ bogrep fetch
Error: Can't create file at /Users/USERNAME/Library/Application Support/bogrep/cache/78aa542f-52c1-4b5e-b475-15293854996a.txt: Too many open files (os error 24)
$ (140/8005)
```

I tried setting `"max_concurrent_requests": 50`, but I still get this issue.

OS: Darwin 23.1.0 - macOS 14.1.1 (Sonoma) version: bogrep 0.5.0

quambene commented 9 months ago

Thanks. It seems that writing to the file system needs to be limited as well (besides limiting the number of open network connections).
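For illustration, one common way to bound concurrent file writes is a counting semaphore: each write must acquire a permit before opening a file, so at most N files are open at once. This is only a sketch of the technique (std-only, hand-rolled semaphore), not bogrep's actual implementation:

```rust
// Sketch (not bogrep's actual code): bound concurrent file writes with a
// counting semaphore so fetching can't exhaust the OS open-file limit.
use std::fs;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Minimal counting semaphore built on Mutex + Condvar.
struct Semaphore {
    permits: Mutex<usize>,
    cvar: Condvar,
}

impl Semaphore {
    fn new(permits: usize) -> Self {
        Semaphore { permits: Mutex::new(permits), cvar: Condvar::new() }
    }
    fn acquire(&self) {
        let mut permits = self.permits.lock().unwrap();
        while *permits == 0 {
            // Block until another writer releases a permit.
            permits = self.cvar.wait(permits).unwrap();
        }
        *permits -= 1;
    }
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.cvar.notify_one();
    }
}

fn main() {
    let sem = Arc::new(Semaphore::new(10)); // at most 10 cache files open at once
    let dir = std::env::temp_dir().join("bogrep-demo");
    fs::create_dir_all(&dir).unwrap();

    let handles: Vec<_> = (0..100)
        .map(|i| {
            let sem = Arc::clone(&sem);
            let path = dir.join(format!("{i}.txt"));
            thread::spawn(move || {
                sem.acquire(); // wait for a free file "slot"
                fs::write(&path, "cached page body").unwrap();
                sem.release();
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(fs::read_dir(&dir).unwrap().count(), 100);
}
```

In an async codebase the same idea is usually expressed with an async semaphore (e.g. `tokio::sync::Semaphore`) instead of blocking threads.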

quambene commented 9 months ago

On Linux systems, `ulimit -n` shows the number of open files allowed (see https://ss64.com/bash/ulimit.html). Often, the default is 1024. Could you check what the value is on your OS?
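For reference, the soft limit can be inspected and raised per shell session; on macOS the soft default is often 256 (the values shown here are typical, not guaranteed):

```shell
# Show the current soft limit for open file descriptors
ulimit -n

# Show the hard limit, which caps how far the soft limit can be raised
ulimit -Hn

# Raise the soft limit for this shell session (takes effect for
# processes started from this shell, e.g. bogrep)
ulimit -n 1024
ulimit -n
```

This only affects the current shell; raising the limit system-wide is OS-specific.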

mattmartini commented 9 months ago

```
$ ulimit -n
256
```


mattmartini commented 9 months ago

The fix above did not solve the problem.

```
$ bogrep init
Imported 8007 bookmarks from 2 sources: /Users/USERNAME/Library/Application Support/Firefox/Profiles/8cycwilz.Everyday_Usage/bookmarkbackups/bookmarks-2023-11-29_9410_DCFJZCTgp91yEKP+oZsoDA==.jsonlz4, /Users/USERNAME/Library/Application Support/Google/Chrome/Default/Bookmarks
Error: Can't create file at /Users/USERNAME/Library/Application Support/bogrep/cache/97d40634-be8c-4332-987b-6bb9f576d2e4.txt: Too many open files (os error 24)
$ (197/8007)
```
quambene commented 9 months ago

Sorry, it's not fixed yet. I will let you know when it's ready.

quambene commented 9 months ago

Fixed on main branch. Will prepare release 0.6.0.

I set the default for `max_idle_connections_per_host` in `settings.json` from 100 to the more sensible 10; idle connections were getting stuck in the connection pool.

Idle connections will be removed after 5 seconds (see `idle_connections_timeout` in `settings.json`).
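Putting those fields together, a `settings.json` along these lines should reflect the new defaults (the field names are from this thread; the exact file structure, remaining fields, and the unit of the timeout are assumptions, so check the generated file):

```json
{
  "max_concurrent_requests": 100,
  "max_idle_connections_per_host": 10,
  "idle_connections_timeout": 5
}
```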

See the documentation: https://docs.rs/bogrep/latest/bogrep/struct.Settings.html#fields

Please remove your bogrep config folder with the old `settings.json` before running bogrep.

If you still get a "Too many open files" error, try to decrease `max_idle_connections_per_host` to 5, and I will update the default accordingly.

mattmartini commented 9 months ago

```json
"max_concurrent_requests": 10,
"max_idle_connections_per_host": 5,
```

Seems to work (slowly). However, I got this error (I believe it comes from a Chrome bookmark):

```
Error: Can't get host for url: javascript:void(location.href='http://tinyurl.com/create.php?url='+encodeURIComponent(location.href))
```
quambene commented 9 months ago

Pushed an improvement to main branch: https://github.com/quambene/bogrep/pull/63

Fetching will no longer be aborted if an expected error occurs, so you should be able to finish processing (with a few warnings instead).

You can try setting `max_concurrent_requests` to 100 again.

There is still something odd with macOS that I have to investigate. On Ubuntu, I'm able to fetch 500 concurrent requests, and it finishes quickly.

mattmartini commented 9 months ago

Bumped `max_concurrent_requests` back up to 100. Still getting many "Too many open files" errors, some for creating the cache file and some for fetching a website.

Only 2783 cache files were created out of 8007 bookmarks. This seems very low; sure, there are probably many dead links in the bookmarks, but not 71%.

```
[2023-11-30T23:37:34Z WARN bogrep::cmd::fetch] Can't create file at /Users/USERNAME/Library/Application Support/bogrep/cache/720450d1-9108-4e76-a2ff-1b1431cca0c5.txt: Too many open files (os error 24)
[2023-11-30T23:37:34Z WARN bogrep::cmd::fetch] Can't fetch website: error sending request for url (http://www.j.nurick.dial.pipex.com/Code/Perl/index.htm): error trying to connect: dns error: proto error: io error: Too many open files (os error 24)
[2023-11-30T23:36:44Z WARN bogrep::cmd::fetch] Can't fetch website: error sending request for url (http://support.apple.com/kb/HT1159#mac_pro): error trying to connect: dns error: proto error: io error: Too many open files (os error 24)
```

Dropped `max_concurrent_requests` to 20. Much slower (15:58 min to try to fetch 8007 URLs), but I only got warnings for `javascript:` bookmarks.

```
[2023-11-30T23:50:11Z WARN bogrep::cmd::fetch] Can't get host for url: javascript:void(location.href='http://tinyurl.com/create.php?url='+encodeURIComponent(location.href))
```

Retrieved 5447 out of 8007 bookmarks.

On another note, you should do point releases like v0.6.1 ;-)

```
Replaced package `bogrep v0.6.0 (/Users/USERNAME/Projects/BookMarks/bogrep)` with `bogrep v0.6.0 (/Users/USERNAME/Projects/BookMarks/bogrep)` (executable `bogrep`)
```

quambene commented 9 months ago

Thanks for checking!

A 71% error rate is indeed too much, and it is explained by the "Too many open files" errors, which prevent fetching and caching.

I would have expected that 100 concurrent requests would work without issues though.

For example, on Ubuntu I was fetching 500 concurrent requests successfully which is explained by:

500 open files + 500 open connections = 1000 linux sockets < 1024

where 1024 is the limit for open files on Ubuntu.

The same calculation doesn't seem to work for macOS, where we have:

100 open files + 100 open connections = 200 sockets < 256

I will dig a bit more into why the expected 100 for `max_concurrent_requests` is not working on macOS.

Unfortunately, most releases include breaking changes; that's why I'm increasing the minor version. The next release includes a bugfix without breaking changes, though, and will be v0.6.1 :)