yacy / yacy_search_server

Distributed Peer-to-Peer Web Search Engine and Intranet Search Appliance
http://yacy.net

Running out of Heap space even with JVM memory set to 24GB #552

Open 4nanook opened 1 year ago

4nanook commented 1 year ago

I 2023/01/23 15:02:47 HeapReader generating index for /yacy/DATA/INDEX/freeworld/SEGMENTS/default/citation.index.20221108095009098.blob, 5397 MB. Please wait.
W 2023/01/23 15:04:35 ConcurrentLog net.yacy.cora.util.SpaceExceededException: 50658600 bytes needed for RowCollection grow after OutOfMemoryError Java heap space: 568488840 free at Mon Jan 23 15:04:35 PST 2023
net.yacy.cora.util.SpaceExceededException: 50658600 bytes needed for RowCollection grow after OutOfMemoryError Java heap space: 568488840 free at Mon Jan 23 15:04:35 PST 2023
    at net.yacy.kelondro.index.RowCollection.ensureSize(RowCollection.java:276)
    at net.yacy.kelondro.index.RowCollection.addUnique(RowCollection.java:425)
    at net.yacy.kelondro.index.RowCollection.addUnique(RowCollection.java:403)
    at net.yacy.kelondro.index.RAMIndex.addUnique(RAMIndex.java:216)
    at net.yacy.kelondro.index.RAMIndexCluster.addUnique(RAMIndexCluster.java:133)
    at net.yacy.kelondro.index.RowHandleMap.putUnique(RowHandleMap.java:292)
    at net.yacy.kelondro.index.RowHandleMap$initDataConsumer.call(RowHandleMap.java:497)
    at net.yacy.kelondro.index.RowHandleMap$initDataConsumer.call(RowHandleMap.java:436)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
I 2023/01/23 15:04:46 MEMORY performed explicit GC, freed 70944 KB (requested/available/average: 263610 / 327103 / 0 KB)
I 2023/01/23 15:05:16 ConcurrentLog shutdown of ConcurrentLog.Worker void because it was not running.
E 2023/01/23 15:05:16 UNCAUGHT-EXCEPTION Thread main: Java heap space
java.lang.OutOfMemoryError: Java heap space

java.lang.OutOfMemoryError: Java heap space
E 2023/01/23 15:05:16 ConcurrentLog Java heap space
java.lang.OutOfMemoryError: Java heap space

This is on an Intel-based machine running Ubuntu 22.04 Linux with a 6.1.7 kernel.

4nanook commented 1 year ago

Since it won't initialize successfully, I can't tweak memory settings from a browser.

4nanook commented 1 year ago

I really would appreciate some assistance getting this up and running again to help provide an alternative to evil Google. If it won't run with 24GB of RAM, I don't know what to do. These seem like quite unreasonable memory requirements. When I say 24GB, I just mean the memory allocated to java to run yacy; the system has 56GB of RAM for the web server, which is shared with yacy and some other web apps like friendica, hubzilla, nextcloud, squirrelmail and other things.

Orbiter commented 1 year ago

how large was the index before this happened?

4nanook commented 1 year ago

/yacy/DATA# du -sh INDEX
430G INDEX

4nanook commented 1 year ago

I was kind of hoping for a solution?

4nanook commented 1 year ago

I guess yacy is dead.

4nanook commented 1 year ago

Ok, I deleted the index and started over. Now, how can I prevent this situation from recurring?

4nanook commented 1 year ago

Also, I seem to be having difficulty with the upload seed bit. I wish there were a step-by-step configuration guide to this thing somewhere. I realize a lot of programmers don't like to document, but it is frustrating.

4nanook commented 1 year ago

Got the seed.txt upload working. The protocols available in the download included ftp, but an ftp:// URL for the file doesn't work; I had to use https.

4nanook commented 1 year ago

I wish there were something LIKE yacy but written in a usable language like C and maintained.

kotenok2000 commented 1 year ago

Maybe you can try to change memory.standardStrategy=true to false in yacy/DATA/SETTINGS/yacy.conf ?
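Since the web interface is unavailable while YaCy fails to start, the change has to be made in the file itself. A minimal sketch of doing that from a shell, assuming yacy.conf consists of plain key=value lines and that YaCy is stopped while the file is edited; set_conf_key is a hypothetical helper, not something shipped with YaCy:

```shell
#!/bin/sh
# set_conf_key FILE KEY VALUE
# Rewrite the line "KEY=..." in FILE to "KEY=VALUE", or append it if the key
# is missing. Hypothetical helper; assumes simple key=value lines.
set_conf_key() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        # Compare the text before the first "=" with the wanted key.
        index($0, "=") > 0 && substr($0, 1, index($0, "=") - 1) == k {
            print k "=" v
            found = 1
            next
        }
        { print }
        END { if (!found) print k "=" v }
    ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Copy back with cat so the original file keeps its owner and mode.
    cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

For the setting suggested above this would be called as: set_conf_key yacy/DATA/SETTINGS/yacy.conf memory.standardStrategy false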

frankenstein91 commented 1 year ago

> I guess yacy is dead.

that is not true

virtadpt commented 1 year ago

The commit history speaks otherwise.

kotenok2000 commented 1 year ago

Download latest release from here https://release.yacy.net/

frankenstein91 commented 1 year ago

> The commit history speaks otherwise.

one week of no push is death for you?

okybaca commented 1 year ago

Back to the issue: I have a similar problem. As I crawl, the Kelondro database in \DATA\INDEX\freeworld\SEGMENTS\default* grows and the amount of occupied memory increases, as described in #581. When it reaches the point where none of the RAM reserved for YaCy is left, it starts to throw exceptions and YaCy gets stuck. I believe there is some major inefficiency in how Kelondro stores the data, wasting a lot of RAM. It costs a lot of disk I/O as well.

Yetangitu commented 7 months ago

When all you have is a hammer:

while :; do jattach $(pidof java) jcmd GC.run;sleep 20;done

This forces a garbage collection every 20 seconds (replace 20 with the desired interval). It assumes that there is only a single java process running on the system; if you have more than one, replace $(pidof java) with something more advanced, e.g. $(ps aux|awk '/yacy/ {print $2}') (which in turn assumes there is only a single process either run by user yacy or which has yacy anywhere in its command line).

Before using this crude workaround I'd wake up to a frozen yacy process due to memory starvation. Now I wake up to a yacy process complaining about a lack of disk space because it has crawled all night and then some. The latter is easy to fix since I run the thing in a container under Proxmox so I can just add another 50 GB and restart the process.

You'll need jattach (sudo apt install jattach if you run Debian or something similar).
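The one-liner above keeps looping even after the Java process has gone away. A slightly more defensive sketch of the same workaround, taking the process id explicitly and stopping once that process no longer exists; periodic_gc is a hypothetical name, and the jattach invocation is the one from the loop above:

```shell
#!/bin/sh
# periodic_gc PID [INTERVAL]
# Ask the JVM with process id PID to run a garbage collection every INTERVAL
# seconds (default 20). Stops when the process is gone or jattach fails.
# Hypothetical helper; must run as the same user as the JVM, or as root.
periodic_gc() {
    pid=$1
    interval=${2:-20}
    # kill -0 sends no signal; it only checks that the process can be signalled.
    while kill -0 "$pid" 2>/dev/null; do
        jattach "$pid" jcmd GC.run >/dev/null || return 1
        sleep "$interval"
    done
}
```

With a single Java process on the machine this could be started as: periodic_gc "$(pidof java)" 20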

Yetangitu commented 7 months ago

image

Running a parallel crawl on 6 sites with 2400 MB of reserved memory and a forced GC every 20 seconds.

okybaca commented 7 months ago

thanks, @Yetangitu !

bkw777 commented 2 months ago

I think requiring such a ridiculous and gross hack qualifies as either dead or should be dead. I was trying to run this in a freebsd jail on truenas core, kind of limped along with the apparently usual throw more disk at it, throw more ram at it, try to set limits in the absolutely inscrutable settings, still have it bomb... I think I'm done. Yeah, I'm done.

bkw777 commented 2 months ago

I got it running again by deleting the index, just the nuclear option: rm -rf freeworld and restart the jail. Now it's running again but clearly it will just happen again. I can think of no excuse for a service to actually fail to start at all no matter how bad its data is. Bad config, sure, but data? There is no excuse for it failing to come up to a usable UI that says "the data is bad".

virtadpt commented 2 months ago

I'm seeing the same thing here, just using YaCy to index my personal website (and implement search for it). It's kind of absurd, seeing it get OoM killed every other day for a personal site's search.

okybaca commented 2 months ago

In my experience, the usual culprit is the RWI Kelondro database, which I tried to describe (examining the black box) here: https://eldar.cz/yacydoc/operation/rwi-index-distribution.html.

Deleting DATA/INDEX/freeworld/SEGMENTS/default/text.index.* usually solved memory issues for me.

> "Sometimes the RWIs even fill out all the RAM allocated for YaCy, resulting in frequent GC (a Java process of cleaning the memory, performance expensive) and, later on, even in filling the RAM in a way that instance would break. The cure is easy then: just delete the RWI files and start YaCy again.
> The question is, if RWIs are implemented efficiently and whether they couldn't be implemented in a way, that doesn't degrade the performance of local peer in such huge degree."

So my theory is poor memory management in the RWI/Kelondro implementation. Since RWI is the root of YaCy p2p index distribution, it's not desirable to disable RWI completely. Memory tweaks in the Kelondro code https://github.com/yacy/yacy_search_server/tree/master/source/net/yacy/kelondro would IMHO most probably help to solve the memory issues.

See issue #581.
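The cleanup of the text.index.* files described above can be sketched as a small shell function. This variant moves the files into a backup directory rather than deleting them, which is a swapped-in precaution and not what the comment above does; stash_rwi is a hypothetical helper, and it assumes YaCy is stopped while it runs:

```shell
#!/bin/sh
# stash_rwi SEGMENT_DIR BACKUP_DIR
# Move every text.index.* file out of SEGMENT_DIR into BACKUP_DIR instead of
# deleting it. Hypothetical helper; run it only while YaCy is stopped.
stash_rwi() {
    segment_dir=$1
    backup_dir=$2
    mkdir -p "$backup_dir" || return 1
    moved=0
    for f in "$segment_dir"/text.index.*; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$f" ] || continue
        mv "$f" "$backup_dir"/ || return 1
        moved=$((moved + 1))
    done
    echo "moved $moved RWI file(s) to $backup_dir"
}
```

Example call, with an arbitrary backup location: stash_rwi DATA/INDEX/freeworld/SEGMENTS/default DATA/rwi-backup. Keeping the backup directory on the same filesystem makes the move a rename, so no data is copied.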

smokingwheels commented 1 month ago

Try this setting? I am running 3 servers on the same box. It has to do with the lower part of memory.

sudo nano /etc/sysctl.conf

vm.max_map_count=262144

sudo sysctl -p

Why Set It to 262144? This value is commonly required for high-performance applications like Elasticsearch or container orchestration tools because they use a large number of memory-mapped files. Increasing the limit to 262144 ensures these applications can handle more memory maps without running into system constraints.
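Before editing /etc/sysctl.conf it may be worth checking what the limit currently is. A small sketch that reads it from /proc/sys/vm/max_map_count, the procfs view of this sysctl on Linux; map_count_below is a hypothetical helper, and nothing in this thread establishes that the setting is related to the heap errors reported above:

```shell
#!/bin/sh
# map_count_below TARGET [FILE]
# Exit 0 when the current vm.max_map_count is below TARGET, 1 when it is
# already at least TARGET, and 2 when the value cannot be read.
# FILE defaults to the Linux procfs entry; hypothetical helper.
map_count_below() {
    target=$1
    file=${2:-/proc/sys/vm/max_map_count}
    current=$(cat "$file" 2>/dev/null) || return 2
    [ "$current" -lt "$target" ]
}
```

If map_count_below 262144 succeeds, the limit is lower than the suggested value and the sysctl change above would actually raise it; otherwise the edit is a no-op.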

4nanook commented 1 month ago
I have since re-installed on a new machine and it is no longer having this issue.
