Open 4nanook opened 1 year ago
Since it won't initialize successfully, I can't tweak memory settings from a browser.
I really would appreciate some assistance getting this up and running again, to help provide an alternative to evil Google. If it won't run with 24GB of RAM, I don't know what to do; that seems like quite an unreasonable memory requirement. By 24GB I mean just the memory allocated to Java to run YaCy. The system has 56GB of RAM for the web server, which is shared between YaCy and some other web apps like Friendica, Hubzilla, Nextcloud, Squirrelmail and other things.
how large was the index before this happened?
```
/yacy/DATA# du -sh INDEX
430G    INDEX
```
I was kind of hoping for a solution?
I guess yacy is dead.
Ok, I deleted the index and started over. Now, how can I prevent this situation from recurring?
Also, I seem to be having difficulty with the seed upload bit. I wish there were a step-by-step configuration guide to this thing somewhere. I realize a lot of programmers don't like to document, but it is frustrating.
Got the seed.txt upload working. The protocols available in the download settings include FTP, but an ftp:// URL for the file doesn't work; I had to use https.
I wish there were something LIKE yacy but written in a usable language like C and maintained.
Maybe you can try to change memory.standardStrategy=true to false in yacy/DATA/SETTINGS/yacy.conf ?
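If you'd rather flip that setting from the command line, a minimal sketch (the `CONF` path assumes a default install layout, so adjust it to yours; stop YaCy first so it doesn't rewrite the file on shutdown):

```shell
# Hedged one-liner: switch memory.standardStrategy from true to false.
# CONF assumes the default DATA/SETTINGS location; adjust as needed.
CONF="${CONF:-yacy/DATA/SETTINGS/yacy.conf}"
if [ -f "$CONF" ]; then
  sed -i 's/^memory\.standardStrategy=true/memory.standardStrategy=false/' "$CONF"
else
  echo "no $CONF found; set CONF to your yacy.conf path"
fi
```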
> I guess yacy is dead.
that is not true
The commit history speaks otherwise.
Download latest release from here https://release.yacy.net/
> The commit history speaks otherwise.
one week of no push is death for you?
Back to the issue: I have a similar problem. As I crawl, the Kelondro database in \DATA\INDEX\freeworld\SEGMENTS\default* grows and the amount of occupied memory increases, as described in #581. When it reaches the point where no more RAM is available to YaCy, it starts to throw exceptions and YaCy gets stuck. I believe there is some major inefficiency in how Kelondro stores the data, wasting a lot of RAM. It costs a lot of disk I/O as well.
When all you have is a hammer:
```shell
while :; do jattach $(pidof java) jcmd GC.run; sleep 20; done
```

This forces a garbage collection every 20 seconds (replace 20 with the desired interval). It assumes that there is only a single Java process running on the system; if you have more than one, replace `$(pidof java)` with something more advanced, e.g. `$(ps aux|awk '/yacy/ {print $2}')` (which in turn assumes there is only a single process either run by the user yacy or with yacy anywhere in its command line).
Before using this crude workaround I'd wake up to a frozen YaCy process due to memory starvation. Now I wake up to a YaCy process complaining about a lack of disk space because it has crawled all night and then some. The latter is easy to fix since I run the thing in a container under Proxmox, so I can just add another 50 GB and restart the process.
You'll need `jattach` (`sudo apt install jattach` if you run Debian or something similar).
Running a parallel crawl on 6 sites, 2400 MB reserved memory forced GC every 20 seconds.
thanks, @Yetangitu !
I think requiring such a ridiculous and gross hack qualifies as either dead or should-be-dead. I was trying to run this in a FreeBSD jail on TrueNAS Core. It kind of limped along with the apparently usual routine: throw more disk at it, throw more RAM at it, try to set limits in the absolutely inscrutable settings, and still have it bomb. I think I'm done. Yeah, I'm done.
I got it running again by deleting the index, just the nuclear option: rm -rf freeworld and restart the jail. Now it's running again, but clearly it will just happen again. I can think of no excuse for a service to fail to start at all, no matter how bad its data is. Bad config, sure, but data? There is no excuse for it failing to come up to a usable UI that says "the data is bad".
I'm seeing the same thing here, just using YaCy to index my personal website (and implement search for it). It's kind of absurd, seeing it get OoM killed every other day for a personal site's search.
In my experience, the usual culprit is RWI Kelondro database, which I tried to describe (examining the black-box) here: https://eldar.cz/yacydoc/operation/rwi-index-distribution.html.
Deleting `DATA/INDEX/freeworld/SEGMENTS/default/text.index.*` usually solved memory issues for me.
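A sketch of that cleanup (paths assume the default DATA layout; stop YaCy first, and note this throws away the peer's local RWI data, not the Solr documents in the same segment):

```shell
# Remove only the RWI blob/index files; the Solr core in the same
# segment is left alone. Path assumes the default freeworld layout.
rm -f DATA/INDEX/freeworld/SEGMENTS/default/text.index.*
```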
> "Sometimes the RWIs even fill all the RAM allocated for YaCy, resulting in frequent GC (a Java process of cleaning the memory, which is performance-expensive) and, later on, even in filling the RAM in a way that the instance would break. The cure is then easy: just delete the RWI files and start YaCy again.
> The question is whether RWIs are implemented efficiently and whether they couldn't be implemented in a way that doesn't degrade the performance of the local peer to such a huge degree."
So my theory is poor memory management in the RWI/Kelondro implementation. Since RWI is the basis of YaCy's p2p index distribution, it's not desirable to disable RWI completely. Memory tweaks in the Kelondro code https://github.com/yacy/yacy_search_server/tree/master/source/net/yacy/kelondro would IMHO most probably help to solve the memory issues.
See issue #581.
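To check whether the RWI files are in fact what is eating the space on a given peer, something like this can help (a sketch; `SEG` assumes the default freeworld segment layout):

```shell
# Compare the size of the RWI files against the whole segment.
SEG="${SEG:-DATA/INDEX/freeworld/SEGMENTS/default}"
du -ch "$SEG"/text.index.* 2>/dev/null | tail -n1 || true  # RWI total
du -sh "$SEG" 2>/dev/null || true                          # whole segment
```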
Try this setting? I am running 3 servers on the same box. It has to do with the lower part of memory.
```shell
sudo nano /etc/sysctl.conf
# add the line:
#   vm.max_map_count=262144
sudo sysctl -p
```
Why Set It to 262144? This value is commonly required for high-performance applications like Elasticsearch or container orchestration tools because they use a large number of memory-mapped files. Increasing the limit to 262144 ensures these applications can handle more memory maps without running into system constraints.
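Before raising it, it may be worth checking what the limit currently is; on many Linux distributions the default is 65530 (hedged; Linux only):

```shell
# Read the current vm.max_map_count; no root needed.
cat /proc/sys/vm/max_map_count
```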
I have since re-installed on a new machine and it is no longer having this issue.
```
I 2023/01/23 15:02:47 HeapReader generating index for /yacy/DATA/INDEX/freeworld/SEGMENTS/default/citation.index.20221108095009098.blob, 5397 MB. Please wait.
W 2023/01/23 15:04:35 ConcurrentLog net.yacy.cora.util.SpaceExceededException: 50658600 bytes needed for RowCollection grow after OutOfMemoryError Java heap space: 568488840 free at Mon Jan 23 15:04:35 PST 2023
net.yacy.cora.util.SpaceExceededException: 50658600 bytes needed for RowCollection grow after OutOfMemoryError Java heap space: 568488840 free at Mon Jan 23 15:04:35 PST 2023
	at net.yacy.kelondro.index.RowCollection.ensureSize(RowCollection.java:276)
	at net.yacy.kelondro.index.RowCollection.addUnique(RowCollection.java:425)
	at net.yacy.kelondro.index.RowCollection.addUnique(RowCollection.java:403)
	at net.yacy.kelondro.index.RAMIndex.addUnique(RAMIndex.java:216)
	at net.yacy.kelondro.index.RAMIndexCluster.addUnique(RAMIndexCluster.java:133)
	at net.yacy.kelondro.index.RowHandleMap.putUnique(RowHandleMap.java:292)
	at net.yacy.kelondro.index.RowHandleMap$initDataConsumer.call(RowHandleMap.java:497)
	at net.yacy.kelondro.index.RowHandleMap$initDataConsumer.call(RowHandleMap.java:436)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
I 2023/01/23 15:04:46 MEMORY performed explicit GC, freed 70944 KB (requested/available/average: 263610 / 327103 / 0 KB)
I 2023/01/23 15:05:16 ConcurrentLog shutdown of ConcurrentLog.Worker void because it was not running.
E 2023/01/23 15:05:16 UNCAUGHT-EXCEPTION Thread main: Java heap space java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
E 2023/01/23 15:05:16 ConcurrentLog Java heap space java.lang.OutOfMemoryError: Java heap space
```
This is on an Intel-based machine running Ubuntu 22.04 Linux with a 6.1.7 kernel.