yacy / yacy_search_server

Distributed Peer-to-Peer Web Search Engine and Intranet Search Appliance
http://yacy.net

\DATA\INDEX\freeworld\SEGMENTS\default increases memory consumption radically #581

Open okybaca opened 1 year ago

okybaca commented 1 year ago

For a long time, I didn't understand the role of ‘\DATA\INDEX\freeworld\SEGMENTS\default*.’. As I was crawling, the size of ‘\DATA\INDEX\freeworld\SEGMENTS\default.*’ kept growing, although the index is kept in Solr. Start-up got slower and slower, as the system was somehow processing all these files (merging them, compressing them, etc.), and roughly every 1 000 000 pages I had to increase the RAM by 1 GB (otherwise YaCy wouldn't start), as these files were probably kept in memory. Yesterday I had to delete the local RWIs (Index administration > URL database administration > Cleanup > Delete RWI Index (DHT transmission words)). All of the files disappeared, start-up became quicker, and the memory footprint decreased radically. So my theory is that these files are RWIs ready to be sent to other peers. They are managed very poorly (or at least expensively) and slowly degrade the performance of an instance. Sviatoslav even published a script to delete these files periodically, regarding them as unnecessary. Although I understand that RWIs are necessary for the P2P network, I believe they could be managed in some smarter way, at least without degrading performance and occupying a vast amount of RAM.
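To track the growth described above before deciding whether to delete the RWI index, something like the following minimal sketch could be used to measure the on-disk size of the segment directory. It is not part of YaCy; the function names `directory_size_bytes` and `format_gib` and the default path are assumptions for illustration, and the path should be adjusted to the actual DATA location of the instance.

```python
import os
import sys
from pathlib import Path


def directory_size_bytes(root):
    """Return the total size in bytes of all regular files under root (recursive).

    A missing directory yields 0, because os.walk simply produces nothing.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                total += path.stat().st_size
            except OSError:
                # A file may disappear while the directory is being scanned
                # (for example during a merge); skip it.
                pass
    return total


def format_gib(num_bytes):
    """Format a byte count as GiB with two decimals."""
    return f"{num_bytes / 1024**3:.2f} GiB"


if __name__ == "__main__":
    # Hypothetical default; pass the real segment directory as the first argument.
    root = sys.argv[1] if len(sys.argv) > 1 else "DATA/INDEX/freeworld/SEGMENTS/default"
    print(f"{root}: {format_gib(directory_size_bytes(root))}")
```

Running it periodically (for example once per day) and comparing the numbers with the number of crawled pages would show whether the directory size really grows in step with the RAM needed at start-up. The script only reads file sizes and does not modify anything.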

okybaca commented 9 months ago

OK, since then I have tried to understand the RWI system and have written a proposal for a section of the documentation. Could anyone take a look, please? There are still some open questions. It could become part of the "docs" afterwards.

That doesn't change the fact that RWI is still a resource-hungry thing, and I hope that can be solved.