yacy / yacy_search_server

Distributed Peer-to-Peer Web Search Engine and Intranet Search Appliance
http://yacy.net

Crawl depth of 10 stops YaCy from opening navigation after 10 mins with a "too many files open" error. #494

Closed smokingwheels closed 2 years ago

smokingwheels commented 2 years ago

I was crawling a few sites at once. I have crawled each site at a depth of 5 with no problems. There were 32,000 folders waiting in the crawler queue; I deleted them and YaCy restarted OK. The stop command did not work while YaCy was in the error state, so I used pkill -9 java. Ubuntu 18.04.

https://open.spotify.com/
https://www.spotify.com/au/premium/
https://www.woolworths.com.au/shop/productdetails/381207/palmolive-naturals-body-wash-milk-honey-shower-gel?utm_source=Spotify&utm_medium=soc&utm_campaign=PalmoliveEquity&utm_content=CON-NA-BroadAwareness-Audio-30s-NA&dclid=CjgKEAjw0dKXBhCtieWRmLPRgC0SJABhlCFUSR1Mq_eVRqwMUL7lNWyab8sxV8GBp1LZCBagUNId1fD_BwE
http://www.ldvautomotive.com.au/
https://www.homeloans.com.au/?utm_source=spotify&utm_medium=display&utm_campaign=july2022
http://www.anytimefitness.com.au/
http://adstudio.spotify.com/
https://www.youi.com.au/?src1=307750&utm_source=Digital&utm_medium=banner&utm_term=Spotify&utm_content=Audio&utm_campaign=Consideration&src3=digital&dclid=CjgKEAjw0dKXBhCtieWRmLPRgC0SJABhlCFUYjw8FCg9fTcchyfVkIVDwcr_LKT15Kqdqb87kYKJ0fD_BwE
https://auspost.com.au/about-us/supporting-communities/literacy-education?cid=aud:4470198:con:DeadlyScience:342343323:175816035
https://auspost.com.au/about-us
https://www.cancer.org.au/bowelscreening/?utm_source=Spotify&utm_medium=digitalaudio&utm_content=da_in_30_bw&utm_campaign=NBSC22
https://www.twusuper.com.au/insurance/hazardous-occupations/?utm_source=spotify&utm_medium=cpc&utm_campaign=insurance
https://topsify.com/au
https://heyscape.com.au/
https://palmerbet.onelink.me/JSzX/ac0041fc
https://adeventtracker.spotify.com/har?har_ios=&har_android=&har_default=
https://pixel.adsafeprotected.com/rfw/st/907552/60683787/skeleton.gif
https://static.adsafeprotected.com/skeleton.gif
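For anyone hitting the same "too many files open" error, one thing worth checking is the per-process limit on open file descriptors. The sketch below is not part of YaCy; it uses Python's standard `resource` module (Unix only) and the Linux `/proc` filesystem, and the helper names `open_file_limits` and `open_fd_count` are made up for illustration. It reports the limits of the Python process itself, which are normally inherited from the shell, so they only approximate what a JVM started from the same shell would get.

```python
import os
import resource  # standard library, available on Unix-like systems only


def open_file_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(pid="self"):
    """Count currently open descriptors by listing /proc/<pid>/fd (Linux only).

    Inspecting another process, such as a running JVM, generally requires
    running as the same user or as root.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


soft, hard = open_file_limits()
print(f"soft limit: {soft}, hard limit: {hard}")
print(f"descriptors open in this process: {open_fd_count()}")
```

Passing the PID of the Java process as a string, for example `open_fd_count("12345")` with a hypothetical PID, would show how close a running crawl is to the soft limit.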

frankenstein91 commented 2 years ago

Hi, can you provide the lines from the log and some information on your OS, Java version and hardware?

virtadpt commented 2 years ago

I think you're really overloading YaCy with a crawl depth of 10. From the YaCy administration panel, Advanced Crawler, there is a pop-up notice that says the following:

"2-4 is good for normal indexing. Values over 8 are not useful, since a depth-8 crawl will index approximately 25.600.000.000 pages, maybe this is the whole WWW."
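The notice does not say how that figure was derived, but 25,600,000,000 happens to equal 20^8, which is what you get if every page contributes about 20 new links at each level. A rough sketch of that arithmetic, with the branching factor of 20 as an assumption rather than anything taken from the YaCy source:

```python
def estimated_pages(depth, links_per_page=20):
    """Pages reached at exactly `depth` hops from one start URL.

    Assumes every page yields `links_per_page` new, unique links, which is
    a simplification: real crawls hit duplicates and dead ends.
    """
    return links_per_page ** depth


for depth in (4, 5, 8, 10):
    print(f"depth {depth:>2}: {estimated_pages(depth):,} pages")
```

Under the same assumption, depth 5 gives about 3.2 million pages per start URL, while depth 10 gives roughly 10 trillion, which is 400 times the depth-8 figure quoted in the notice.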

smokingwheels commented 2 years ago

@frankenstein91 Old i7, 12 GB, 256 GB SSD, Ubuntu 18.04, Java 11, usually the latest version of YaCy from GitHub. Sorry, I don't have the log file, but I could rerun the test in a few days' time on Windows 11. I now have my own separate YaCy on GitHub, where I can try different versions of Java on Windows: https://github.com/smokingwheels/YaCy/

@virtadpt Yes, I agree it was too much of an overload. I don't really know much about what happens when I do something like that. There were about 30,000 folders in the crawl queue folder when it crashed. I deleted all of them and YaCy started OK.

Thanks to all for your comments.