yacy / yacy_search_server

Distributed Peer-to-Peer Web Search Engine and Intranet Search Appliance
http://yacy.net

Thread= BusyThread CrawlQueues.coreCrawlJob daemon id=82 BLOCKED maybe caused by PiHole's DNS constraints. Minor issue. #505

Closed smokingwheels closed 2 years ago

smokingwheels commented 2 years ago

System config: Java 11; 2000 MB reserved for the JVM, peaking at 1600 MB; Ubuntu 20.04; 12-year-old i5 notebook; using Googlebot.

Pi-hole and YaCy are running on the same machine. https://community.searchlab.eu/t/i-have-yacy-and-pihole-running-on-the-same-device-now-may-cure-the-problem-of-slow-crawling-after-a-while/1205

Pi-hole reports an error when I start a crawl of over 400 sites at depth 2 (site list: https://smokingwheels.github.io/yacy/crawl/forums.txt).

From the time I started the crawl: `DNSMASQ_WARN | Warning in dnsmasq core:Maximum number of concurrent DNS queries reached (max: 150)`

There were about 4500 DNS requests at the start of my crawl. See https://twitter.com/smokingwheels/status/1575580212896878592
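The warning above says that more than 150 DNS queries were in flight at once. One way a client could stay under such a limit is to cap its own concurrent lookups. The following is only a minimal sketch of that idea, not YaCy's actual resolver code: `ThrottledResolver`, its constructor parameters, and the limit of 100 are hypothetical names and values chosen for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical helper (not part of YaCy): caps the number of DNS lookups in flight. */
final class ThrottledResolver {
    private final Semaphore permits;                 // one permit per allowed concurrent lookup
    private final Function<String, String> lookup;   // host name -> address string, or null
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    ThrottledResolver(int maxConcurrent, Function<String, String> lookup) {
        this.permits = new Semaphore(maxConcurrent);
        this.lookup = lookup;
    }

    /** Resolves one host while holding a permit; returns null on failure or interrupt. */
    String resolve(String host) {
        try {
            permits.acquire();                       // waits while the cap is reached
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
        try {
            int now = inFlight.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);   // remember the highest concurrency seen
            return lookup.apply(host);
        } finally {
            inFlight.decrementAndGet();
            permits.release();
        }
    }

    int peakConcurrency() {
        return peak.get();
    }

    /** Lookup through the JDK's default resolver. */
    static String systemLookup(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return null;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // 100 is an assumed value, picked only because it is below the 150 reported by dnsmasq.
        ThrottledResolver resolver = new ThrottledResolver(100, ThrottledResolver::systemLookup);
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (String host : List.of("yacy.net", "pi-hole.net")) {
            pool.submit(() -> System.out.println(host + " -> " + resolver.resolve(host)));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("peak concurrent lookups: " + resolver.peakConcurrency());
    }
}
```

The semaphore only bounds lookups made through this helper; whether that is enough to avoid the dnsmasq warning depends on what else on the machine is sending queries to the same resolver.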

Crawl list https://smokingwheels.github.io/yacy/crawl/forums.txt

I have commented in the pihole forum. https://discourse.pi-hole.net/t/maximum-number-of-concurrent-dns-queries-reached-max-150-when-starting-a-crawl-with-yacy/58263

Many Thanks.

`YaCy Version: 1.925/9749 Assigned Memory = 2021130240 Used Memory = 1322228872 Available Memory = 698901368

this thread dump function can find threads that lock others, to enable this function start YaCy with 'startYACY.sh -l'

THREADS WITH STATES: BLOCKED

Thread= BusyThread CrawlQueues.coreCrawlJob daemon id=82 BLOCKED
at net.yacy.crawler.robots.RobotsTxt.getEntry(RobotsTxt.java:164) [synchronized (syncObj) {]
at net.yacy.crawler.robots.RobotsTxt.getEntry(RobotsTxt.java:126)
at net.yacy.crawler.data.Latency.waitingRobots(Latency.java:126)
at net.yacy.crawler.data.Latency.waitingRemaining(Latency.java:217)
at net.yacy.crawler.data.Latency.getDomainSleepTime(Latency.java:286)
at net.yacy.crawler.HostQueue.pop(HostQueue.java:485)
at net.yacy.crawler.HostBalancer.pop(HostBalancer.java:484)
at net.yacy.crawler.data.NoticedURL.pop(NoticedURL.java:341)
at net.yacy.crawler.data.NoticedURL.pop(NoticedURL.java:291)
at net.yacy.crawler.data.CrawlQueues.coreCrawlJob(CrawlQueues.java:331)
at net.yacy.search.Switchboard$10.jobImpl(Switchboard.java:1182)
at net.yacy.kelondro.workflow.InstantBusyThread.job(InstantBusyThread.java:64)
at net.yacy.kelondro.workflow.AbstractBusyThread.run(AbstractBusyThread.java:215)

Thread= qtp1564892747-89-acceptor-0@4f437563-httpd:8091@47da3952{HTTP/1.1, (http/1.1)}{0.0.0.0:8091} id=89 BLOCKED
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:232)
at org.eclipse.jetty.server.ServerConnector.accept(ServerConnector.java:388)
at org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:702)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:773)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:905)
at java.lang.Thread.run(Thread.java:750)

`
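For readers who want to reproduce a listing like the one above outside YaCy, the JDK's `ThreadMXBean` can report every thread currently in state BLOCKED together with the monitor it is waiting for. This is a rough sketch using only standard `java.lang.management` calls; the class name `BlockedThreadReport` and the output layout are made up for illustration, and it is not how YaCy's own thread dump function is implemented.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists threads that are BLOCKED at the moment of the call. */
final class BlockedThreadReport {

    static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        // false, false: skip locked-monitor and locked-synchronizer details
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            StringBuilder sb = new StringBuilder();
            sb.append("Thread= ").append(info.getThreadName())
              .append(" id=").append(info.getThreadId())
              .append(" BLOCKED on ").append(info.getLockName())
              .append(" held by ").append(info.getLockOwnerName());
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\n  at ").append(frame);
            }
            report.add(sb.toString());
        }
        return report;
    }

    public static void main(String[] args) {
        List<String> report = blockedThreads();
        if (report.isEmpty()) {
            System.out.println("no BLOCKED threads at this moment");
        }
        for (String entry : report) {
            System.out.println(entry);
            System.out.println();
        }
    }
}
```

A report like this is a snapshot: a thread listed here may acquire its monitor a few milliseconds later.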

smokingwheels commented 2 years ago

Trying to execute the same crawl again without clearing the cache: the crawler won't start.

`YaCy Version: 1.925/9749 Assigned Memory = 2089811968 Used Memory = 611859152 Available Memory = 1477952816

this thread dump function can find threads that lock others, to enable this function start YaCy with 'startYACY.sh -l'

THREADS WITH STATES: BLOCKED

Thread= qtp1564892747-90-acceptor-1@1439ed12-httpd:8091@47da3952{HTTP/1.1, (http/1.1)}{0.0.0.0:8091} id=90 BLOCKED at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:232) at org.eclipse.jetty.server.ServerConnector.accept(ServerConnector.java:388) at org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:702) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:773) at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:905) at java.lang.Thread.run(Thread.java:750)

Thread= qtp1564892747-82046 id=82046 BLOCKED at net.yacy.crawler.HostQueue.clear(HostQueue.java:304) [for (final Map.Entry entry: this.depthStacks.entrySet()) {] at net.yacy.crawler.HostQueue.removeAllByHostHashes(HostQueue.java:366) at net.yacy.crawler.HostBalancer.removeAllByHostHashes(HostBalancer.java:202) at net.yacy.crawler.data.NoticedURL.removeByHostHash(NoticedURL.java:253) at net.yacy.crawler.data.CrawlQueues.removeHosts(CrawlQueues.java:217) at Crawler_p.respond(Crawler_p.java:420) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at net.yacy.http.servlets.YaCyDefaultServlet.invokeServlet(YaCyDefaultServlet.java:659) at net.yacy.http.servlets.YaCyDefaultServlet.handleTemplate(YaCyDefaultServlet.java:869) at net.yacy.http.servlets.YaCyDefaultServlet.doGet(YaCyDefaultServlet.java:302) at net.yacy.http.servlets.YaCyDefaultServlet.doPost(YaCyDefaultServlet.java:364) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:791) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:716) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:567) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624) at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1435) at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594) at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1350) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) at net.yacy.http.CrashProtectionHandler.handle(CrashProtectionHandler.java:33) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) at org.eclipse.jetty.server.Server.handle(Server.java:516) at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388) at org.eclipse.jetty.server.HttpChannel$$Lambda$366/136285326.dispatch(Unknown Source) at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129) at 
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:773) at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:905) at java.lang.Thread.run(Thread.java:750)

Thread= qtp1564892747-82640 id=82640 BLOCKED at net.yacy.crawler.HostQueue.removeAllByProfileHandle(HostQueue.java:333) [synchronized (this) {] at net.yacy.crawler.HostBalancer.removeAllByProfileHandle(HostBalancer.java:187) at net.yacy.crawler.data.NoticedURL.removeByProfileHandle(NoticedURL.java:244) at Crawler_p.respond(Crawler_p.java:217) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at net.yacy.http.servlets.YaCyDefaultServlet.invokeServlet(YaCyDefaultServlet.java:659) at net.yacy.http.servlets.YaCyDefaultServlet.handleTemplate(YaCyDefaultServlet.java:869) at net.yacy.http.servlets.YaCyDefaultServlet.doGet(YaCyDefaultServlet.java:302) at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:791) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:766) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:567) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624) at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1435) at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501) at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594) at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1350) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) at net.yacy.http.CrashProtectionHandler.handle(CrashProtectionHandler.java:33) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) at org.eclipse.jetty.server.Server.handle(Server.java:516) at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388) at org.eclipse.jetty.server.HttpChannel$$Lambda$366/136285326.dispatch(Unknown Source) at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129) at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:773) at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:905) 
at java.lang.Thread.run(Thread.java:750) ` Thanks

smokingwheels commented 2 years ago

> Trying to execute the same crawl again without clearing the cache: the crawler won't start.

Found the web cache full (8 GB). Cleared the cache and raised the limit to 100 GB.

The crawl started OK. DNS hits in Pi-hole when starting the crawl: no concurrent-query warnings from Pi-hole, but slightly more queries answered than before with the same site list. https://twitter.com/smokingwheels/status/1575597761189531648

Blocked threads as follows. `* Start Thread Dump Fri Sep 30 05:20:09 AWST 2022 ***

YaCy Version: 1.925/9749 Assigned Memory = 1864368128 Used Memory = 790363696 Available Memory = 1074004432

this thread dump function can find threads that lock others, to enable this function start YaCy with 'startYACY.sh -l'

THREADS WITH STATES: BLOCKED

Thread= qtp688593710-212-acceptor-1@7027b3c6-httpd:8091@4a9cc6cb{HTTP/1.1, (http/1.1)}{0.0.0.0:8091} id=212 BLOCKED
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:233)
at org.eclipse.jetty.server.ServerConnector.accept(ServerConnector.java:388)
at org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:702)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:773)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:905)
at java.lang.Thread.run(Thread.java:750)

Thread= BusyThread CrawlQueues.coreCrawlJob daemon id=204 BLOCKED
at net.yacy.crawler.robots.RobotsTxt.getEntry(RobotsTxt.java:164) [synchronized (syncObj) {]
at net.yacy.crawler.robots.RobotsTxt.getEntry(RobotsTxt.java:126)
at net.yacy.crawler.data.Latency.waitingRobots(Latency.java:126)
at net.yacy.crawler.data.Latency.waitingRemaining(Latency.java:217)
at net.yacy.crawler.data.Latency.getDomainSleepTime(Latency.java:286)
at net.yacy.crawler.HostQueue.pop(HostQueue.java:485)
at net.yacy.crawler.HostBalancer.pop(HostBalancer.java:484)
at net.yacy.crawler.data.NoticedURL.pop(NoticedURL.java:341)
at net.yacy.crawler.data.NoticedURL.pop(NoticedURL.java:291)
at net.yacy.crawler.data.CrawlQueues.coreCrawlJob(CrawlQueues.java:331)
at net.yacy.search.Switchboard$10.jobImpl(Switchboard.java:1182)
at net.yacy.kelondro.workflow.InstantBusyThread.job(InstantBusyThread.java:64)
at net.yacy.kelondro.workflow.AbstractBusyThread.run(AbstractBusyThread.java:215)
` Thanks.

smokingwheels commented 2 years ago

With the cache cleared and the crawler running:

`YaCy Version: 1.925/9749 Assigned Memory = 2013265920 Used Memory = 1386412144 Available Memory = 626851664

this thread dump function can find threads that lock others, to enable this function start YaCy with 'startYACY.sh -l'

THREADS WITH STATES: BLOCKED

Thread= CrawlStacker_pool-1-thread-195 id=349 BLOCKED at java.io.FilterInputStream.skip(FilterInputStream.java:151) at net.yacy.kelondro.table.ChunkIterator.next0(ChunkIterator.java:90) at net.yacy.kelondro.table.ChunkIterator.next0(ChunkIterator.java:39) at net.yacy.cora.util.LookAheadIterator.next(LookAheadIterator.java:68) at net.yacy.kelondro.table.Table.(Table.java:168) at net.yacy.kelondro.index.OnDemandOpenFileIndex.getIndex(OnDemandOpenFileIndex.java:61) at net.yacy.kelondro.index.OnDemandOpenFileIndex.has(OnDemandOpenFileIndex.java:191) at net.yacy.kelondro.index.BufferedObjectIndex.has(BufferedObjectIndex.java:183) at net.yacy.crawler.HostQueue.has(HostQueue.java:399) at net.yacy.crawler.HostQueue.push(HostQueue.java:430) at net.yacy.crawler.HostBalancer.push(HostBalancer.java:298) at net.yacy.crawler.data.NoticedURL.push(NoticedURL.java:193) at net.yacy.crawler.CrawlStacker.stackCrawl(CrawlStacker.java:400) at net.yacy.crawler.CrawlStacker.process(CrawlStacker.java:139) at net.yacy.crawler.CrawlStacker.process(CrawlStacker.java:64) at net.yacy.kelondro.workflow.InstantBlockingThread.job(InstantBlockingThread.java:72) at net.yacy.kelondro.workflow.AbstractBlockingThread.run(AbstractBlockingThread.java:82) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750)

Thread= CrawlStacker_pool-1-thread-196 id=351 BLOCKED at net.yacy.crawler.HostBalancer.push(HostBalancer.java:290) [synchronized (this) {] at net.yacy.crawler.data.NoticedURL.push(NoticedURL.java:193) at net.yacy.crawler.CrawlStacker.stackCrawl(CrawlStacker.java:400) at net.yacy.crawler.CrawlStacker.process(CrawlStacker.java:139) at net.yacy.crawler.CrawlStacker.process(CrawlStacker.java:64) at net.yacy.kelondro.workflow.InstantBlockingThread.job(InstantBlockingThread.java:72) at net.yacy.kelondro.workflow.AbstractBlockingThread.run(AbstractBlockingThread.java:82) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750)

Thread= CrawlStacker_pool-1-thread-198 id=355 BLOCKED at net.yacy.kelondro.index.BufferedObjectIndex.has(BufferedObjectIndex.java:182) [synchronized (this.backend) {] at net.yacy.crawler.HostQueue.has(HostQueue.java:399) at net.yacy.crawler.HostBalancer.has(HostBalancer.java:247) at net.yacy.crawler.HostBalancer.push(HostBalancer.java:287) at net.yacy.crawler.data.NoticedURL.push(NoticedURL.java:193) at net.yacy.crawler.CrawlStacker.stackCrawl(CrawlStacker.java:400) at net.yacy.crawler.CrawlStacker.process(CrawlStacker.java:139) at net.yacy.crawler.CrawlStacker.process(CrawlStacker.java:64) at net.yacy.kelondro.workflow.InstantBlockingThread.job(InstantBlockingThread.java:72) at net.yacy.kelondro.workflow.AbstractBlockingThread.run(AbstractBlockingThread.java:82) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750)

Thread= qtp688593710-211-acceptor-0@1d1d02ab-httpd:8091@4a9cc6cb{HTTP/1.1, (http/1.1)}{0.0.0.0:8091} id=211 BLOCKED at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:232) at org.eclipse.jetty.server.ServerConnector.accept(ServerConnector.java:388) at org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:702) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:773) at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:905) at java.lang.Thread.run(Thread.java:750) `

smokingwheels commented 2 years ago

No errors or warnings in the 1000-line log.

`YaCy Version: 1.925/9749 Assigned Memory = 2064121856 Used Memory = 877593384 Available Memory = 1186528472

this thread dump function can find threads that lock others, to enable this function start YaCy with 'startYACY.sh -l'

THREADS WITH STATES: BLOCKED

Thread= BusyThread CrawlQueues.coreCrawlJob daemon id=204 BLOCKED at net.yacy.crawler.HostBalancer.pop(HostBalancer.java:319) [synchronized (this) {] at net.yacy.crawler.data.NoticedURL.pop(NoticedURL.java:341) at net.yacy.crawler.data.NoticedURL.pop(NoticedURL.java:291) at net.yacy.crawler.data.CrawlQueues.coreCrawlJob(CrawlQueues.java:331) at net.yacy.search.Switchboard$10.jobImpl(Switchboard.java:1182) at net.yacy.kelondro.workflow.InstantBusyThread.job(InstantBusyThread.java:64) at net.yacy.kelondro.workflow.AbstractBusyThread.run(AbstractBusyThread.java:215)

Thread= CrawlStacker_pool-1-thread-198 id=355 BLOCKED at net.yacy.crawler.HostBalancer.push(HostBalancer.java:290) [synchronized (this) {] at net.yacy.crawler.data.NoticedURL.push(NoticedURL.java:193) at net.yacy.crawler.CrawlStacker.stackCrawl(CrawlStacker.java:400) at net.yacy.crawler.CrawlStacker.process(CrawlStacker.java:139) at net.yacy.crawler.CrawlStacker.process(CrawlStacker.java:64) at net.yacy.kelondro.workflow.InstantBlockingThread.job(InstantBlockingThread.java:72) at net.yacy.kelondro.workflow.AbstractBlockingThread.run(AbstractBlockingThread.java:82) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750)

Thread= CrawlStacker_pool-1-thread-197 id=353 BLOCKED at net.yacy.kelondro.index.BufferedObjectIndex.has(BufferedObjectIndex.java:182) [synchronized (this.backend) {] at net.yacy.crawler.HostQueue.has(HostQueue.java:399) at net.yacy.crawler.HostBalancer.has(HostBalancer.java:247) at net.yacy.crawler.HostBalancer.push(HostBalancer.java:287) at net.yacy.crawler.data.NoticedURL.push(NoticedURL.java:193) at net.yacy.crawler.CrawlStacker.stackCrawl(CrawlStacker.java:400) at net.yacy.crawler.CrawlStacker.process(CrawlStacker.java:139) at net.yacy.crawler.CrawlStacker.process(CrawlStacker.java:64) at net.yacy.kelondro.workflow.InstantBlockingThread.job(InstantBlockingThread.java:72) at net.yacy.kelondro.workflow.AbstractBlockingThread.run(AbstractBlockingThread.java:82) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750)

Thread= qtp688593710-14728 id=14728 BLOCKED at net.yacy.kelondro.index.BufferedObjectIndex.size(BufferedObjectIndex.java:155) [synchronized (this.backend) {] at net.yacy.crawler.HostQueue.size(HostQueue.java:411) at net.yacy.crawler.HostBalancer.size(HostBalancer.java:254) at net.yacy.crawler.data.NoticedURL.stackSize(NoticedURL.java:169) at net.yacy.crawler.data.CrawlQueues.coreCrawlJobSize(CrawlQueues.java:267) at net.yacy.search.Switchboard$10.getJobCount(Switchboard.java:1187) at status_p.respond(status_p.java:102) at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at net.yacy.http.servlets.YaCyDefaultServlet.invokeServlet(YaCyDefaultServlet.java:659) at net.yacy.http.servlets.YaCyDefaultServlet.handleTemplate(YaCyDefaultServlet.java:869) at net.yacy.http.servlets.YaCyDefaultServlet.doGet(YaCyDefaultServlet.java:302) at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:791) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:766) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:567) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624) at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1435) at 
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594) at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1350) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) at net.yacy.http.CrashProtectionHandler.handle(CrashProtectionHandler.java:33) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) at org.eclipse.jetty.server.Server.handle(Server.java:516) at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388) at org.eclipse.jetty.server.HttpChannel$$Lambda$384/893933681.dispatch(Unknown Source) at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129) at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375) at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:773) at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:905) at java.lang.Thread.run(Thread.java:750)

Thread= qtp688593710-211-acceptor-0@1d1d02ab-httpd:8091@4a9cc6cb{HTTP/1.1, (http/1.1)}{0.0.0.0:8091} id=211 BLOCKED at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:232) at org.eclipse.jetty.server.ServerConnector.accept(ServerConnector.java:388) at org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:702) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:773) at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:905) at java.lang.Thread.run(Thread.java:750) `

smokingwheels commented 2 years ago

Crawling slowed down. From the log, it looks like the crawler is no longer ignoring images.

Paused the crawler. According to the log, YaCy is storing .pdf files.

I paused the crawler and restarted YaCy with the GUI. Crawling resumed OK. https://twitter.com/smokingwheels/status/1575635279347523586

`YaCy Version: 1.925/9749 Assigned Memory = 2064646144 Used Memory = 1038185624 Available Memory = 1026460520

this thread dump function can find threads that lock others, to enable this function start YaCy with 'startYACY.sh -l'

THREADS WITH STATES: BLOCKED

Thread= BusyThread CrawlQueues.coreCrawlJob daemon id=204 BLOCKED at net.yacy.crawler.HostBalancer.pop(HostBalancer.java:319) [synchronized (this) {] at net.yacy.crawler.data.NoticedURL.pop(NoticedURL.java:341) at net.yacy.crawler.data.NoticedURL.pop(NoticedURL.java:291) at net.yacy.crawler.data.CrawlQueues.coreCrawlJob(CrawlQueues.java:331) at net.yacy.search.Switchboard$10.jobImpl(Switchboard.java:1182) at net.yacy.kelondro.workflow.InstantBusyThread.job(InstantBusyThread.java:64) at net.yacy.kelondro.workflow.AbstractBusyThread.run(AbstractBusyThread.java:215)

Thread= CrawlStacker_pool-1-thread-198 id=355 BLOCKED at net.yacy.crawler.HostBalancer.push(HostBalancer.java:290) [synchronized (this) {] at net.yacy.crawler.data.NoticedURL.push(NoticedURL.java:193) at net.yacy.crawler.CrawlStacker.stackCrawl(CrawlStacker.java:400) at net.yacy.crawler.CrawlStacker.process(CrawlStacker.java:139) at net.yacy.crawler.CrawlStacker.process(CrawlStacker.java:64) at net.yacy.kelondro.workflow.InstantBlockingThread.job(InstantBlockingThread.java:72) at net.yacy.kelondro.workflow.AbstractBlockingThread.run(AbstractBlockingThread.java:82) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750)

Thread= CrawlStacker_pool-1-thread-197 id=353 BLOCKED at net.yacy.kelondro.index.BufferedObjectIndex.has(BufferedObjectIndex.java:182) [synchronized (this.backend) {] at net.yacy.crawler.HostQueue.has(HostQueue.java:399) at net.yacy.crawler.HostBalancer.has(HostBalancer.java:247) at net.yacy.crawler.HostBalancer.push(HostBalancer.java:287) at net.yacy.crawler.data.NoticedURL.push(NoticedURL.java:193) at net.yacy.crawler.CrawlStacker.stackCrawl(CrawlStacker.java:400) at net.yacy.crawler.CrawlStacker.process(CrawlStacker.java:139) at net.yacy.crawler.CrawlStacker.process(CrawlStacker.java:64) at net.yacy.kelondro.workflow.InstantBlockingThread.job(InstantBlockingThread.java:72) at net.yacy.kelondro.workflow.AbstractBlockingThread.run(AbstractBlockingThread.java:82) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750)

Thread= qtp688593710-212-acceptor-1@7027b3c6-httpd:8091@4a9cc6cb{HTTP/1.1, (http/1.1)}{0.0.0.0:8091} id=212 BLOCKED at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:232) at org.eclipse.jetty.server.ServerConnector.accept(ServerConnector.java:388) at org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:702) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:773) at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:905) at java.lang.Thread.run(Thread.java:750) `

smokingwheels commented 2 years ago

The cache is storing .jpg files even though they are not wanted.

`I 2022/09/30 08:36:11 SWITCHBOARD * Not Condensed Resource 'https://bilder.pcwelt.de/4358286_300x150.jpg': indexing of media files not wanted by crawl profile

I 2022/09/30 08:36:11 SWITCHBOARD Indexed 380 words in URL https://techcommunity.microsoft.com/t5/windows-365/bd-p/Windows365Discussions [npSGhjFgJ745] Description: Windows 365 - Microsoft Tech Community MimeType: text/html | Charset: UTF-8 | Size: 5944 bytes | LinkStorageTime: 6 ms | indexStorageTime: 2 ms

I 2022/09/30 08:36:11 org.apache.solr.update.processor.LogUpdateProcessorFactory [collection1] webapp=null path=/update params={}{add=[npSGhjFgJ745 (1745352834956656640)]} 0 4

I 2022/09/30 08:36:11 Fulltext * indexing: npSGhjFgJ745 https://techcommunity.microsoft.com/t5/windows-365/bd-p/Windows365Discussions

I 2022/09/30 08:36:11 HTCACHE * storing content of url https://bilder.pcwelt.de/4358286_300x150.jpg, 6065 bytes`

Orbiter commented 2 years ago

Most of those BLOCKED messages are harmless, and even expected, as long as the threads unblock by themselves. I have added workarounds to prevent some of those blockings; the remaining ones should be normal and harmless.
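One way to check whether a BLOCKED thread unblocks by itself is to take two snapshots some seconds apart and keep only the threads that are blocked on the same lock at the same top stack frame in both. The sketch below shows that comparison with the standard `ThreadMXBean` API; `PersistentBlockCheck`, its method names, and the five-second gap are hypothetical choices for illustration and are not taken from YaCy.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: compares two snapshots of BLOCKED threads. */
final class PersistentBlockCheck {

    /** Maps thread id to "lock name @ top frame" for every thread that is BLOCKED right now. */
    static Map<Long, String> snapshot() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Long, String> blocked = new HashMap<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            StackTraceElement[] stack = info.getStackTrace();
            String top = stack.length > 0 ? stack[0].toString() : "?";
            blocked.put(info.getThreadId(), info.getLockName() + " @ " + top);
        }
        return blocked;
    }

    /** Returns the ids of threads blocked on the same lock at the same frame in both snapshots. */
    static Set<Long> stillBlocked(Map<Long, String> first, Map<Long, String> second) {
        Set<Long> ids = new TreeSet<>();
        for (Map.Entry<Long, String> entry : first.entrySet()) {
            if (entry.getValue().equals(second.get(entry.getKey()))) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Long, String> first = snapshot();
        Thread.sleep(5000); // gap between samples; an arbitrary value for this sketch
        Map<Long, String> second = snapshot();
        System.out.println("blocked in both samples: " + stillBlocked(first, second));
    }
}
```

A thread that shows up in both samples is only a candidate for closer inspection: it may have run in between and simply be waiting for the same contended lock again.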

smokingwheels commented 2 years ago

> Most of those BLOCKED messages are harmless and also wanted in case they unblock by themselves.

OK, thanks for that update.

I think the Pi-hole has something to do with it, but that is just a guess. I will close the issue and do more testing.

`* Start Thread Dump Fri Sep 30 20:20:20 AWST 2022 ***

YaCy Version: 1.925/9749 Assigned Memory = 1864368128 Used Memory = 1281255560 Available Memory = 583093816

this thread dump function can find threads that lock others, to enable this function start YaCy with 'startYACY.sh -l'

THREADS WITH STATES: BLOCKED

Thread= qtp626193099-92-acceptor-0@9b145bb-httpd:8091@61078690{HTTP/1.1, (http/1.1)}{0.0.0.0:8091} id=92 BLOCKED
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:233)
at org.eclipse.jetty.server.ServerConnector.accept(ServerConnector.java:388)
at org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:702)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:773)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:905)
at java.lang.Thread.run(Thread.java:750) `