yacy / yacy_search_server

Distributed Peer-to-Peer Web Search Engine and Intranet Search Appliance
http://yacy.net

Crawling a site at depths of 5, 6, 7 and 10, and invalid DNS requests when the crawler slows down #578

Closed: smokingwheels closed this issue 1 year ago

smokingwheels commented 1 year ago

I hired a 4-CPU cloud server to gather the main content of my YaCy index, using an Android phone. The cloud server had no swap space defined, and I had to keep rebooting it every 3-4 days the whole time it was running. I transferred 150 GB to a Windows PC while I reconnected and set my Linux systems back up. YaCy's performance: it ran OK on Windows, but a bit slowly; that is only a notebook with a 5400 rpm drive.

I transferred it over SSH to Linux on a new SSD for testing, and I could not get YaCy to start. I changed all the paths in yacy.init, but it still would not start after having run on Windows. In the end I took a yacy.init from another Linux YaCy install, overwrote the copied Windows one, and it started. I checked the index and it was OK, although that took some time.
I did an export from 1.924 and an import into the latest GitHub version, and have been testing that on a separate SSD in my old i7 PC. Note: DHT WORDS was around 6 million, and after the import it dropped to 1.5 million.

The startYACY.sh contents, after experimenting and reading up on Minecraft tweaks for the Java version:

These added Java settings seem to help crawling on a 10-year-old $5 mainboard running Ubuntu 18.04:

JAVA_ARGS="-XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSIncrementalPacing -XX:ParallelGCThreads=15 -XX:+AggressiveOpts $JAVA_ARGS";

With Java on a new Ubuntu 22.04 install, YaCy would not start until only these were left:

JAVA_ARGS="-XX:ParallelGCThreads=15 -XX:+AggressiveOpts $JAVA_ARGS";

Crawling rate set to 30000 on an NBN 25 connection (25 Mbps); this pretty much saturated the connection while testing.

-XX:ParallelGCThreads=15: I have tried 4, 8 and 10 while crawling, and am now testing 10 after the crawler stopped.

My startYACY.sh

#!/usr/bin/env sh

JAVA="`which java`"
CONFIGFILE="DATA/SETTINGS/yacy.conf"
LOGFILE="yacy.log"
PIDFILE="yacy.pid"
OS="`uname`"

#get javastart args

JAVA_ARGS="-server -Djava.awt.headless=true -Dfile.encoding=UTF-8";
JAVA_ARGS="-XX:ParallelGCThreads=15 -XX:+AggressiveOpts $JAVA_ARGS";

JAVA_ARGS="-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails $JAVA_ARGS";
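Several of these flags depend on the JDK version. On HotSpot, -XX:+CMSIncrementalPacing was removed in JDK 9, -XX:+UseParNewGC in JDK 10 and the CMS collector itself in JDK 14, while JDK 9 moved GC logging to unified logging (-Xlog:gc*) and no longer accepts -XX:+PrintGCTimeStamps. If the Ubuntu 22.04 machine runs a newer JDK than the 1.8.0_362 reported in the system info, that could explain why YaCy would not start until the CMS flags were dropped. Below is a minimal sketch, not part of YaCy's own startYACY.sh, that probes each option with `java <option> -version` and keeps only the ones the JVM accepts. It assumes the JVM exits non-zero for a rejected option, and the helper names add_if_supported and build_java_args are hypothetical.

```sh
#!/usr/bin/env sh
# Sketch (not YaCy's own startYACY.sh): only pass JVM options the installed JVM accepts.
# Assumption: "$JAVA <option> -version" exits non-zero when the JVM rejects an option,
# as HotSpot does for unrecognized -XX flags.

JAVA="${JAVA:-java}"

# Hypothetical helper: prepend one option to JAVA_ARGS if the JVM accepts it.
add_if_supported() {
    if "$JAVA" "$1" -version >/dev/null 2>&1; then
        JAVA_ARGS="$1 $JAVA_ARGS"
    else
        echo "skipping unsupported JVM option: $1" >&2
    fi
}

# Hypothetical helper: build JAVA_ARGS from the options tried in this issue.
build_java_args() {
    # Each option is probed on its own; a combination can still be rejected
    # (for example two collectors selected at once), so test the final command too.
    for opt in -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSIncrementalPacing \
               -XX:ParallelGCThreads=15 -XX:+AggressiveOpts; do
        add_if_supported "$opt"
    done

    # JDK 8 style GC logging if accepted, otherwise JDK 9+ unified logging.
    if "$JAVA" -XX:+PrintGCTimeStamps -version >/dev/null 2>&1; then
        JAVA_ARGS="-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails $JAVA_ARGS"
    else
        JAVA_ARGS="-Xlog:gc* $JAVA_ARGS"
    fi
}

JAVA_ARGS="-server -Djava.awt.headless=true -Dfile.encoding=UTF-8"
build_java_args
echo "JAVA_ARGS=$JAVA_ARGS"
```

The probe costs one short JVM start per option, so it adds a little to YaCy's startup time; once you know which flags your JDK accepts, hard-coding them is simpler.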

DNS requests from my YaCy search engine.

A typical DNS lookup, from the Pi-hole log:

AAAA    iframe.dacast.com       OK (answered by dns.google#53)  CNAME (8.5ms)   
AAAA    iframe.dacast.com       OK (sent to resolver1.opendns.com#53)   N/A 
A   iframe.dacast.com       OK (answered by resolver1.opendns.com#53)   CNAME (4.2ms)   
A   iframe.dacast.com       OK (sent to resolver1.opendns.com#53)   N/A 
AAAA    www.speldvic.org.au     OK (answered by resolver1.opendns.com#53)   CNAME (5.4ms)   
AAAA    www.speldvic.org.au     OK (sent to resolver1.opendns.com#53)   N/A 
A   www.speldvic.org.au     OK (answered by resolver1.opendns.com#53)   CNAME (6.2ms)   
A   www.speldvic.org.au     OK (sent to resolver1.opendns.com#53)   N/A

I added more known primary and secondary DNS servers in Pi-hole's settings.

Crawling a site at depths of 5, 6 and 7, I have come across invalid DNS requests when the crawler slows down.

Below are the types of domains YaCy sends DNS requests for, as seen in the Pi-hole log when the crawler slows down; some of them may be in my blocklist. During the initial lookups there are about 30000 DNS queries in the first 10 minutes of crawling (roughly 50 queries per second).
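30000 queries in 10 minutes is roughly 3000 per minute, or about 50 per second. To see how that rate changes as the crawler slows down, a minimal sketch like the following counts query lines per minute. It assumes Pi-hole's dnsmasq-style log format shown in the comment (verify it against your own log), and queries_per_minute is a hypothetical helper name.

```sh
#!/usr/bin/env sh
# Sketch: count DNS queries per minute in a dnsmasq-style log such as the one
# Pi-hole writes. Assumed line format (check it against your own log):
#   Jun  2 19:15:27 dnsmasq[1234]: query[AAAA] iframe.dacast.com from 192.168.1.10

# Hypothetical helper: reads the log on stdin, prints "<count> <month> <day> <HH:MM>".
queries_per_minute() {
    # $5 is the "query[TYPE]" field; reply, forwarded and cached lines are ignored.
    # substr($3, 1, 5) keeps HH:MM from the HH:MM:SS timestamp.
    # uniq -c counts consecutive identical minutes, which works for a chronological log.
    awk '$5 ~ /^query\[/ { print $1, $2, substr($3, 1, 5) }' | uniq -c
}
```

For example, `queries_per_minute < /var/log/pihole.log` (the log path may differ between Pi-hole versions). Dividing each count by 60 gives queries per second.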

Pi-hole shows various types of invalid DNS requests when the crawler slows down:

AAAA    dpird.wa.gov.au?subject=    OK (cache)  NXDOMAIN (0.4ms)    
AAAA    dpird.wa.gov.au?subject=    OK (cache)  NXDOMAIN (1.2ms)    
A   dpird.wa.gov.au?subject=    OK (answered by dns.quad9.net#53)   NXDOMAIN (1.2ms)    
A   dpird.wa.gov.au?subject=    OK (sent to dns.quad9.net#53)   N/A

customer_name.atlassian.net"    
charts.gitlab.io<   
www.mumble.info">mumble<    
ayearofsprings.crd.co">a    
maven.google.com"   
toasters.example",  
www.apple.com"  
apptwo.company.com" 
www.yosoygames.com.ar%5c    
www.example-dsp.com',   
www.example-ssp.com',   
buyer2.com',    
www.example-dsp.com'    
www.another-buyer.com'  
www.some-other-ssp.com',    
risc.example.com",
A   gayle bowen (klgates.com)
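None of the names above can resolve: each still carries characters that are not allowed in a host name (quotes, <, >, ', ?, %5c, spaces), which suggests they were taken from page text or attributes without the surrounding markup being stripped. To separate such junk from real lookups in the Pi-hole log, here is a minimal sketch of an RFC 1123-style host name check. The helpers is_valid_hostname and report_invalid are hypothetical, and the rules are deliberately stricter than DNS itself (no underscores, no trailing dot).

```sh
#!/usr/bin/env sh
# Sketch: check whether a string is a syntactically valid host name
# (RFC 1123 style: letters, digits and hyphens in dot-separated labels of
# 1-63 characters, no label starting or ending with a hyphen, 253 characters max).
# Underscores and a trailing dot are rejected, which is stricter than DNS itself.

# Hypothetical helper: exit status 0 if "$1" looks like a valid host name.
is_valid_hostname() {
    name=$1
    [ "${#name}" -ge 1 ] && [ "${#name}" -le 253 ] || return 1
    case $name in
        *[!A-Za-z0-9.-]*) return 1 ;;   # quotes, <, ?, %, spaces, ... are never valid
    esac
    printf '%s\n' "$name" |
        grep -Eq '^[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?(\.[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?)*$'
}

# Hypothetical helper: read one name per line on stdin and print the invalid ones.
report_invalid() {
    while IFS= read -r host; do
        is_valid_hostname "$host" || printf 'invalid: %s\n' "$host"
    done
}
```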

With the crawling speed set at 30000, the system load indicator went up to 1200%; 8 is normal for my CPU. I had to hit the reset button. I am not sure where the fault is; it has happened several times.

Going to check the index now:

john@john-desktop:/media/john/YaCy10001/yacy/bin$ ./checkindex.sh

Checking Solr index at /media/john/YaCy10001/yacy/DATA/INDEX/freeworld...

NOTE: testing will be more thorough if you run java with '-ea:org.apache.lucene...', so assertions are enabled

Opening index @ /media/john/YaCy10001/yacy/DATA/INDEX/freeworld/SEGMENTS/solr_6_6/collection1/data/index/

**** Changed to /media/john/YaCy10001/yacy/DATA/INDEX/freeworld/SEGMENTS/solr_8_8_1/collection1/data/index/

The index check deleted a few docs.

Error:

Exception in thread "main" org.apache.lucene.index.IndexNotFoundException: no segments* file found in MMapDirectory@/media/john/YaCy10001/yacy/DATA/INDEX/freeworld/SEGMENTS/solr_6_6/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@41a4555e: files: [write.lock, write.lock.old]
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:517)
    at org.apache.lucene.index.CheckIndex.doCheck(CheckIndex.java:2962)
    at org.apache.lucene.index.CheckIndex.doMain(CheckIndex.java:2860)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2786)

john@john-desktop:/media/john/YaCy10001/yacy/bin$ ./checkindex.sh
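The exception says the solr_6_6 index directory contains only write.lock and write.lock.old and no segments* file, so there is no Lucene commit point there; the live index is under solr_8_8_1. A minimal sketch to list which SEGMENTS subdirectories actually hold an index before pointing checkindex.sh at one. It assumes the SEGMENTS/<version>/collection1/data/index layout seen in the paths above, and find_lucene_indexes is a hypothetical helper name.

```sh
#!/usr/bin/env sh
# Sketch: report which Solr index directories under a YaCy SEGMENTS folder contain
# a Lucene commit point (a segments_N file). The <version>/collection1/data/index
# layout is assumed from the paths in this issue.

# Hypothetical helper: $1 is the SEGMENTS directory.
find_lucene_indexes() {
    segments_root=$1
    for dir in "$segments_root"/*/collection1/data/index; do
        [ -d "$dir" ] || continue          # glob did not match anything
        if ls "$dir"/segments_* >/dev/null 2>&1; then
            echo "index:    $dir"
        else
            echo "no index: $dir"          # e.g. only write.lock files left behind
        fi
    done
}
```

For example: `find_lucene_indexes /media/john/YaCy10001/yacy/DATA/INDEX/freeworld/SEGMENTS`.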

When crawling at maximum speed for some time, my 6 GB swap file filled up; the system now has 22 GB of swap space. There was a debug message along the lines of "your system does not have enough memory to debug".

When crawling at maximum speed, the crawler queues keep files open, as if trying to download zip files and other file contents directly.

I also got a socket error once or twice when pushing YaCy, so I changed these settings:

Thread Pool Settings:
Crawler Pool: 2000 4
Robots.txt Pool: 2000 4
httpd Session Pool: 200 9

Upsetting the DNS server causes crawling to stop.

I did the upgrade via the XML export (Rich and full-text Solr data, one document per line in one large xml file, can be processed with shell tools, can be imported with DATA/SURROGATE/in/) and lost the DHT WORDS.

System
YaCy version: yacy_v1.926_202304041204_1c0f50985
Uptime: 0 days 00:34
Java version: 1.8.0_362
Processors: 8
Load: 0.98
Threads: 52/14, peak:107, total:430

Thread dump just after starting YaCy with crawl depth 7:

THREADS WITH STATES: BLOCKED

Thread= qtp261650860-104-acceptor-1@1c987884-httpd:8095@7bf9b098{HTTP/1.1, (http/1.1)}{0.0.0.0:8095} id=104 BLOCKED
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:233)
    at org.eclipse.jetty.server.ServerConnector.accept(ServerConnector.java:388)
    at org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:702)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)
    at java.lang.Thread.run(Thread.java:750)

Warnings when crawling:

W 2023/06/02 19:15:27 ConcurrentLog * java.io.IOException: this IndexWriter is closed
java.io.IOException: this IndexWriter is closed
    at net.yacy.search.index.Fulltext.putDocument(Fulltext.java:377)
    at net.yacy.search.index.Segment.putDocument(Segment.java:583)
    at net.yacy.search.index.Segment.storeDocument(Segment.java:666)
    at net.yacy.search.Switchboard.storeDocumentIndex(Switchboard.java:3388)
    at net.yacy.search.Switchboard.storeDocumentIndex(Switchboard.java:3302)
    at net.yacy.search.Switchboard.lambda$new$0(Switchboard.java:1049)
    at net.yacy.kelondro.workflow.InstantBlockingThread.job(InstantBlockingThread.java:72)
    at net.yacy.kelondro.workflow.AbstractBlockingThread.run(AbstractBlockingThread.java:82)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.solr.common.SolrException: this IndexWriter is closed
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:236)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2637)
    at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:227)
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:214)
    at org.apache.solr.client.solrj.SolrClient.deleteById(SolrClient.java:792)
    at org.apache.solr.client.solrj.SolrClient.deleteById(SolrClient.java:754)
    at org.apache.solr.client.solrj.SolrClient.deleteById(SolrClient.java:769)
    at net.yacy.cora.federate.solr.connector.SolrServerConnector.add(SolrServerConnector.java:221)
    at net.yacy.cora.federate.solr.connector.MirrorSolrConnector.add(MirrorSolrConnector.java:204)
    at net.yacy.search.index.Fulltext.putDocument(Fulltext.java:375)
    ... 12 more
Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
    at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:877)
    at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:891)
    at org.apache.lucene.index.IndexWriter.deleteDocuments(IndexWriter.java:1701)
    at org.apache.solr.update.DirectUpdateHandler2.delete(DirectUpdateHandler2.java:433)
    at org.apache.solr.update.processor.RunUpdateProcessorFactory$RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:81)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processDelete(UpdateRequestProcessor.java:59)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalDelete(DistributedUpdateProcessor.java:269)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doVersionDelete(DistributedUpdateProcessor.java:1069)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.lambda$versionDelete$2(DistributedUpdateProcessor.java:981)
    at org.apache.solr.update.VersionBucket.runWithLock(VersionBucket.java:50)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionDelete(DistributedUpdateProcessor.java:981)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doDeleteById(DistributedUpdateProcessor.java:757)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:743)
    at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processDelete(LogUpdateProcessorFactory.java:129)
    at org.apache.solr.handler.loader.JavabinLoader.delete(JavabinLoader.java:203)
    at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:127)
    at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:70)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:82)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:216)
    ... 21 more
Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=cee011a6 actual=b5b68e83 (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/media/john/YaCy10001/yacy/DATA/INDEX/freeworld/SEGMENTS/solr_8_8_1/collection1/data/index/_6q7.cfs") [slice=_6q7.fdt]))
    at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:419)
    at org.apache.lucene.codecs.CodecUtil.checksumEntireFile(CodecUtil.java:547)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.checkIntegrity(CompressingStoredFieldsReader.java:786)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:610)
    at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:228)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4759)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4363)
    at org.apache.solr.update.SolrIndexWriter.merge(SolrIndexWriter.java:201)
    at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5922)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)

E 2023/06/02 19:15:27 org.apache.solr.handler.RequestHandlerBase org.apache.solr.common.SolrException: this IndexWriter is closed
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:236)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2637)
    at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:227)
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:214)
    at org.apache.solr.client.solrj.SolrClient.deleteById(SolrClient.java:792)
    at org.apache.solr.client.solrj.SolrClient.deleteById(SolrClient.java:754)
    at org.apache.solr.client.solrj.SolrClient.deleteById(SolrClient.java:769)
    at net.yacy.cora.federate.solr.connector.SolrServerConnector.add(SolrServerConnector.java:221)
    at net.yacy.cora.federate.solr.connector.MirrorSolrConnector.add(MirrorSolrConnector.java:204)
    at net.yacy.search.index.Fulltext.putDocument(Fulltext.java:375)
    at net.yacy.search.index.Segment.putDocument(Segment.java:583)
    at net.yacy.search.index.Segment.storeDocument(Segment.java:666)
    at net.yacy.search.Switchboard.storeDocumentIndex(Switchboard.java:3388)
    at net.yacy.search.Switchboard.storeDocumentIndex(Switchboard.java:3302)
    at net.yacy.search.Switchboard.lambda$new$0(Switchboard.java:1049)
    at net.yacy.kelondro.workflow.InstantBlockingThread.job(InstantBlockingThread.java:72)
    at net.yacy.kelondro.workflow.AbstractBlockingThread.run(AbstractBlockingThread.java:82)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
    at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:877)
    at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:891)
    at org.apache.lucene.index.IndexWriter.deleteDocuments(IndexWriter.java:1701)
    at org.apache.solr.update.DirectUpdateHandler2.delete(DirectUpdateHandler2.java:433)
    at org.apache.solr.update.processor.RunUpdateProcessorFactory$RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:81)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processDelete(UpdateRequestProcessor.java:59)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalDelete(DistributedUpdateProcessor.java:269)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doVersionDelete(DistributedUpdateProcessor.java:1069)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.lambda$versionDelete$2(DistributedUpdateProcessor.java:981)
    at org.apache.solr.update.VersionBucket.runWithLock(VersionBucket.java:50)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionDelete(DistributedUpdateProcessor.java:981)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doDeleteById(DistributedUpdateProcessor.java:757)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:743)
    at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processDelete(LogUpdateProcessorFactory.java:129)
    at org.apache.solr.handler.loader.JavabinLoader.delete(JavabinLoader.java:203)
    at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:127)
    at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:70)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:82)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:216)
    ... 21 more
Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=cee011a6 actual=b5b68e83 (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/media/john/YaCy10001/yacy/DATA/INDEX/freeworld/SEGMENTS/solr_8_8_1/collection1/data/index/_6q7.cfs") [slice=_6q7.fdt]))
    at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:419)
    at org.apache.lucene.codecs.CodecUtil.checksumEntireFile(CodecUtil.java:547)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.checkIntegrity(CompressingStoredFieldsReader.java:786)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:610)
    at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:228)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4759)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4363)
    at org.apache.solr.update.SolrIndexWriter.merge(SolrIndexWriter.java:201)
    at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5922)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)

I 2023/06/02 19:15:27 org.apache.solr.update.processor.LogUpdateProcessorFactory [collection1] webapp=null path=/update params={}{} 0 0

Possibly a web cache issue: it seems to go away when not storing to the web cache.

W 2023/06/02 19:15:27 ConcurrentLog * org.apache.solr.common.SolrException: Server error writing document id 0Ik_iR28_Ta5 to the index
org.apache.solr.common.SolrException: Server error writing document id 0Ik_iR28_Ta5 to the index
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:246)
    at org.apache.solr.update.processor.RunUpdateProcessorFactory$RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:73)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:263)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doVersionAdd(DistributedUpdateProcessor.java:502)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.lambda$versionAdd$0(DistributedUpdateProcessor.java:343)
    at org.apache.solr.update.VersionBucket.runWithLock(VersionBucket.java:50)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:343)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:229)
    at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:106)
    at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:110)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:343)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readIterator(JavaBinUpdateRequestCodec.java:291)
    at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:338)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:283)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readNamedList(JavaBinUpdateRequestCodec.java:244)
    at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:303)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:283)
    at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:196)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:131)
    at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:122)
    at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:70)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:82)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:216)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2637)
    at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:227)
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:214)
    at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:177)
    at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:138)
    at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:156)
    at net.yacy.cora.federate.solr.connector.SolrServerConnector.add(SolrServerConnector.java:213)
    at net.yacy.cora.federate.solr.connector.MirrorSolrConnector.add(MirrorSolrConnector.java:204)
    at net.yacy.search.index.Fulltext.putDocument(Fulltext.java:375)
    at net.yacy.search.index.Segment.putDocument(Segment.java:583)
    at net.yacy.search.index.Segment.storeDocument(Segment.java:666)
    at net.yacy.search.Switchboard.storeDocumentIndex(Switchboard.java:3388)
    at net.yacy.search.Switchboard.storeDocumentIndex(Switchboard.java:3302)
    at net.yacy.search.Switchboard.lambda$new$0(Switchboard.java:1049)
    at net.yacy.kelondro.workflow.InstantBlockingThread.job(InstantBlockingThread.java:72)
    at net.yacy.kelondro.workflow.AbstractBlockingThread.run(AbstractBlockingThread.java:82)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
    at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:877)
    at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:891)
    at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1468)
    at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1464)
    at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:967)
    at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:342)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:294)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241)
    ... 45 more
Caused by: org.apache.lucene.index.CorruptIndexException: checksum failed (hardware problem?) : expected=cee011a6 actual=b5b68e83 (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/media/john/YaCy10001/yacy/DATA/INDEX/freeworld/SEGMENTS/solr_8_8_1/collection1/data/index/_6q7.cfs") [slice=_6q7.fdt]))
    at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:419)
    at org.apache.lucene.codecs.CodecUtil.checksumEntireFile(CodecUtil.java:547)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.checkIntegrity(CompressingStoredFieldsReader.java:786)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.merge(CompressingStoredFieldsWriter.java:610)
    at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:228)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:105)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4759)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4363)
    at org.apache.solr.update.SolrIndexWriter.merge(SolrIndexWriter.java:201)
    at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:5922)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:626)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:684)

E 2023/06/02 19:15:27 org.apache.solr.handler.RequestHandlerBase org.apache.solr.common.SolrException: Server error writing document id 0Ik_iR28_Ta5 to the index at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:246) at org.apache.solr.update.processor.RunUpdateProcessorFactory$RunUpdateProcessor.proce

Also, when the logs are being spammed, all the log files are updated many times a second.

Started YaCy version 1.924/9000.

With the crawler set to 400 PPM and left running, the crawler stopped and YaCy was not responding. This happened twice over a few days.

I looked in /DATA/INDEX/freeworld/QUEUES/CrawlerCoreStacks and deleted %20www.ischs.org.au-#XxMjNQ.80

I will continue testing; the index just hit 8 million, up from 6.5 million on the virtual cloud system.

Many thanks for reading.

smokingwheels commented 1 year ago

Something is not working... probably my blocklist.