ukwa / ukwa-heritrix

The UKWA Heritrix3 custom modules and Docker builder.

Problem unpausing after taking Kafka off line #47

Open anjackson opened 5 years ago

anjackson commented 5 years ago

We paused the crawler to reconfigure Kafka, which required fully shutting down and restarting Kafka.

N.B. Doing a stack rm complained about removing the Kafka network; we should have just stopped the Kafka instances.

Once Kafka was ready, we restarted the crawlers, but they got stuck. There was some of this:

INFO: uk.bl.wap.crawler.postprocessor.KafkaKeyedCrawlLogFeed$StatsCallback onCompletion error count so far: 5629/698790000 (0.0%) [Thu Jul 11 15:38:47 GMT 2019]

And a lot of threads like this:

"ToeThread #999: http://smartroof.co.uk/2017/05/" #1087 prio=4 os_prio=0 tid=0x00007f8e5c9a2000 nid=0x49c in Object.wait() [0x00007f8b1d1d1000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at org.apache.kafka.clients.Metadata.awaitUpdate(Metadata.java:177)
        - locked <0x0000000556ad3178> (a org.apache.kafka.clients.Metadata)
        at org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata(KafkaProducer.java:884)
        at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:770)
        at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:760)
        at uk.bl.wap.crawler.postprocessor.KafkaKeyedToCrawlFeed.sendToKafka(KafkaKeyedToCrawlFeed.java:176)
        at uk.bl.wap.crawler.postprocessor.KafkaKeyedToCrawlFeed.innerProcess(KafkaKeyedToCrawlFeed.java:304)
        at org.archive.modules.Processor.innerProcessResult(Processor.java:175)
        at org.archive.modules.Processor.process(Processor.java:142)
        at org.archive.modules.ProcessorChain.process(ProcessorChain.java:131)
        at org.archive.crawler.postprocessor.CandidatesProcessor.runCandidateChain(CandidatesProcessor.java:176)
        at org.archive.crawler.postprocessor.CandidatesProcessor.innerProcess(CandidatesProcessor.java:230)
        at org.archive.modules.Processor.innerProcessResult(Processor.java:175)
        at org.archive.modules.Processor.process(Processor.java:142)
        at org.archive.modules.ProcessorChain.process(ProcessorChain.java:131)
        at org.archive.crawler.framework.ToeThread.run(ToeThread.java:152)

   Locked ownable synchronizers:
        - None
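
For reference, the wait in KafkaProducer.waitOnMetadata shown above is bounded by the producer setting max.block.ms (60000 ms by default in the Kafka Java client). Below is a minimal sketch of producer properties with a shorter bound, so that a send() which cannot fetch metadata fails sooner instead of holding a ToeThread. The class name BoundedProducerConfig, the broker address and the 10 second value are illustrative assumptions, not taken from the ukwa-heritrix configuration, and this only shortens each blocking period; it is not a confirmed fix for the stuck crawl.

```java
import java.util.Properties;

// Hypothetical helper (not part of ukwa-heritrix): builds Kafka producer
// properties that cap how long send() may block while waiting for metadata.
public class BoundedProducerConfig {

    public static Properties build(String bootstrapServers, long maxBlockMs) {
        Properties props = new Properties();
        // Standard Kafka producer configuration keys.
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        // Upper bound on blocking in send() when metadata is unavailable
        // or the buffer is full; the client default is 60000 ms.
        props.setProperty("max.block.ms", Long.toString(maxBlockMs));
        return props;
    }

    public static void main(String[] args) {
        // "kafka:9092" and 10 seconds are placeholder values for illustration.
        Properties props = build("kafka:9092", 10_000L);
        props.list(System.out);
    }
}
```

The resulting Properties object would be passed to the KafkaProducer constructor; when the bound is exceeded the client raises a timeout error, which the calling code would need to handle (for example by retrying or by checking whether the crawl is paused).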

Going to attempt a full shutdown and restart. But really, unpausing after a Kafka restart should work.

The ToeThreads are not removed when pausing the crawler, so presumably something went wrong there when Kafka went away?
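
To check how many threads are stuck in the metadata wait without reading a full thread dump by hand, something like the following could be run inside the crawler JVM. This is a rough diagnostic sketch using only the standard Thread API; the class BlockedThreadCounter is hypothetical, and how it would be invoked inside a running Heritrix instance is left as an assumption.

```java
import java.util.Map;

// Hypothetical diagnostic helper: counts live threads in this JVM whose
// stack contains a frame for the given class and method.
public class BlockedThreadCounter {

    public static int countThreadsIn(String className, String methodName) {
        int count = 0;
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            for (StackTraceElement frame : entry.getValue()) {
                if (frame.getClassName().equals(className)
                        && frame.getMethodName().equals(methodName)) {
                    count++;
                    break; // count each thread at most once
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Frame taken from the stack trace in this issue.
        int stuck = countThreadsIn(
                "org.apache.kafka.clients.Metadata", "awaitUpdate");
        System.out.println("Threads in Metadata.awaitUpdate: " + stuck);
    }
}
```

Comparing this count before and after unpausing would show whether the ToeThreads ever leave the Kafka wait once the brokers are reachable again.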