yasserg / crawler4j

Open Source Web Crawler for Java
Apache License 2.0

Does crawler4j support feeding crawled results into a Kafka topic? #379

Open johnklee opened 5 years ago

johnklee commented 5 years ago

As the title says: if I already have a Kafka producer/consumer framework in place, does crawler4j let us configure Kafka settings so that it can feed the crawled results into an existing Kafka topic?

s17t commented 5 years ago

C4J itself persists only one type of data: the crawled URLs, currently in the embedded Sleepycat DB. A user's extension of WebCrawler can persist the downloaded data and the visited URLs in any type of storage (or queue system), if the implementation is written to do so.
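
A minimal sketch of the extension approach described above, using only the Java standard library so it runs without a broker. The names `RecordSink`, `InMemorySink`, `CrawlPublisher`, and the topic `crawled-pages` are hypothetical and not part of crawler4j or Kafka. In a real setup, one would presumably call `onPageVisited` from an overridden `WebCrawler.visit(Page page)`, and back the sink with a `KafkaProducer<String, String>`; that wiring is an assumption and is shown only in comments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstraction: anything that accepts a (topic, key, value) record.
// A Kafka-backed version could delegate to
//   producer.send(new ProducerRecord<>(topic, key, value));
interface RecordSink {
    void send(String topic, String key, String value);
}

// In-memory stand-in so the sketch runs without a Kafka broker.
final class InMemorySink implements RecordSink {
    final List<String[]> records = new ArrayList<>();

    @Override
    public void send(String topic, String key, String value) {
        records.add(new String[] {topic, key, value});
    }
}

// Holds the logic a WebCrawler subclass could call from visit(Page), e.g.
//   String url = page.getWebURL().getURL();
//   if (page.getParseData() instanceof HtmlParseData) {
//       String text = ((HtmlParseData) page.getParseData()).getText();
//       publisher.onPageVisited(url, text);
//   }
final class CrawlPublisher {
    private final RecordSink sink;
    private final String topic;

    CrawlPublisher(RecordSink sink, String topic) {
        this.sink = sink;
        this.topic = topic;
    }

    // Publishes one record per visited page, keyed by URL.
    // Pages without a usable URL are skipped; missing text becomes "".
    void onPageVisited(String url, String text) {
        if (url == null || url.isEmpty()) {
            return;
        }
        sink.send(topic, url, text == null ? "" : text);
    }
}

final class CrawlToSinkDemo {
    public static void main(String[] args) {
        InMemorySink sink = new InMemorySink();
        CrawlPublisher publisher = new CrawlPublisher(sink, "crawled-pages");

        publisher.onPageVisited("https://example.com/", "Example Domain");
        publisher.onPageVisited("https://example.com/about", null);

        for (String[] r : sink.records) {
            System.out.println(r[0] + " | " + r[1] + " | " + r[2].length() + " chars");
        }
    }
}
```

Keeping the sink behind a small interface means the crawler code does not depend on Kafka directly, so the same publisher could be pointed at a database or a different queue. Since crawler4j runs several crawler instances on separate threads, a shared sink would need to be thread-safe; `InMemorySink` here is not.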