Not sure, but I think this depends on your Kafka setup. How are you running Kafka?
EDIT: e.g., having just run
docker-compose up
in one terminal, I did this in another (using my installed version of Kafka):
cat testdata/seed.json | /usr/local/bin/kafka-console-producer --broker-list localhost:9092 --topic uris.tocrawl.fc
and it worked.
My version of Kafka is 1.1.0, and I installed it following this guide: https://www.digitalocean.com/community/tutorials/how-to-install-apache-kafka-on-ubuntu-18-04
Despite this, I needed to run these commands to make the producer work after starting the containers with docker-compose:
docker cp /home/<user>/ukwa-heritrix/testdata/seed.json ukwa-heritrix_kafka_1:/tmp/
docker exec -it ukwa-heritrix_kafka_1 bash
cat /tmp/seed.json | kafka-console-producer.sh --broker-list 172.17.0.1:9092 --topic uris.tocrawl.fc
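The `docker cp` step and the interactive shell can probably be avoided by piping the seed file straight into `docker exec -i`, which keeps STDIN attached to the command run inside the container. A minimal sketch, assuming the container name, broker address and topic used above; `send_seeds` is a hypothetical helper, not part of ukwa-heritrix:

```shell
# Hypothetical helper: stream a seed file into the console producer that ships
# inside the Kafka container, instead of `docker cp` + `docker exec -it ... bash`.
# Defaults are the values used earlier in this thread; override as needed.
send_seeds() {
  local container="${1:-ukwa-heritrix_kafka_1}"
  local broker="${2:-172.17.0.1:9092}"
  local topic="${3:-uris.tocrawl.fc}"
  local seeds="${4:-testdata/seed.json}"
  # -i keeps STDIN open so the redirected file reaches the producer;
  # -t is left out because piped input needs no terminal.
  docker exec -i "$container" \
    kafka-console-producer.sh --broker-list "$broker" --topic "$topic" < "$seeds"
}
```

Run from the ukwa-heritrix checkout, `send_seeds` with no arguments would send `testdata/seed.json` using the same settings as the three commands above.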
What Kafka version are you using? How did you install it?
I'm using Kafka inside Docker:
This works because all the services are on the same internal Docker network. As you are running your own Kafka on the host, you'll need to set the KAFKA_BOOTSTRAP_SERVERS
environment variable for Heritrix so it points to your Kafka.
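Before changing KAFKA_BOOTSTRAP_SERVERS, it may help to check that each address in it is reachable from where Heritrix actually runs; inside a container, `localhost` refers to the container itself, not the host. A rough sketch, assuming bash (for its `/dev/tcp` redirection), GNU coreutils `timeout`, and that the variable takes Kafka's usual comma-separated `host:port` list. `check_bootstrap` is a hypothetical helper, and it only proves a TCP connection opens, not that the broker's advertised listeners are usable:

```shell
# Hypothetical helper: check that each host:port in a comma-separated
# bootstrap list (assumed to be Kafka's usual format) accepts a TCP connection.
# Needs bash (for /dev/tcp) and GNU coreutils `timeout`.
check_bootstrap() {
  local list="$1" entry host port
  if [ -z "$list" ]; then
    echo "empty bootstrap list" >&2
    return 2
  fi
  local IFS=','              # split the list on commas in the loop below
  for entry in $list; do
    host="${entry%:*}"       # everything before the last ':'
    port="${entry##*:}"      # everything after the last ':'
    # Redirecting to /dev/tcp/HOST/PORT makes bash attempt a TCP connection.
    if ! timeout 3 bash -c ": > /dev/tcp/$host/$port" 2>/dev/null; then
      echo "unreachable: $entry" >&2
      return 1
    fi
  done
  echo "ok: $list"
}
```

For example, `check_bootstrap "192.168.X.X:9094"` (with your host's real address) run inside the Heritrix container should print `ok: ...` before Heritrix has any chance of reaching that broker.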
In case it helps, I've found I have to use this form to connect to Kafka successfully from separate containers when running on the same host:
docker run --network="host" ukwa/ukwa-manage submit -k 192.168.X.X:9094 -L now fc.tocrawl.bypm http://acid.matkelly.com/
HTH. I'll close this, but feel free to reopen it if necessary.
Kafka returns this error when sending messages to the topic to run a crawl test, as described in the documentation:
cat testdata/seed.json | $KAFKA/kafka-console-producer.sh --broker-list localhost:9092 --topic uris.tocrawl.fc
[2019-03-21 10:08:26,427] ERROR Error when sending message to topic uris.tocrawl.fc with key: null, value: 361 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback) org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for uris.tocrawl.fc-10: 1542 ms has passed since batch creation plus linger time
Could you clarify how to set up a new crawl, and from which Docker container or environment these commands should be launched?
Thanks for your explanations.
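One possible cause of the "Expiring 1 record(s)" TimeoutException (an assumption, not confirmed in this thread) is that the producer reaches the bootstrap address but the broker advertises a different `host:port` that the producer's machine cannot reach. Kafka's `bin/` directory includes `kafka-broker-api-versions.sh`, which lists each broker by the address it advertises. The sketch below extracts those addresses; `advertised_brokers` and `show_advertised` are hypothetical names, and the line format being matched is assumed from recent Kafka versions:

```shell
# Hypothetical helpers. The matched line format, e.g.
#   localhost:9092 (id: 0 rack: null) -> (
# is assumed to be what kafka-broker-api-versions.sh prints per broker.
advertised_brokers() {
  # Read the tool's output on stdin; print only each broker's host:port.
  sed -n -E 's/^([^[:space:]]+:[0-9]+) \(id: .*/\1/p'
}

show_advertised() {
  # $KAFKA is the Kafka bin directory, as in the producer command above.
  "$KAFKA/kafka-broker-api-versions.sh" --bootstrap-server "${1:-localhost:9092}" \
    | advertised_brokers
}
```

If `show_advertised localhost:9092` prints something like a container hostname or a 172.17.x.x address that the host cannot reach, the broker's `advertised.listeners` setting (in this setup, probably configured via docker-compose) would be the place to look.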