istresearch / scrapy-cluster

This Scrapy project uses Redis and Kafka to create a distributed, on-demand scraping cluster.
http://scrapy-cluster.readthedocs.io/
MIT License
1.18k stars, 323 forks

Upgrading the ELK stack #253

Open 4OH4 opened 3 years ago

4OH4 commented 3 years ago

Great project, thanks for sharing - and supporting for so long!

I ran into a few problems running the ELK stack: the Elasticsearch container kept restarting with java.lang.IllegalStateException (log attached: docker-elk-logs.txt).

I couldn't find the root cause, so in the end I switched to a later version of the ELK stack (v7.10), which gave good results. I also used Filebeat rather than Logstash, as there seemed to be more documentation around this use case. Not sure if this is a change you want to make to the project, but my files are on a branch here, and I'm happy to submit a pull request if you think it might be useful: https://github.com/4OH4/scrapy-cluster/tree/elk-update

I haven't managed to properly import the Kibana dashboard configuration from export.json though. I guess a few things have changed between the different versions of Kibana.

Cheers

madisonb commented 3 years ago

If you've got Filebeat and the latest ELK stack going (with the JSON logs parsed correctly into the index), I would 100% accept a PR. It's been on my to-do list to move the project over to it.

As a bonus, I would also prefer that we switch all the logging over to stdout and pull directly from the container logs, as that's the better practice nowadays compared with shipping through Logstash.
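
To make the stdout idea concrete, here is a minimal sketch using only the Python standard library. The `JsonFormatter` class, the `make_stdout_logger` helper, and the field names are hypothetical, chosen for illustration; this is not scrapy-cluster's actual logging code.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter: renders each record as one JSON object per line."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def make_stdout_logger(name, stream=None):
    """Build a logger that writes JSON lines to stdout (or to a given stream)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False   # avoid duplicate output via the root logger
    logger.handlers.clear()    # keep repeated calls from stacking handlers
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = make_stdout_logger("crawler")
    log.info("spider opened")
```

With one JSON object per line on stdout, the container runtime captures the output and a log shipper can read it from the container logs without the application writing log files itself.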

4OH4 commented 3 years ago

Ok - great. I'll take another look at the parsing of the log files. At the moment I am using an index pattern of filebeat-*, so it's not being parsed in quite the same way as before, although the JSON key/value pairs are being stored correctly in Elasticsearch.
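
For illustration only (this is not taken from the branch), the sketch below shows the kind of decoding involved when logs are read from container log files. It assumes Docker's default json-file logging driver, which wraps each line of output in an object with `log`, `stream`, and `time` keys, so JSON logged by the application arrives as a string that needs a second decode. The function name `parse_container_log_line` is hypothetical.

```python
import json


def parse_container_log_line(line):
    """Decode one line written by Docker's json-file logging driver.

    The outer object carries "log", "stream" and "time". If the application
    itself logged JSON, that JSON is a string inside "log" and needs a second
    decode. Non-JSON output is kept as plain text under "message".
    """
    outer = json.loads(line)
    raw = outer.get("log", "").rstrip("\n")
    try:
        inner = json.loads(raw)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        inner = None
    if not isinstance(inner, dict):
        inner = {"message": raw}
    # Keep the stream name unless the application already set one
    inner.setdefault("stream", outer.get("stream"))
    return inner


if __name__ == "__main__":
    sample = json.dumps({
        "log": json.dumps({"level": "INFO", "message": "spider opened"}) + "\n",
        "stream": "stdout",
        "time": "2021-01-01T00:00:00.000000000Z",
    })
    print(parse_container_log_line(sample))
```

In a Filebeat setup this step would normally be handled by the shipper's own JSON decoding rather than by custom code; the sketch only shows why the application's key/value pairs sit one level down in the raw container log.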