rackspace-cookbooks / elkstack

Elasticsearch, logstash, and kibana stack

Recipe: elkstack::logstash #178

Open jnganga opened 8 years ago

jnganga commented 8 years ago

Hi, I'm running into this error below when converging. Without fail, it converges on the second attempt. Is it that elasticsearch is not running yet and logstash depends on it, hence the failure? How do we resolve this?

       Recipe: elkstack::elasticsearch
         * service[elasticsearch] action start (up to date)
       Recipe: elkstack::logstash
         * logstash_service[server] action restart
           * runit_service[logstash_server] action restart

             ================================================================================
             Error executing action `restart` on resource 'runit_service[logstash_server]'
             ================================================================================

             Mixlib::ShellOut::ShellCommandFailed
             ------------------------------------
             Expected process to exit with [0], but received '1'
             ---- Begin output of /usr/bin/sv restart /etc/service/logstash_server ----
             STDOUT: timeout: run: /etc/service/logstash_server: (pid 20646) 797s, got TERM
             STDERR: 
             ---- End output of /usr/bin/sv restart /etc/service/logstash_server ----
             Ran /usr/bin/sv restart /etc/service/logstash_server returned 1

TheSeubert commented 8 years ago

It is a bit hard to say from that output without some manual troubleshooting as well. From the output there, the elasticsearch service was started before, and a restart was issued to logstash_server runit service. The service started, received a PID, but then terminated itself.

After this failed run, were you able to log in to the instance and check the status of the elasticsearch service? It would also help to look at the runit logs under /var/log/logstash and see what the process logged there. If you can hunt down this and any other information, that may help us.
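
The manual checks suggested above could be sketched roughly as follows. This is only an illustration: the service directory is taken from the error output in this thread, the log directory is an assumption (adjust it to wherever your runit logs actually land), and `sv_state` is a hypothetical helper name, not part of runit or the cookbook.

```shell
#!/bin/sh
# Paths below are assumptions based on the output in this thread.
SERVICE_DIR=/etc/service/logstash_server
LOG_DIR=/var/log/logstash_server

# Extract the state word ("run", "down", ...) from one line of sv output.
# sv prefixes the line with "timeout: " when the requested change did not
# take effect within its wait period, and with "ok: " when it did.
sv_state() {
  line=${1#timeout: }
  line=${line#ok: }
  printf '%s\n' "${line%%:*}"
}

# 1. What does runit think the logstash service is doing?
if command -v sv >/dev/null 2>&1; then
  status_line=$(sv status "$SERVICE_DIR" 2>&1)
  echo "logstash_server: $status_line"
  echo "state: $(sv_state "$status_line")"
fi

# 2. Is Elasticsearch answering on its HTTP port?
if command -v curl >/dev/null 2>&1; then
  if curl -sf --max-time 5 'http://localhost:9200/' >/dev/null; then
    echo "elasticsearch: responding"
  else
    echo "elasticsearch: not responding"
  fi
fi

# 3. Is anything being written to the runit log directory?
if [ -d "$LOG_DIR" ]; then
  ls -la "$LOG_DIR"
fi
```

Running this right after a failed converge (for example from inside `kitchen login`) should show whether the service is still up and whether any log files exist at all.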

jnganga commented 8 years ago

Thanks for responding.

I see elasticsearch running after the failed run. But I can't tell if it was running right at the moment of failure.

$ curl 'http://localhost:9200/?pretty'
{
  "status" : 200,
  "name" : "default-ubuntu-1404",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.4.4",
    "build_hash" : "c88f77ffc81301dfa9dfd81ca2232f09588bd512",
    "build_timestamp" : "2015-02-19T13:05:36Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.3"
  },
  "tagline" : "You Know, for Search"
}
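
One way to rule out the timing question would be to wait until Elasticsearch answers before touching logstash. The following is a minimal sketch, not something the cookbook does: `retry` and `es_up` are hypothetical helper names, and the URL is simply the one used in the curl call above.

```shell
#!/bin/sh
# Retry a command until it succeeds or the attempts run out.
# usage: retry <attempts> <delay_seconds> <command> [args...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

# Probe: succeeds only when Elasticsearch returns a successful HTTP response.
es_up() {
  curl -sf --max-time 2 'http://localhost:9200/' >/dev/null
}

# Example use (left commented out so that sourcing this file has no side effects):
# retry 30 2 es_up && sv restart /etc/service/logstash_server
```

If the restart still fails after the probe succeeds, the startup order is probably not the cause.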

Also, in my log directory, this is what I have:

vagrant@default-ubuntu-1404:/var/log/logstash_server$ ll
total 8
drwxr-xr-x  2 root root   4096 Aug  9 21:29 ./
drwxrwxr-x 10 root syslog 4096 Aug  9 21:38 ../
lrwxrwxrwx  1 root root     34 Aug  9 21:29 config -> /etc/sv/logstash_server/log/config

martinb3 commented 8 years ago

Can you show us what settings you're applying or give us a reproducible example? I don't think we have enough information; I suspect this is an issue with configuration.

jnganga commented 8 years ago

I'm actually cloning the entire repo into a new folder: https://github.com/rackspace-cookbooks/elkstack.git and then running 'kitchen converge' without any additional modifications.

Please see the logs below for the first and second runs. FYI, a colleague got the same issue on his machine.

first_run_log_elkstack.txt second_run_log_elkstack.txt

martinb3 commented 8 years ago

Hi @jnganga -- I just cloned elkstack and ran the same command, & it converged for me on the first and second attempts.

> Please see the logs below for the first and second runs. FYI, a colleague got the same issue on his machine.

These logs contain different run lists. cic_elkstack::packer is not part of elkstack, so I think there's something else going on here (these logs aren't from the same runlist, it seems).

Could you share your Berksfile.lock and Gemfile.lock so I can get on the same versions you're using, and re-test?

jnganga commented 8 years ago

Sorry, I attached the log files from my earlier run where I'm wrapping your cookbook. In both cases, with or without the wrapper, it errors out at the same place. Please see the files requested below. I only generated these on the first run with the wrapper cookbook. I probably should have repeated it with the cloned elkstack. Will do that tonight.

Gemfile.lock.txt Berksfile.lock.txt

martinb3 commented 8 years ago

Yes, please let us know when you have something with elkstack itself so we can try to reproduce it. I'm interested in both the logs and the lock files for elkstack itself, not your wrapper. Thanks.

jnganga commented 8 years ago

Sure. Please find the logs and .lock files below.
My steps:

$ git clone https://github.com/rackspace-cookbooks/elkstack.git
$ cd elkstack/
$ berks install
$ bundle install
$ kitchen list
$ kitchen create default-ubuntu-1404
$ kitchen converge default-ubuntu-1404 - see attached log - "first_run_log_cloned_elkstack"
$ kitchen converge default-ubuntu-1404 - see attached log - "second_run_log_cloned_elkstack"

Thank you.

second_run_log_cloned_elkstack.txt first_run_log_cloned_elkstack.txt Cloned_elkstack_Berksfile.lock.txt Cloned_elkstack_Gemfile.lock.txt

TheSeubert commented 8 years ago

For what it's worth, and I realize this may not add much, I went through the same steps and compared the Gemfile.lock and Berksfile.lock files. I was able to converge with no errors, and the only difference I found was that I had a slightly newer ohai gem locally.

In your commands above I see you did not use bundle exec, so kitchen was actually running from your system/user gemset. That said, you are running the latest kitchen, the same as I am, so I don't see an issue there.

We'll continue to dig into this, but so far I don't see a smoking gun.

jnganga commented 8 years ago

@dude051 what exact commands, and in what order, should I run for bundle exec?

TheSeubert commented 8 years ago

The same order works; just prefix your berks and kitchen commands with bundle exec so that they run from the bundled gems. So to answer your question directly:

$ git clone https://github.com/rackspace-cookbooks/elkstack.git
$ cd elkstack/
$ bundle install
$ bundle exec berks install
$ bundle exec kitchen list
$ bundle exec kitchen create default-ubuntu-1404
$ bundle exec kitchen converge default-ubuntu-1404
$ bundle exec kitchen converge default-ubuntu-1404

jnganga commented 8 years ago

I get the same results, sir! Please see attached logs.

second_run_log_bundle_elkstack.txt first_run_log_bundle_elkstack.txt

martinb3 commented 8 years ago

@jnganga Do you get anything in the logstash logs when you run this? I'm wondering if the logstash service just isn't starting for some reason.

imewish commented 8 years ago

I guess this is related to my issue; I faced the same thing with the latest logstash versions.

https://github.com/lusis/chef-logstash/issues/459