peasead / elastic-container

Stand up a simple Elastic container with Kibana, Fleet, and the Detection Engine
Apache License 2.0
361 stars, 62 forks

[BUG] #43

Open fish-not-phish opened 4 months ago

fish-not-phish commented 4 months ago

Hello,

I am having some problems deploying this stack as it appears the script is not running as expected. I am running on Ubuntu 22.04 LTS, fresh install. The virtual machine has 8 CPU cores, 16GB of RAM, and 500GB of storage, so I don't suspect a resource issue.

I edited the .env file, changing these 4 items:

No other modifications were made.

When I run the script sudo ./elastic-container.sh start, it runs and appears to set up the necessary containers. However, the output does not match what would be expected. The output I get is this:

Attempting to enable the Detection Engine and Prebuilt-Detection Rules

Kibana is up. Proceeding

Detection engine enabled. Installing prepackaged rules.

Prepackaged rules installed!

Enabling detection rules

Waiting 40 seconds for Fleet Server setup

Populating Fleet Settings

However, I never get this included in the output. It's simply missing:

READY SET GO!

Browse to https://localhost:5601

Username: elastic

Passphrase: you-changed-me-from-the-default-right?

When I try to go to the URL, it isn't up, and is only accessible if I run the restart option for the script: sudo ./elastic-container.sh restart. Then the URL becomes accessible, but the fleet settings are not configured.

When I run sudo ./elastic-container.sh status, this is the output:

NAME                IMAGE                                                  COMMAND                    SERVICE         CREATED          STATUS                    PORTS
ecp-elasticsearch   docker.elastic.co/elasticsearch/elasticsearch:8.12.2   "/bin/tini -- /usr/l..."   elasticsearch   27 minutes ago   Up 24 minutes (healthy)   0.0.0.0:9200->9200/tcp, :::9200->9200/tcp, 9300/tcp
ecp-fleet-server    docker.elastic.co/beats/elastic-agent:8.12.2           "/usr/bin/tini -- /u..."   fleet-server    27 minutes ago   Up 24 minutes             0.0.0.0:8220->8220/tcp, :::8220->8220/tcp
ecp-kibana          docker.elastic.co/kibana/kibana:8.12.2                 "/bin/tini -- /usr/l..."   kibana          27 minutes ago   Up 24 minutes (healthy)   0.0.0.0:5601->5601/tcp, :::5601->5601/tcp

I noticed that the ecp-fleet-server is online but is not denoted as "healthy". I checked the docker logs and observed connection refused errors:

"message": "Error dialing dial tcp [::1]:9200: connect: connection refused"

and also

"message": "Attempting to reconnect to backoff(elasticsearch(http://localhost:9200)) with 75 reconnect attempt(s)"

Reading this, I understand that there may be a connectivity issue. However, I am not running UFW and I have not altered iptables. The Proxmox firewall for this virtual machine is off, so that shouldn't have any impact either.

Any help would be greatly appreciated to get this working.
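
For anyone triaging a similar deployment, one quick way to spot the container that never reports healthy is to filter the status output quoted in this report. The following is a minimal sketch and is not part of elastic-container.sh: the function name not_healthy is hypothetical, and it assumes the status output keeps the layout shown above, with the container name in the first column and "(healthy)" in the STATUS column.

```shell
# Hypothetical helper (not part of elastic-container.sh).
# Reads status output on stdin and prints the name (first column) of every
# container row that does not contain "(healthy)".
# The header row and blank lines are skipped.
not_healthy() {
  awk 'NF > 0 && $1 != "NAME" && $0 !~ /\(healthy\)/ { print $1 }'
}

# Possible usage, assuming the status subcommand prints the table shown above:
#   sudo ./elastic-container.sh status | not_healthy
```

A container that is running without a health status, or one marked "(unhealthy)", is listed by this filter, since neither row contains the literal text "(healthy)".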

fish-not-phish commented 4 months ago

I was looking through other issues and I noticed Issue 23 which talks about the Fleet server not coming back after reboot. I noticed the user mentioned version 8.5 still works. I went ahead and rolled back to that version within the repository history and that version does indeed work without any errors. A temporary fix, but obviously it would be preferred to run the most recent version.

peasead commented 4 months ago

Do you have this issue with 8.13?

If you want to test, you can do a ./elastic-container.sh destroy and then start fresh.

I'll also look into this.

fish-not-phish commented 4 months ago

I was using STACK_VERSION=8.12.2, I have not tried 8.13. I might go ahead and try to see if that works. I will let you know if 8.13 works or not.

Sadly, I have already destroyed and started fresh and I get the same result each time.

fish-not-phish commented 4 months ago

Doesn't appear to work with 8.13.

I tried changing STACK_VERSION=8.13.0 within .env.
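
For readers who want to repeat this version test, the edit to .env can be scripted. This is a minimal sketch under stated assumptions: set_stack_version is a hypothetical helper name, the file is assumed to hold a single STACK_VERSION= line as described in this thread, and the function appends that line if it is missing.

```shell
# Hypothetical helper (not part of elastic-container.sh).
# Rewrites the STACK_VERSION line in an env file, or appends it if absent.
# usage: set_stack_version <version> [env_file]   (env_file defaults to .env)
set_stack_version() {
  version=$1
  env_file=${2:-.env}
  if grep -q '^STACK_VERSION=' "$env_file"; then
    # Go through a temp file because `sed -i` behaves differently on GNU and BSD sed.
    sed "s/^STACK_VERSION=.*/STACK_VERSION=${version}/" "$env_file" > "${env_file}.tmp" &&
      mv "${env_file}.tmp" "$env_file"
  else
    printf 'STACK_VERSION=%s\n' "$version" >> "$env_file"
  fi
}

# Possible usage, followed by the destroy/start cycle suggested earlier in the thread:
#   set_stack_version 8.13.0
#   ./elastic-container.sh destroy
#   ./elastic-container.sh start
```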

TrainLam commented 4 months ago

Tried 8.4.3 and it worked. However, the portal responds very slowly.

Tried 8.13.4, 8.12.2, and 8.12.0, and all of them failed.

peasead commented 3 months ago

Thanks for your patience. I will open an Issue upstream.

When I deployed, the Fleet server wasn't healthy ever. I ran sh elastic-container.sh restart and then everything was healthy and Fleet was available. That's not a good solution, but it can work as a temporary solution. I wonder if there is some race condition where if one of the other containers isn't up and healthy, Fleet chokes and doesn't self-heal.

I'll try a few "relies on" options.
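
One way to probe the race-condition theory by hand is to wait until the other containers report healthy and only then restart Fleet. The following is a rough sketch, not the fix adopted in this repository: wait_until and is_healthy are hypothetical helper names, the container names are the ones shown in the status output in this thread, and is_healthy relies on the Docker CLI's docker inspect --format '{{.State.Health.Status}}', which only reports a value for containers that define a health check.

```shell
# Hypothetical helpers (not part of elastic-container.sh).

# Poll a command until it succeeds or the attempt budget is used up.
# usage: wait_until <max_attempts> <sleep_seconds> <command> [args...]
# Returns 0 on the first success, 1 if every attempt failed.
wait_until() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Succeeds only when Docker reports the named container as healthy.
is_healthy() {
  [ "$(docker inspect --format '{{.State.Health.Status}}' "$1" 2>/dev/null)" = "healthy" ]
}

# Possible usage: wait up to 30 x 5 seconds for each dependency, then restart Fleet.
#   wait_until 30 5 is_healthy ecp-elasticsearch &&
#     wait_until 30 5 is_healthy ecp-kibana &&
#     docker restart ecp-fleet-server
```

If Fleet comes up cleanly after this sequence but not after a plain start, that would be consistent with a startup-ordering problem, though it would not prove one.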

peasead commented 3 months ago

Even when it is healthy in Kibana, it never shows healthy in Docker.

image
./elastic-container.sh status
NAME                IMAGE                                                  COMMAND                  SERVICE             CREATED             STATUS                   PORTS
ecp-elasticsearch   docker.elastic.co/elasticsearch/elasticsearch:8.14.1   "/bin/tini -- /usr/l…"   elasticsearch       5 minutes ago       Up 5 minutes (healthy)   0.0.0.0:9200->9200/tcp, 9300/tcp
ecp-fleet-server    docker.elastic.co/beats/elastic-agent:8.14.1           "/usr/bin/tini -- /u…"   fleet-server        5 minutes ago       Up 44 seconds            0.0.0.0:8220->8220/tcp
ecp-kibana          docker.elastic.co/kibana/kibana:8.14.1                 "/bin/tini -- /usr/l…"   kibana              5 minutes ago       Up 4 minutes (healthy)   0.0.0.0:5601->5601/tcp

But it wasn't healthy in Kibana until I did a restart. I tried just restarting the Fleet container and the whole stack. Both brought Fleet online.

I'll follow up here with the Elastic Issue for tracking.

TrainLam commented 3 months ago

Thanks. So with 8.14.1, the above issue can be fixed by manually restarting Kibana or all of the containers. Am I correct?

Your comment is appreciated.

peasead commented 3 months ago

I believe I tried it both ways and both worked.

TrainLam commented 2 months ago

I tried, but the Fleet server is not displayed, as shown in Screen 1. However, Screen 2 shows that the Fleet server is running.

Screen 1 image

Screen 2 image

TrainLam commented 2 months ago

Tried 8.14.1 once more, but the situation is the same.

TrainLam commented 2 months ago

Tested previous versions, and only version 8.8.2 can run elastic-container.sh and build everything successfully.

kaliankhe commented 1 month ago

Any update on this? I'm still facing this issue with 8.14.3 as well.

TrainLam commented 1 month ago

Tried with 8.15.0 and it is still not working, as shown below.

image

saidhfm commented 1 month ago

Finally, after 2 hours of troubleshooting, I found a workaround with 8.14.0. Use the code from this commit: https://github.com/peasead/elastic-container/tree/0ef92f1e7bce33ca5c42bbe545630fe18c5bf028. Copy the code from each file and replace it in your local files, then recheck that the .env file has STACK_VERSION=8.14.0. Try this; it will work 100%. If you have more doubts about deployment, reach out to me on LinkedIn and I can help: https://www.linkedin.com/in/saibatchu/

fish-not-phish commented 1 month ago

This actually seemed to work for me as well. I will update in 1-2 weeks if there are any health concerns regarding the fleet. I have a VM with a large amount of resources allocated to it, so there should not be any resource-related issues.

DefSecSentinel commented 4 weeks ago

Hey @fish-not-phish, I'm jumping in to get this issue fixed. I just pushed a change to the shell script on main that fixes an issue with Fleet settings not being properly populated. I just tested on macOS by standing up a fresh stack, and everything works as advertised. Can you pull main again, try standing up a stack, and let me know if you still experience a problem?

DefSecSentinel commented 4 weeks ago

You also should not have to change the LOCAL_KBN_URL value

TrainLam commented 3 weeks ago

I tested on Ubuntu and had no luck, as shown below.

image

image

DefSecSentinel commented 3 weeks ago

I'll do a test on Ubuntu today and see if I can't figure out what's going on.

octaviotron commented 1 week ago

I recently used this repo to deploy ELK for testing and studying. It worked like a charm on my computer, so I decided to put it to work on a server, and there Fleet did not work.

After hours of debugging, I realized it has deployment problems on current Debian Stable (bookworm) but works perfectly on current Debian Testing (trixie).

Maybe some of the cases above can be solved just by using another Docker host OS. I hope this is useful to someone.

I don't have enough ELK skills to work out where the problem behind this difference lies.