splunk / docker-splunk

Splunk Docker GitHub Repository

Upgrade 7.2.6 -> 7.3.0 fails #209

Open mkinsley opened 5 years ago

mkinsley commented 5 years ago

Expected: I should be able to upgrade from the 7.2.6 tag to the 7.3.0 tag.

Observed: The upgrade fails with a permission-denied error when performing a stat on {{ splunk_home }}/etc/auth/splunk.secret.

Repro Steps

  1. docker-compose up -d
  2. docker-compose logs splunkenterprise --follow
  3. Wait for splunkweb to be available
  4. Change the image tag in the compose file to ":7.3.0" or ":edge"
  5. Add the environment variable SPLUNK_UPGRADE="true"
  6. docker-compose up -d
  7. docker-compose logs splunkenterprise

Docker Compose File

# docker run splunk/enterprise:7.0.3
# Options on how to review the EULA and accept it: 
# 1. docker run -it splunk/enterprisetrial:7.0.3
# 2. Add the following environment variable: SPLUNK_START_ARGS=--accept-license
# e.g., docker run -e "SPLUNK_START_ARGS=--accept-license" splunk/enterprisetrial 

# Support for Docker Compose v3, https://docs.docker.com/compose/overview/
version: '3'
volumes:
  opt-splunk-etc:
  opt-splunk-var:
services:
  splunkenterprise:
    #build: .
    hostname: splunkenterprise
    #image: splunk/splunk:7.2.6
    image: splunk/splunk:7.3.0
    #image: splunk/splunk:edge
    environment:
      SPLUNK_START_ARGS: --accept-license --answer-yes
      DEBUG: "true"
      ANSIBLE_EXTRA_FLAGS: "-vvvv"
      SPLUNK_PASSWORD: 'icanhazpasswd'
      # SPLUNK_UPGRADE: "true"
      SPLUNK_ENABLE_LISTEN: 9997
      SPLUNK_ADD: tcp 1514
    volumes:
      - opt-splunk-etc:/opt/splunk/etc
      - opt-splunk-var:/opt/splunk/var
    ports:
      - "8000:8000"
      - "9997:9997"
      - "8088:8088"
      - "1514:1514"

nwang92 commented 5 years ago

This should be fixed in the latest splunk/splunk:edge images after https://github.com/splunk/splunk-ansible/pull/211 was merged. Closing this, but feel free to reopen.

johan1252 commented 5 years ago

I am still seeing this issue when upgrading from 7.2.4 to splunk:7.3, splunk:7.3.1 or splunk:edge.

[root@ip-10-129-2-126 ec2-user]# docker logs 3460619d9414

PLAY [Run default Splunk provisioning] *****************************************
Thursday 22 August 2019  20:51:34 +0000 (0:00:00.026)       0:00:00.026 ******* 

TASK [Gathering Facts] *********************************************************
ok: [localhost]
Thursday 22 August 2019  20:51:36 +0000 (0:00:01.543)       0:00:01.569 ******* 
Thursday 22 August 2019  20:51:36 +0000 (0:00:00.024)       0:00:01.594 ******* 
Thursday 22 August 2019  20:51:36 +0000 (0:00:00.022)       0:00:01.617 ******* 
Thursday 22 August 2019  20:51:36 +0000 (0:00:00.092)       0:00:01.710 ******* 
included: /opt/ansible/roles/splunk_common/tasks/get_facts.yml for localhost
Thursday 22 August 2019  20:51:36 +0000 (0:00:00.041)       0:00:01.751 ******* 

TASK [splunk_common : Set privilege escalation user] ***************************
ok: [localhost]
Thursday 22 August 2019  20:51:36 +0000 (0:00:00.023)       0:00:01.774 ******* 

TASK [splunk_common : Check for existing installation] *************************
ok: [localhost]
Thursday 22 August 2019  20:51:36 +0000 (0:00:00.219)       0:00:01.994 ******* 

TASK [splunk_common : Set splunk install fact] *********************************
ok: [localhost]
Thursday 22 August 2019  20:51:36 +0000 (0:00:00.024)       0:00:02.019 ******* 

TASK [splunk_common : Check for existing splunk secret] ************************
fatal: [localhost]: FAILED! => {"changed": false, "msg": "Permission denied"}

PLAY RECAP *********************************************************************
localhost                  : ok=5    changed=0    unreachable=0    failed=1    skipped=2    rescued=0    ignored=0   

Thursday 22 August 2019  20:51:36 +0000 (0:00:00.086)       0:00:02.105 ******* 
=============================================================================== 
Gathering Facts --------------------------------------------------------- 1.54s
splunk_common : Check for existing installation ------------------------- 0.22s
Provision role ---------------------------------------------------------- 0.09s
splunk_common : Check for existing splunk secret ------------------------ 0.09s
splunk_common : include_tasks ------------------------------------------- 0.04s
Determine captaincy ----------------------------------------------------- 0.02s
splunk_common : Set splunk install fact --------------------------------- 0.02s
splunk_common : Set privilege escalation user --------------------------- 0.02s
Execute pre-setup playbooks --------------------------------------------- 0.02s
splunkd is not running.

@nwang92 Is there anything I can do to figure out why this is still happening?

bb03 commented 5 years ago

@johan1252 Can you give us your reproduction steps?

johan1252 commented 5 years ago

@bb03 I stopped the container running splunk:7.2.4, then launched the new 7.3.1 container like so, where /splunk-data is the same volume that was used with the old container running 7.2.4.

docker run -d -p 8000:8000 -p 9997:9997 -e 'SPLUNK_START_ARGS=--accept-license --answer-yes' -e 'SPLUNK_PASSWORD=<admin password>' -v /splunk-data:/opt/splunk/etc -v /splunk-data:/opt/splunk/var splunk/splunk:7.3.1

Then I got the above log output using docker logs.
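
One way to confirm what ownership the new container actually sees on the persisted data is to check the numeric UIDs on the bind-mounted host directory, since owner names resolve differently per image. A sketch, assuming the /splunk-data path from the command above (busybox is just a throwaway image for running ls):

docker run --rm -v /splunk-data:/data busybox ls -lnd /data/auth /data/auth/splunk.secret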

johan1252 commented 5 years ago

I was able to get around the issue by running chmod 777 auth/ ; chmod 777 auth/splunk.secret.
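
A narrower variant of that workaround, assuming the failing task only needs read access to the secret and that the commands are run from /opt/splunk/etc as above (this may not cover every file Ansible touches under auth/):

chmod o+rx auth
chmod o+r auth/splunk.secret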

stephenmuss commented 5 years ago

I've also had this issue upgrading from 7.2.4 to anything above 7.2.4, using Kubernetes StatefulSets.

After updating the image tag in the StatefulSet and redeploying, the recreated pods immediately fail with this error.
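
For reference, that update path looks roughly like the following (a sketch; the StatefulSet and container names are hypothetical and depend on the deployment):

kubectl set image statefulset/splunk splunk=splunk/splunk:7.3.1
kubectl rollout status statefulset/splunk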

I'm also not very comfortable with the chmod 777 workaround above.

Currently the /opt/splunk/etc/auth directory looks like this

drwx------.  7 splunk splunk  4096 Feb  6  2019 auth

The /opt/splunk/etc/auth/splunk.secret looks like this.

-r--------. 1 splunk splunk  255 Mar  4  2019 splunk.secret

As far as I can tell, this should be enough for the splunk user that Ansible assumes to be able to check for the file's existence. However, I continue to see the error:

TASK [splunk_common : Check for existing splunk secret] ************************
fatal: [localhost]: FAILED! => {
    "changed": false
}

MSG:

Permission denied

dmpopoff commented 4 years ago

The issue reproduces when running 8.0.0 over 7.2.4 (stop the 7.2.4 container, remove it, run 8.0.0); the error is the same. chmod 777 on the auth folder and the splunk.secret file works around it, but I don't think that is appropriate.

Here is the error message:

Wednesday 27 November 2019  12:13:25 +0300 (0:00:00.042)       0:00:03.629 *******

TASK [splunk_common : Check for existing splunk secret] ************************
fatal: [localhost]: FAILED! => {"changed": false}

MSG:

Permission denied

PLAY RECAP *********************************************************************
localhost                  : ok=5    changed=0    unreachable=0    failed=1    skipped=2    rescued=0    ignored=0

dmpopoff commented 4 years ago

Why is this issue marked as "Closed"? There is no resolution, only a workaround. @nwang92 @bb03 @jmeixensperger What should the correct permissions be?

jonnyyu commented 4 years ago

The fix mentioned above was overridden by another change that switched to using 'splunk.user':

https://github.com/splunk/splunk-ansible/commit/4ab2caa4339457211f45704e5b82cfeb6dd8c9af

That second change is what actually causes the problem: in the old Splunk image the splunk user's UID is 999, but in the new Splunk image the ansible user has UID 999 and the splunk user has UID 41812.
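
Given that UID mismatch, a workaround that avoids chmod 777 would be to re-own the persisted data to the new image's IDs before starting the upgraded container. A sketch, assuming the named volumes from the compose file above (compose may prefix them with the project name) and assuming the group ID matches the UID; busybox is just a throwaway image that provides chown:

docker run --rm -v opt-splunk-etc:/opt/splunk/etc -v opt-splunk-var:/opt/splunk/var busybox chown -R 41812:41812 /opt/splunk/etc /opt/splunk/var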

nwang92 commented 4 years ago

Sorry I seem to have stopped getting emails about this issue until the last comment. Let me take a look at the 7.2.x to 8.x upgrade path.

Regarding @jonnyyu's comment, the old Splunk image is not compatible with the new Splunk image: https://github.com/splunk/docker-splunk/blob/develop/docs/INTRODUCTION.md#history

nwang92 commented 4 years ago

Is the issue that the container fails to recreate, or is the problem with the Splunk behavior once the image has been updated?

I'm asking because I'm seeing the latter case right now. The Docker image itself ships with an Enterprise trial license, which does seem to expire after a certain amount of time. I don't quite know how the check works or when the expiry is scheduled to occur, because it doesn't seem very consistent.

Using this command:

docker run -d -p 8000:8000 -e SPLUNK_START_ARGS=--accept-license -e SPLUNK_PASSWORD=helloworld splunk/splunk:<tag>

I started Splunk using the following tags:

For the tags above, the trial license you get on first run is already expired. The only ways to work around this are to acquire a new, valid Splunk license and apply it, or to convert the installation to Splunk Free.
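
One way to check which license a container ended up with and when it expires is the Splunk CLI inside the container. A sketch; <container> is a placeholder and the admin password is the one from the run command above (output details may vary by version):

docker exec -it <container> /opt/splunk/bin/splunk list licenses -auth admin:helloworld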

Trial license expiration date per tag:

cc @mikedickey - I was under the impression that each time you start the container image you get a new trial, but it looks like that's not the case (maybe for good reason). I'm planning to document this at the very least, but is there anything you recommend we do to mitigate the impact on users who are affected by this?

mikedickey commented 4 years ago

@nwang92 As we discovered today, this is not a problem related to containers but rather how the Splunk Enterprise trial license works. The same problem exists for all deployment methods.

The summary is that all Splunk Enterprise trial licenses expire on a specific hard-coded date which is included in the original package files. March 8, 2020 happened to be one of those dates. The fix is to always use the latest version, or to provide your own license.

Note that 7.3.4, as well as 8.0.x and all newer maintenance releases, do not have this problem (or at least, it is postponed until 2029). It shows an expiration of 5/8 because that is 60 days from today.

ykou-splunk commented 3 years ago

I ran into the same Permission denied error message, but the issue went away after I ran docker-compose down and then docker-compose up --build again.

I'm not a Docker expert, so I can't explain why it worked.

mikurth commented 3 years ago

I solved the problem on my end by creating a splunk user with UID 41812 on the Ubuntu host where I run Docker. I was not able to run Splunk in Docker as any user other than root. environment: SPLUNK_USER: splunk
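
A sketch of that host-side setup, assuming UID 41812 as reported earlier in this thread (the group name, GID, and shell are illustrative), plus the corresponding compose setting:

sudo groupadd -g 41812 splunk
sudo useradd -u 41812 -g splunk -M -r -s /usr/sbin/nologin splunk
# in docker-compose.yml, under the service:
#   environment:
#     SPLUNK_USER: splunk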