threatstack / threatstack-ansible

Ansible for installing Threatstack Agent
https://www.threatstack.com
MIT License

`Unable to start service threatstack: Job for threatstack.service canceled.` #86

Closed jobimrobinsantos-drizly closed 2 years ago

jobimrobinsantos-drizly commented 2 years ago

I noticed that some servers where we had previously installed threatstack via this role were not running the service, so I tried restarting the service manually. When that failed, I tried re-running the playbook, which had the same result as my manual attempt:

Unable to start service threatstack: Job for threatstack.service canceled.

All servers are running Ubuntu 18.04. I am using v5.0.0 of this role.

Here is my playbook (with some redactions):

---
- name: Install threatstack agent on servers
  hosts: all
  roles:
    - role: threatstack.threatstack-ansible
      threatstack_deploy_key: REDACTED
      threatstack_pkg: threatstack-agent
      threatstack_configure_agent: true
      threatstack_agent_disable_service: false
      become: yes

Here is the output of the play:

$ ansible-playbook -i INVENTORY provision_threatstack.yml --limit SERVER

PLAY [Install threatstack agents on server] ****************************************************************************

TASK [Gathering Facts] *************************************************************************************************
ok: [SERVER]

TASK [threatstack.threatstack-ansible : Define package URL variable] ***************************************************
ok: [SERVER]

TASK [threatstack.threatstack-ansible : Check auditd status] ***********************************************************
skipping: [SERVER]

TASK [threatstack.threatstack-ansible : Stop service auditd] ***********************************************************
skipping: [SERVER]

TASK [threatstack.threatstack-ansible : Disable service auditd] ********************************************************
skipping: [SERVER]

TASK [threatstack.threatstack-ansible : apt -- Ensure agent dependencies are installed] ********************************
ok: [SERVER]

TASK [threatstack.threatstack-ansible : apt -- Add agent repository key] ***********************************************
ok: [SERVER]

TASK [threatstack.threatstack-ansible : apt -- Add agent repository] ***************************************************
ok: [SERVER]

TASK [threatstack.threatstack-ansible : apt -- Ensure latest agent is installed when no version specified] *************
ok: [SERVER]

TASK [threatstack.threatstack-ansible : apt -- Ensure agent is installed] **********************************************
ok: [SERVER]

TASK [threatstack.threatstack-ansible : apt -- Ensure agent specified version is installed] ****************************
skipping: [SERVER]

TASK [threatstack.threatstack-ansible : apt -- Stop and disable agent if not to be configured] *************************
skipping: [SERVER]

TASK [threatstack.threatstack-ansible : yum -- Ensure agent repo is installed] *****************************************
skipping: [SERVER]

TASK [threatstack.threatstack-ansible : yum -- Add agent repo GPG key] *************************************************
skipping: [SERVER]

TASK [threatstack.threatstack-ansible : yum -- Ensure latest agent is installed when no version specified] *************
skipping: [SERVER]

TASK [threatstack.threatstack-ansible : yum -- Ensure agent is installed] **********************************************
skipping: [SERVER]

TASK [threatstack.threatstack-ansible : yum -- Ensure agent specified version is installed] ****************************
skipping: [SERVER]

TASK [threatstack.threatstack-ansible : yum -- Stop and disable agent if not to be configured] *************************
skipping: [SERVER]

TASK [threatstack.threatstack-ansible : Get setup string] **************************************************************
ok: [SERVER]

TASK [threatstack.threatstack-ansible : Get checksum of setup string] **************************************************
ok: [SERVER]

TASK [threatstack.threatstack-ansible : Get agent registration status] *************************************************
ok: [SERVER]

TASK [threatstack.threatstack-ansible : Create file to track checksum of setup string] *********************************
ok: [SERVER]

TASK [threatstack.threatstack-ansible : Get config string] *************************************************************
ok: [SERVER]

TASK [threatstack.threatstack-ansible : Get checksum of config string] *************************************************
ok: [SERVER]

TASK [threatstack.threatstack-ansible : Create file to track checksum of config string] ********************************
skipping: [SERVER]

TASK [threatstack.threatstack-ansible : Ensure ThreatStack is stopped] *************************************************
skipping: [SERVER]

TASK [threatstack.threatstack-ansible : Agent setup] *******************************************************************
skipping: [SERVER]

TASK [threatstack.threatstack-ansible : Wait 5 seconds] ****************************************************************
Pausing for 5 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)
ok: [SERVER]

TASK [threatstack.threatstack-ansible : Agent config] ******************************************************************
skipping: [SERVER]

TASK [threatstack.threatstack-ansible : Restart tsagent] ***************************************************************
skipping: [SERVER]

TASK [threatstack.threatstack-ansible : Wait 5 seconds] ****************************************************************
skipping: [SERVER]

TASK [threatstack.threatstack-ansible : Get agent state] ***************************************************************
skipping: [SERVER]

TASK [threatstack.threatstack-ansible : Ensure agent is running and started on boot] ***********************************
fatal: [SERVER]: FAILED! => {"changed": false, "msg": "Unable to start service threatstack: Job for threatstack.service canceled.\n"}

PLAY RECAP *************************************************************************************************************
SERVER : ok=16   changed=0    unreachable=0    failed=1    skipped=18   rescued=0    ignored=0

olhado commented 2 years ago

Hi @jobimrobinsantos-drizly

Could you post what version of the agent you are trying to install? I assume latest, but want to make sure.

jobimrobinsantos-drizly commented 2 years ago

I'm setting threatstack_pkg: threatstack-agent so it should be installing threatstack-agent=2*

jobimrobinsantos-drizly commented 2 years ago

It looks like it's actually installing 3.0.0!

$ tsagent --version
tsagent version 3.0.0
$ apt list | grep threatstack
threatstack-agent/bionic,now 3.0.0.0ubuntu18.105 amd64 [installed]
threatstack-agent-support/bionic 1.6.0 all

jobimrobinsantos-drizly commented 2 years ago

I have changed my playbook to say threatstack_pkg: threatstack-agent=2* so that it will not install v3. I think that this part of the role is not doing what it is intended to do: https://github.com/threatstack/threatstack-ansible/blob/0e3c51e6a27c8d9b8d0325d0638fc4ab1d40e09c/tasks/apt_install.yml#L23-L27
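For reference, a sketch of the playbook from the original report with the pin applied. It assumes the role hands `threatstack_pkg` to apt unchanged as the package spec, so that `=2*` acts as an apt version glob; the value is quoted only to keep YAML from treating the `*` specially.

```yaml
---
- name: Install threatstack agent on servers
  hosts: all
  roles:
    - role: threatstack.threatstack-ansible
      threatstack_deploy_key: REDACTED
      # Pin to the 2.x series so apt does not pick up 3.0.0
      threatstack_pkg: "threatstack-agent=2*"
      threatstack_configure_agent: true
      threatstack_agent_disable_service: false
      become: yes
```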

olhado commented 2 years ago

Will be looking into it, @jobimrobinsantos-drizly . Will report back what I find. Thanks again for the report!

olhado commented 2 years ago

So locally, running the tests I have with 18.04, I see it installing 2.5.0, and succeeding. Changing that line in tasks/apt_install.yml to threatstack-agent=3* also succeeds in installing the agent.

So a couple of follow ups:

olhado commented 2 years ago

So looking some more on my side, I am getting a changed setup checksum file:

TASK [threatstack-ansible : Create file to track checksum of setup string] *****
changed: [localhost]

Whereas your runs appear to already have the file (the task returns ok for you, not changed). This appears to cause your runs to skip the check that the agent is stopped and the setup command that actually registers the agent. It also means the restart of the agent service is skipped.

Could you check for a /opt/threatstack/etc/.setup_checksum file and, assuming it is there, its creation/last-modified date? I am guessing it will be from a while ago. If I am correct, then deleting the file and rerunning will likely fix the immediate issue.
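The check and cleanup described above could be scripted roughly as follows. This is a minimal sketch rather than part of the role: the play name and the `setup_checksum` register variable are made up for illustration, and only the file path comes from this thread.

```yaml
---
- name: Inspect and reset the Threat Stack setup checksum (sketch)
  hosts: all
  become: yes
  tasks:
    - name: Look for the setup checksum file
      ansible.builtin.stat:
        path: /opt/threatstack/etc/.setup_checksum
      register: setup_checksum

    - name: Show when the file was last modified (epoch seconds)
      ansible.builtin.debug:
        msg: "mtime: {{ setup_checksum.stat.mtime }}"
      when: setup_checksum.stat.exists

    - name: Remove the file so the next role run repeats agent setup
      ansible.builtin.file:
        path: /opt/threatstack/etc/.setup_checksum
        state: absent
      when: setup_checksum.stat.exists
```

Rerunning the role after this play should then report `changed` for the "Create file to track checksum of setup string" task instead of `ok`, assuming the diagnosis above is right.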

jobimrobinsantos-drizly commented 2 years ago

Unfortunately I already charged ahead and redeployed to install tsagent 2.5.0. It should be noted that I had to uninstall it first since the apt task does not have allow_downgrade: true on it. That redeployment resulted in a different checksum.

The output above was from my attempt to reinstall threatstack to fix the failure, so I would not expect the checksum to have changed.

Side note: I've noticed a pattern of threatstack not starting back up after a server has been powered down for >24 hours. This is what I was investigating when I discovered that we had 3.0.0 installed. I'll have more info on this issue soon.

njf5 commented 2 years ago

Hi @jobimrobinsantos-drizly

Regarding the side note you mentioned, that is expected behavior. The agent periodically communicates with our platform in the form of a "heartbeat" message. If the platform no longer receives these messages, the agent is revoked from the platform and requires re-registration.

Powering down servers for more than 24 hours would lead to this.

Details regarding re-registration can be found here: https://threatstack.zendesk.com/hc/en-us/articles/205868529-Re-register-the-Threat-Stack-Linux-Agent

Happy to help further if you have any other questions/concerns.

olhado commented 2 years ago

To follow up with the issues you noted about this role, I think a flag to allow downgrade install is definitely worth adding. And the role should definitely be deploying 3.0.0 as latest, not the latest 2.X, so that can be fixed too.

I'll leave this ticket open for the fixes.
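As an illustration of the downgrade point, here is a sketch of an apt task that permits downgrades. It is not the role's actual task and not necessarily the approach taken in the eventual fix; the task name and the pinned package spec are assumptions. The `allow_downgrade` option of `ansible.builtin.apt` is, to my knowledge, only available from ansible-core 2.12 onward and is ignored when aptitude is used.

```yaml
- name: apt -- Ensure agent is installed, permitting a downgrade (sketch)
  ansible.builtin.apt:
    name: "threatstack-agent=2*"
    state: present
    update_cache: true
    # Lets apt replace an installed 3.x package with the pinned 2.x one
    allow_downgrade: true
  become: yes
```

Without this option, the installed package has to be removed first, as described earlier in the thread.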

olhado commented 2 years ago

These issues should now be fixed in #89 .