Closed jzakrzeski closed 3 years ago
We've also felt this issue.
Even when running `tsagent setup --deploy-key xxx` via User Data at boot time, the agent is already enabled and failing, so an additional call to `systemctl restart threatstack` is required.
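A minimal User Data sketch of that situation (the deploy key is a placeholder; the extra restart at the end is the step the role could make unnecessary):

```shell
#!/bin/bash
# Runs at first boot via EC2 User Data.
# Configure the agent with the deploy key (placeholder value).
tsagent setup --deploy-key "xxx"
# The agent was already enabled (and failing) in the AMI, so it must be
# restarted to pick up the new configuration.
systemctl restart threatstack
```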
It would be nice to align this role with the official deployment guidance: https://threatstack.zendesk.com/hc/en-us/articles/204289149-Steps-for-Deploying-the-Threat-Stack-Agent-via-Amazon-AMI-s
@s01ipsist Hi! Sorry for the lack of response.
I reviewed the docs you linked to. Would this issue also be solved for you if we generally recommended disabling/stopping the agent service during the AMI creation process, after install?
Additionally, would an "I am creating an AMI" flag on the role help out? Do you run the ansible role during the AMI creation process? My concern is adding extra complexity to the role to try and differentiate "agent service running & failing because it is a first time run from an AMI" vs "agent service running (and may or may not be failing) in normal operation". We wouldn't want to restart the agent every execution of the role if it sees the service running, or necessarily every time it sees a failure.
Our workaround is to explicitly disable and stop the service before we configure it in our cloudinit script.
`threatstack_configure_agent: false` will still set the service to enabled, which causes a failure on reboot because the service can't run without being configured. If I set this value to `false`, I'm taking on responsibility for enabling the service, so I would expect the role to leave the service disabled.
```yaml
- name: Stop and disable ThreatStack if not configured
  become: true
  service:
    name: threatstack
    state: stopped
    enabled: no
  when:
    - not threatstack_configure_agent
```
If that's too big a leap, you could add a variable that allows explicit control of the service state.
```yaml
- name: Enable/Disable ThreatStack
  become: true
  service:
    name: threatstack
    enabled: "{{ threatstack_service_enabled }}"
```
I don't think the role needs to know about the intentions if it gives suitable options to control the expectations of state.
Thanks for the feedback! I tried out your first option last night.
That said, many users want to ensure the agent is always running, automatically (hence the enablement of the service in the install scripts). I am sensitive to not breaking their workflow too. And I could see people using these settings to get into a state where the agent restarts on every Ansible run.
Is your goal to use the ansible role during operation of an instance as well? Do you configure the instance's agent via the role after the instance starts up? Do you expect to pass different arguments during the AMI creation vs instance creation/operation?
I can add the separate toggle, but it also seems to me that explicitly disabling the agent outside of the role during AMI creation itself is a cleaner solution, provided we document this recommendation on our side.
We only use Ansible during AMI baking. We use a tiny bash script to pull the key from SSM and enable the agent during cloud-init. We implemented the workaround six months ago. Documenting how the role is expected to work would be an improvement.
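A hedged sketch of what that cloud-init script could look like; the SSM parameter name is a placeholder, and it assumes the AWS CLI is installed and the instance role can read (and decrypt) the parameter:

```shell
#!/bin/bash
set -euo pipefail
# Fetch the deploy key from SSM Parameter Store (parameter name is a placeholder).
DEPLOY_KEY="$(aws ssm get-parameter \
  --name /threatstack/deploy-key \
  --with-decryption \
  --query 'Parameter.Value' \
  --output text)"
# Configure the agent, then enable and start it
# (it was left stopped/disabled when the AMI was baked).
tsagent setup --deploy-key "$DEPLOY_KEY"
systemctl enable --now threatstack
```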
@s01ipsist I added a separate flag named `threatstack_agent_disable_service`. Will merge after some additional testing.
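Assuming the flag is a simple boolean consumed by the role, an AMI-bake run could pass it as an extra var (the playbook name here is hypothetical):

```shell
# Bake-time run: install the agent but leave the service disabled,
# so first boot of a derived instance can configure and enable it.
ansible-playbook bake.yml \
  -e threatstack_agent_disable_service=true
```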
Using the latest published version of this role, I noticed that despite setting `threatstack_configure_agent: false`, the threatstack service is being enabled when I just want it installed in a base image that I derive from to build other things. The role respects that variable's `false` value, but the package install itself executes a SysV script that enables the service, which in turn (on Ubuntu 18.04) causes systemd to enable and attempt to start the service on next boot. We have worked around this by wrapping the `threatstack.threatstack-ansible` role and adding a task after execution that disables the agent in systemd, but this is obviously not the intended way to use this role.