Closed jzakrzeski closed 3 years ago
We've also felt this issue.
Even when running `tsagent setup --deploy-key xxx` via User Data at boot time, the agent is already enabled and failing, so an additional call to `systemctl restart threatstack` is required.
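A minimal User Data sketch of that situation (the deploy key is a placeholder; the extra restart at the end is the step the role could make unnecessary):

```shell
#!/bin/bash
# Runs at first boot via EC2 User Data.
# Configure the agent with the deploy key (placeholder value).
tsagent setup --deploy-key "xxx"
# The agent was already enabled (and failing) in the AMI, so it must be
# restarted to pick up the new configuration.
systemctl restart threatstack
```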
It would be nice to align this role with the official deployment guidance: https://threatstack.zendesk.com/hc/en-us/articles/204289149-Steps-for-Deploying-the-Threat-Stack-Agent-via-Amazon-AMI-s
@s01ipsist Hi! Sorry for the lack of response.
I reviewed the docs you linked to. Would this issue also be solved for you if we generally recommended disabling/stopping the agent service during the AMI creation process, after install?
Additionally, would an "I am creating an AMI" flag on the role help out? Do you run the ansible role during the AMI creation process? My concern is adding extra complexity to the role to try and differentiate "agent service running & failing because it is a first time run from an AMI" vs "agent service running (and may or may not be failing) in normal operation". We wouldn't want to restart the agent every execution of the role if it sees the service running, or necessarily every time it sees a failure.
Our workaround is to explicitly disable and stop the service before we configure it in our cloudinit script.
`threatstack_configure_agent: false` will still set the service to enabled, which causes a failure on reboot because the service can't run without being configured. If I set this value to `false`, I'm taking on responsibility for enabling the service, so I would expect the role to leave the service disabled.
```yaml
- name: Stop and disable ThreatStack if not configured
  become: true
  service:
    name: threatstack
    state: stopped
    enabled: no
  when:
    - not threatstack_configure_agent
```
If that's too big a leap, you could add a variable that allows explicit control of the service state.
```yaml
- name: Enable/Disable ThreatStack
  become: true
  service:
    name: threatstack
    enabled: "{{ threatstack_service_enabled }}"
```
I don't think the role needs to know about the intentions if it gives suitable options to control the expectations of state.
Thanks for the feedback! I tried out your first option last night.
That said, many users want to ensure the agent is always running, automatically (hence the enablement of the service in the install scripts). I am sensitive to not breaking their workflow too. And I could see people using these settings to get into a state where the agent restarts on every Ansible run.
Is your goal to use the ansible role during operation of an instance as well? Do you configure the instance's agent via the role after the instance starts up? Do you expect to pass different arguments during the AMI creation vs instance creation/operation?
I can add the separate toggle, but it also seems to me that explicitly disabling the agent outside of the role during AMI creation itself is a cleaner solution, provided we document this recommendation on our side.
We only use Ansible during AMI baking. We use a tiny bash script to pull the key from SSM and enable the agent during cloud-init. We implemented the workaround six months ago. Documenting how the role is expected to work would be an improvement.
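A hedged sketch of what that cloud-init script could look like; the SSM parameter name is a placeholder, and it assumes the AWS CLI is installed and the instance role can read (and decrypt) the parameter:

```shell
#!/bin/bash
set -euo pipefail
# Fetch the deploy key from SSM Parameter Store (parameter name is a placeholder).
DEPLOY_KEY="$(aws ssm get-parameter \
  --name /threatstack/deploy-key \
  --with-decryption \
  --query 'Parameter.Value' \
  --output text)"
# Configure the agent, then enable and start it
# (it was left stopped/disabled when the AMI was baked).
tsagent setup --deploy-key "$DEPLOY_KEY"
systemctl enable --now threatstack
```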
@s01ipsist I added a separate flag named `threatstack_agent_disable_service`. Will merge after some additional testing.
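Assuming the flag is a simple boolean consumed by the role, an AMI-bake run could pass it as an extra var (the playbook name here is hypothetical):

```shell
# Bake-time run: install the agent but leave the service disabled,
# so first boot of a derived instance can configure and enable it.
ansible-playbook bake.yml \
  -e threatstack_agent_disable_service=true
```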
Using the latest published version of this role, I noticed that despite setting `threatstack_configure_agent: false`, the threatstack service is being enabled when I just want it installed in a base image that I derive from to build other things. The role respects that variable's `false` value, but the package install itself executes a SysV script that enables the service, which in turn (on Ubuntu 18.04) causes systemd to enable and attempt to start the service on next boot. We have worked around this by wrapping the `threatstack.threatstack-ansible` role and adding a task after execution that disables the agent in systemd, but this is obviously not the intended way to use this role.