operator-framework / ansible-operator-plugins

Experimental extraction/refactoring of the Operator SDK's ansible operator plugin
Apache License 2.0

Ansible operator: skip execution on init #28

Open kvaps opened 4 years ago

kvaps commented 4 years ago

Feature Request

Problem

Every task execution takes about 3s, even for a simple playbook like this:

- hosts: localhost
  gather_facts: no
  tasks:
  - import_role:
      name: "memcached"

E.g. if you have 100 custom resources and then restart the operator, it will only be able to process new operations after about 5 minutes, once it has finished reconciling all 100 existing resources.

Solution

Allow specifying a preserveStatus: true option alongside reconcilePeriod: 0 in the watches.yaml file:

- version: v1alpha1
  group: baz.example.com
  kind: Baz
  playbook: /opt/ansible/baz.yml
  reconcilePeriod: 0
  preserveStatus: true

Processing:

If preserveStatus is set to true, then save metadata.generation to status.generation for each resource.

Initialization:

During startup, perform a check: if metadata.generation equals status.generation, skip processing that resource.
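For illustration, a minimal sketch of the bookkeeping this would produce (the layout is assumed from the description above, not taken from an implementation):

metadata:
  generation: 3
status:
  generation: 3   # copied from metadata.generation after the last successful run

On the next startup, processing would be skipped as long as the two values still match.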

shawn-hurley commented 4 years ago

You could effectively get this now by turning off dependent watches: https://github.com/operator-framework/operator-sdk/blob/master/doc/ansible/dev/dependent_watches.md
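For illustration, a sketch of the corresponding watches.yaml entry, reusing the Baz example from the issue description (watchDependentResources is the flag described in the linked doc):

- version: v1alpha1
  group: baz.example.com
  kind: Baz
  playbook: /opt/ansible/baz.yml
  reconcilePeriod: 0
  # do not re-trigger reconciles when resources created by the playbook change
  watchDependentResources: false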

WDYT?

kvaps commented 4 years ago

Hi @shawn-hurley, thanks for the answer, but I'm not sure that option helps in this case.

E.g. if I have 100 resources, ansible-operator will process all of them when the operator itself restarts. I want to prevent that somehow, because I'm planning to have a lot of similar resources and I want to reconcile them only on change.

kvaps commented 4 years ago

Working prototype:

- gather_facts: no
  hosts: localhost
  vars:
    # Pick the watched custom resource out of the variables passed in by
    # ansible-operator (the only one that carries an apiVersion field).
    res: "{{ vars.values() | selectattr('apiVersion', 'defined') | first }}"
    metadata: "{{ res.metadata }}"
    api_version: "{{ res.apiVersion }}"
    kind: "{{ res.kind }}"
    status: "{{ res.status | default({}) }}"
  tasks:

  # Skip the whole run if nothing has changed since the last recorded reconcile.
  - meta: end_play
    when: "status.generation is defined and status.generation|int == metadata.generation|int"

  - debug:
      msg: do_something

  # Record the generation that was just reconciled so the next run can be skipped.
  - k8s_status:
      api_version: '{{ api_version }}'
      kind: '{{ kind }}'
      name: '{{ metadata.name }}'
      namespace: '{{ metadata.namespace | default(omit) }}'   # omitted for cluster-scoped resources
      status:
        generation: "{{ metadata.generation }}"

fabianvf commented 4 years ago

I think this is likely not something we'll add: there's no guarantee that the last reconciliation finished/succeeded, or that cluster state has remained static across the operator restart, so we want to ensure we reconcile on start. This may be a good example of where a hybrid operator could be the right pattern, since you could override just the reconcile logic to take resourceVersion into account and then reuse everything else.

That being said, the overhead is definitely a problem. Bumping up the number of parallel jobs (as raised in operator-framework/operator-sdk#1678) and disabling fact gathering (as raised in operator-framework/operator-sdk#1677) should help bring it down, but if the performance is still unsatisfactory we should definitely spend some time profiling/improving it.
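For illustration, a sketch (not from this thread) of how those two knobs might be set on the operator Deployment, assuming the per-kind WORKER_<KIND>_<GROUP> environment variable the Ansible operator uses for parallel workers and Ansible's standard ANSIBLE_GATHERING setting; the variable name below is derived from the Baz example in this issue:

spec:
  template:
    spec:
      containers:
      - name: ansible-operator
        env:
        # assumed format: WORKER_<kind>_<group>, uppercased, dots replaced by underscores
        - name: WORKER_BAZ_BAZ_EXAMPLE_COM
          value: "8"              # run up to 8 reconciles in parallel
        - name: ANSIBLE_GATHERING
          value: explicit         # gather facts only when a play asks for them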

kvaps commented 4 years ago

Maybe we can prioritize this somehow, so that dropping events takes priority over the initial and periodic reconcile tasks?

openshift-bot commented 4 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kvaps commented 4 years ago

This issue might be solved by implementing watch bookmarks support https://github.com/operator-framework/operator-sdk/issues/1939

openshift-bot commented 4 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kvaps commented 4 years ago

/remove-lifecycle stale

fabianvf commented 4 years ago

Watch bookmarks do look promising. If we can do this in a supported way, then I'm all for it.

estroz commented 4 years ago

Closing in favor of operator-framework/operator-sdk#1939, which solves the issue described here in another manner.

mhrivnak commented 4 years ago

I think this should be re-opened.

Bookmarks are only useful when re-establishing a watch. That would not be the case when the operator container is first starting up. On startup, it always establishes new watches, because nothing is persisting a collection's resource version.

https://kubernetes.io/docs/reference/using-api/api-concepts/#watch-bookmarks

Further, bookmarks are only useful when you are watching with a label selector, which I don't think is supported yet by the sdk anyway.
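For reference, a bookmark event on a watch stream looks roughly like this (adapted from the Kubernetes API concepts page linked above, shown here with the Baz kind from this issue; the object carries only a resourceVersion):

type: BOOKMARK
object:
  kind: Baz
  apiVersion: baz.example.com/v1alpha1
  metadata:
    resourceVersion: "12746"   # the only metadata a bookmark provides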

openshift-bot commented 4 years ago

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot commented 4 years ago

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten /remove-lifecycle stale

fabianvf commented 3 years ago

/lifecycle frozen