nuvla / api-server

Nuvla API server packaged as a Docker container
https://sixsq.com
Apache License 2.0
5 stars 2 forks source link

re-design events resource and mechanisms for registering new events #408

Open cjdcordeiro opened 3 years ago

cjdcordeiro commented 3 years ago

the events resource has a very simple schema and it is being underused.

Starting by the NuvlaBox, all relating actions (i.e. activate, commission, decommission, + custom actions , etc.) should generate an event, to be tracked in the NB detail page

0xbase12 commented 3 years ago

Current Event resource schema:

::timestamp              date-time?
::category               #{"state" "alarm" "action" "system"}
::severity               #{"critical" "high" "medium" "low"})
::content/resource/href  id?
::content/state          string?

~Changes:~ ~- triggered-by: user? (user causing the event creation)~ ~- parent: id? instead of content/resource/href~ ~- state string opt:string? ~ ~- new-state: opt:string? (only for category "state")~

~Events specification~ ~- Immutable (once created never change, edit not allowed~ ~- Event acl: Event are owned by group/nuvla-admin. Views rights are extracted from the concerned resource.~

Creation conditions:

Creation of Events are implemented server side.

Condition for deleting events:

0xbase12 commented 1 year ago

Suggestion reuse existing event resource in Nuvla

When we should register an event:

To be completed

alebellu commented 1 year ago

Type of events

From my current understanding, we would like to have events to represent both:

Below I analyze only the first kind of events (internal to the api-server). It should be the responsibility of the external components (the job engine, etc.) to signal interesting events (but the api-server can provide a means of doing it, for example by allowing creation of events via http calls).

As-is analysis of event processing in api-server

Categories

One of: ["state" "alarm" "action" "system" "user" "email"]

Severities

One of: ["critical" "high" "medium" "low"]

Event creation

Current way to add an event is by calling sixsq.nuvla.server.resources.event.utils/create-event.

Example of event created:

{:resource-type "event",
 :content {:resource {:href "deployment/847a79a2-993a-443a-bc4e-506d7424e1e9"},
           :state "deployment/847a79a2-993a-443a-bc4e-506d7424e1e9 created"},
 :severity "medium",
 :category "action",
 :timestamp "2023-07-24T13:37:55.464Z",
 :acl {:owners ["user/jane"]}}

which is created from this creation payload:

{:params {:resource-name "event"},
 :body {:resource-type "event",
        :content {:resource {:href "deployment/847a79a2-993a-443a-bc4e-506d7424e1e9"},
                  :state "deployment/847a79a2-993a-443a-bc4e-506d7424e1e9 created"},
        :severity "medium",
        :category "action",
        :timestamp "2023-07-24T13:37:55.464Z",
        :acl {:owners ["user/jane"]}},
 :nuvla/authn {:user-id "internal",
               :active-claim "group/nuvla-admin",
               :claims #{"group/nuvla-admin" "group/nuvla-anon" "group/nuvla-user"}}}
`sixsq.nuvla.server.resources.event.utils/create-event` is currently called in: - `sixsq.nuvla.server.resources.deployment/create-deployment` - Call: `(event-utils/create-event deployment-id (get-in create-response [:body :message]) (a/default-acl (auth/current-authentication request)))` In turn called in: - method `crud/add` - method `crud/do-action [resource-type "clone"]` - `sixsq.nuvla.server.resources.deployment-parameter` method `crud/add` - Call: `(event-utils/create-event (:parent resource) value acl :severity "medium" :category "state")` - `sixsq.nuvla.server.resources.deployment-set/create-job` - Call: `(event-utils/create-event (:id resource) action (a/default-acl (auth/current-authentication request)))` In turn called in: - method `crud/add` - method `crud/do-action [resource-type utils/action-create]` - method `crud/do-action [resource-type utils/action-start]` - `sixsq.nuvla.server.resources.deployment-set/action-bulk` - Call: `(event-utils/create-event (str resource-type "/" uuid) action (a/default-acl (auth/current-authentication request)))` In turn called in: - method `crud/do-action [resource-type utils/action-start]` - method `crud/do-action [resource-type utils/action-stop]` - `sixsq.nuvla.server.resources.infrastructure-service/post-delete-hooks` - Call: `(event-utils/create-event (str resource-type "/" uuid) "DELETED" (a/default-acl (auth/current-authentication request)) :severity "low" :category "state")` - `sixsq.nuvla.server.resources.infrastructure-service/event-state-change` - Call: `(event-utils/create-event id new-state (a/default-acl (auth/current-authentication request)) :severity "low" :category "state")` In turn called in: - `sixsq.nuvla.server.resources.infrastructure-service` method `crud/edit` - `sixsq.nuvla.server.resources.infrastructure-service-coe/create-job` - In turn called by `crud/do-action` - `sixsq.nuvla.server.resources.infrastructure-service-coe` method `sixsq.nuvla.server.resources.infrastructure-service/post-add-hook` - Call: `(event-utils/create-event id "STARTING" (a/default-acl (auth/current-authentication request)) :severity "low" :category "state")` - `sixsq.nuvla.server.resources.infrastructure-service-coe/create-job` - Call: `(event-utils/create-event (:id resource) (format "created job %s with id %s" job-name job-id) (a/default-acl (auth/current-authentication request)))` - `sixsq.nuvla.server.resources.infrastructure-service-generic` method `infra-service/post-add-hook` - Call: `(event-utils/create-event id (:state service) (a/default-acl (auth/current-authentication request)) :severity "low" :category category)` - `sixsq.nuvla.server.resources.nuvlabox` method `decommission-async` - Call: `(event-utils/create-event id (str "decommissioning " id " with async " job-id) (:acl request))` In turn called in: - method `crud/do-action [resource-type "decommission"]` - `sixsq.nuvla.server.resources.nuvlabox/check-api` - Call: ``(event-utils/create-event (:id resource) (str "checking the API for NuvlaBox " id " with async " job-id) (:acl resource))`` In turn called in: - method `crud/do-action [resource-type "check-api"]` - `sixsq.nuvla.server.resources.nuvlabox/reboot` - Call: `(event-utils/create-event (:id resource) (str "sending reboot request to NuvlaBox " id " with async " job-id) (:acl resource))` In turn called in: - method `crud/do-action [resource-type "reboot"]` - `sixsq.nuvla.server.resources.nuvlabox/cluster-nuvlabox` - Call: `(event-utils/create-event (:id resource) (str "running cluster action " cluster-action " on NuvlaBox " (:id resource) ", with async " job-id) (:acl resource))` In turn called in: - method `crud/do-action [resource-type "cluster-nuvlabox"]` - `sixsq.nuvla.server.resources.nuvlabox/add-ssh-key` - Call: `(event-utils/create-event (:id resource) (str "asking NuvlaBox " (:id resource) " to add SSH key " cred-id " with async " job-id) (:acl resource))` In turn called in: - method `crud/do-action [resource-type "add-ssh-key"]` - `sixsq.nuvla.server.resources.nuvlabox/revoke-ssh-key` - Call: `(event-utils/create-event (:id resource) (str "removing SSH key " (-> request :body :credential) " from NuvlaBox " id " with async " job-id) (:acl resource))` In turn called in: - method `crud/do-action [resource-type "revoke-ssh-key"]` - `sixsq.nuvla.server.resources.nuvlabox/update-nuvlabox` - Call: `(event-utils/create-event (:id resource) (str "updating NuvlaBox " id " with target release " nb-release-id", with async " job-id) acl)` In turn called in: - method `crud/do-action [resource-type "update-nuvlabox"]`

Actions

Sometimes actions on a resource trigger crud operations on other resources, for example deployment-set create triggers crud/add on a job resource. Other times an action does not trigger any crud operation, e.g deployment-set plan. And in other cases an action can have impact on the resource it is being called upon, and on other resources as well; e.g deplyment-set start edits the deployment-set resource and can create a new job as well.

Events retention

Events are retained in ES for 1 year, then they are deleted.

Summary

Goals

Non-goals

Proposal

Event spec

[UPDATE 02/08/2023] In the first iteration I tried to get rid of the :content top-level key, to see what happens.

Resulting top-level spec would be something like:

(s/def ::resource
  (-> (st/spec (su/only-keys :req-un [::href]
                             :opt-un [::common/resource-type]))))

  (s/def ::schema
    (su/only-keys-maps common/common-attrs
                   {:req-un [::timestamp
                             ::event-type
                             ::message
                             ::content  {:req-un [::resourcce {:req-un [::type, ::href]}
                                                  ::payload]}
                             ::category
                             ::severity
                             ::user-id
                             ::active-claim]
                    :opt-un [::session-id]}))

Acl

Set acl in such a way that the event is visible for the user/group that created it.

Something like this ?

{:owners      ["group/nuvla-admin"]
 :view-meta [(auth/current-active-claim request)]}

Is this enough? Not sure yet..

Event config registry

Have a global registry of events configurations, keyed by resource type.

Event creation utility fn

An event creation utility fn should be added; use sixsq.nuvla.server.resources.event.utils/create-event as starting point. The utility fn should:

CRUD operations

Make the generation of events more systematic, by plugging-in the event generation as part of the crud operations. Have sane defaults, but give possibility to override by resource type (and maybe by event type?). Modify the standard crud implementation sixsq.nuvla.server.resources.common.std-crud for add, edit and delete operations to also call the event creation utility fn.

Actions

It would be good to make eventing more systematic also for actions other then crud.

Proposal is to:

Bulk actions

Single or multiple (one per resource) events or both for bulk actions ?

I would go for the third option of registering both a top level event and one event per resource linked to the top level one, for maximum traceability. For bulk actions impacting hundreds or thousands of resources this would be inefficient: if this happens to be the case we should make the policy configurable via the event config registry.

Async actions

When an async action is triggered via the api-server, an event can be generated to signal that the processing has started, but it should be made clear that the action is not finished. And it should be the responsibility of the action runner/job engine/external system to create an event once the action is completed. Those events could be linked though, via the parent and/or correlation-id properties.

Direct event creation

Regardless of how generic we make the above mechanism, there will be cases where it does not fit. In these cases it might make sense to create events directly, by-passing the automatic creation and the resource event registry. There should not be too many such cases though, otherwise the utility of the generic method above is in question.

Events searching and filtering

Given the above, it should be possible for a user with the proper permissions to:

Notifications

TODO: clarify how notifications work in Nuvla. As far as I can tell, there is currently no link between notifications and events. It might make sense to have an optional event-href property pointing to the an event to which a notification might refer.

Remove all present calls to create-event

Configure the registry in such a way that all current events are still logged, with all the information that is currently logged.

Immutability

Events should be immutable, edit and delete should not be allowed (at least for standard user roles).

Events retention

Events are retained in ES for 1 year, then they are deleted. The current retention period seems reasonable, but it would be good to re-evaluate once many more events are being generated, based on the load on ES.

Worth also considering: instead of deleting old events, we might want to zip them up and send them to S3 or similar cheap service for cold storage, and have a much longer retention period in that form.

Migration of existing events

Note on consistency

With the proposal above, we can think of a couple of undesirable scenarios:

Ideally events and the actions they represent should be committed in an atomic transaction. Elastic Search, which is where resources are currently being stored, does not support atomic transactions. Also, some actions do not interact with Elastic Search, so making them atomic with the event creation would be challenging anyway.

Possible approaches to the above issue coming to mind:

  1. Perform the action first, and only if the action succeeds, then try to commit the event. If commit of the event fails then try a few more times with exponential backoff; if that also fails then log a SEVERE error in the logs.

    Pros:

    • simple to implement
    • low impact on existing logic

    Cons:

    • does not insulate from a loss of power event, or ES failure, or network failure, occurring between the end of the action and the storing of the event: in that case we could have an action being successfully performed without a corresponding event logged anywhere in the system.
    • requires an active monitoring of the logs to be effective (SEVERE errors should be notified to admins)
  2. Commit the event first, and only if that succeeds then perform the action. If there is a failure while performing the action, we might think of deleting the event: but the event might have been consumed by other parties in the meantime, so that should not be considered as an option. An alternative could be to emit a compensating event, basically erasing the validity of the previous event.

    Pros:

    • simple to implement

    Cons:

    • there is still much potential for inconsistencies in case of unrecoverable failures in between any of the 3 phases (event commit - action - compensating event commit)
    • handling with compensating events might be very complex, and this complexity is shifted here to downstream event consumers, which arguably should not be bothered with this kind of issues
  3. Commit the event first and perform the action asynchronously, with a retry mechanism. Most interactions with the system are made via synchronous http calls, which are expected to return only once the action is completed. As such, this approach would require changes to CRUD operations and other actions to wait for the asynchronous result to be ready, before responding to the client.

    Pros:

    • does insulate from a loss of power event, even if it occurs between the storing of the event and the end of the action

    Cons:

    • consistency is "eventual": there is a period of time in which there is an event in the system saying that an action has been performed, while that has not happened yet.
    • implementation complexity: this approach would be in the direction of building an event driven architecture, designed for asynchronous processing and eventual consistency. Which can have some advantages but at the cost of much increased complexity, because every component of the system should be designed taking the eventual consistency principle into account.
  4. Other ?

I have not considered the option of issuing compensating actions in case of failures in storing events, because it seems unreasonable for example to start an app deployment only to cancel it a few ms later, only because it was not possible to log the corresponding event.

Of the options above, option 1. seems to me like the best compromise for the Nuvla system at the moment. Maybe a solution like 3., or a similar form of event driven architecture, could be considered in the future?

0xbase12 commented 1 year ago

To keep a fully re-playble history of events as in event sourcing.

This is an interesting proposition to be able to trace events. But I would suggest to add it in a second phase.

0xbase12 commented 1 year ago

Event config registry

I advise to make it a defmulti to get the config event for each resource. Because we load some resources dynamically from other projects jar. If the config of the resource is defined in each resource namespace, it will be loaded also dynamically at runtime. I think it’s better than having to configure resources that will maybe be loaded later. The other solution is to have function that each resource call at initialization that add event config to the atom registry. But I personally I prefer the defmulti.

0xbase12 commented 1 year ago

Maybe move it under content ? Clarify how content is used in general.

what is the advantage of nesting instead keeping a flat document

0xbase12 commented 1 year ago

Set acl in such a way that the event is visible for the user/group that created it.

User need at least view-data rights to be able to see the content of the event. (view-meta right give access only to name description common attributes) I'm thinking if it's better to give view-data on event to all users that are able to see the content of the resource that is triggering the event. (e.g. nuvlabox resource user/a action call on it should be also visible to all users that can see this nuvlabox

0xbase12 commented 1 year ago

Migration of existing events

• how to migrate ? We can create a migration script from old schema to the new one. But we can do it at the very end. Do not bother with that at this step. • impacts on ui or other components ? who is making use of events? UI / notification system / api-server-private • is any customer making use of events already? Customers can see events in UI. I think that’s all usage that is being done from events by customers.

0xbase12 commented 1 year ago

Note on consistency. Possible approaches to the above issue coming to mind:

  1. Other suggestion • Commit the event first, if event creation fail log it but do not stop the original request. • If there is a failure while performing the action, we don’t delete the event, we create a new event that state action failure.
0xbase12 commented 1 year ago

payload: field to store arbitrary data for the event.

What is the spec definition for this field? Because ES doesn't play well with open maps.

alebellu commented 1 year ago

Event config registry

I advise to make it a defmulti to get the config event for each resource. Because we load some resources dynamically from other projects jar. If the config of the resource is defined in each resource namespace, it will be loaded also dynamically at runtime. I think it’s better than having to configure resources that will maybe be loaded later. The other solution is to have function that each resource call at initialization that add event config to the atom registry. But I personally I prefer the defmulti.

Makes sense. Will change it to defmulti, thanks.

alebellu commented 1 year ago

• impacts on ui or other components ? who is making use of events? UI / notification system / api-server-private

Ok, then given all these impacted systems, I guess it is better if the event spec is backward-compatible ? At least until all other components have migrated.

alebellu commented 1 year ago

payload: field to store arbitrary data for the event.

What is the spec definition for this field? Because ES doesn't play well with open maps.

I took the spec of ::callback/data as blueprint:

(s/def ::payload
  (-> (st/spec (su/constrained-map keyword? any?))
      (assoc :name "payload"
             :json-schema/type "map"
             :json-schema/description "event payload"
             :json-schema/indexed false)))

The idea would be that this field is a dump of whatever might be useful, but should not be indexed by ES. If we need additional info to be indexed they should be added to the spec. For example in the prototype I moved the state property inside payload: the idea would be that you search by event-type instead. (but I will put it back for backward compatibility, see comment above).

I didn't see issues with it in my limited tests,do you foresee some problems with the spec above ?

0xbase12 commented 1 year ago

payload: field to store arbitrary data for the event.

What is the spec definition for this field? Because ES doesn't play well with open maps.

I took the spec of ::callback/data as blueprint:

(s/def ::payload
  (-> (st/spec (su/constrained-map keyword? any?))
      (assoc :name "payload"
             :json-schema/type "map"
             :json-schema/description "event payload"
             :json-schema/indexed false)))

The idea would be that this field is a dump of whatever might be useful, but should not be indexed by ES. If we need additional info to be indexed they should be added to the spec. For example in the prototype I moved the state property inside payload: the idea would be that you search by event-type instead. (but I will put it back for backward compatibility, see comment above).

I didn't see issues with it in my limited tests,do you foresee some problems with the spec above ?

Interesting suggestion. We will have to check that ES doesn't count the keys in case we set the field as not indexed.

alebellu commented 1 year ago

Maybe move it under content ? Clarify how content is used in general.

what is the advantage of nesting instead keeping a flat document

Ok, for some reason I thought that the keyword :content was treated in a special way that I was not grasping :-) I would have just 2 properties under content, resource and payload, so I don't see much value in grouping them under :content. On the other hand if we need to add many additional fields (for ES indexing) then I can see the value of doing that.

alebellu commented 1 year ago

ES doesn't count the keys

What do you mean ? Sorry,my understanding of how ES works is pretty basic..

alebellu commented 1 year ago

Note on consistency. Possible approaches to the above issue coming to mind:

  1. Other suggestion • Commit the event first, if event creation fail log it but do not stop the original request. • If there is a failure while performing the action, we don’t delete the event, we create a new event that state action failure.

Ok, in this scenario I would also create an event when the action succeeds. Otherwise if you are listening to events live, you have no way of distinguishing if an action started but is still in progress vs an action started and succeeded.

0xbase12 commented 1 year ago

payload: field to store arbitrary data for the event.

What is the spec definition for this field? Because ES doesn't play well with open maps.

I took the spec of ::callback/data as blueprint:

(s/def ::payload
  (-> (st/spec (su/constrained-map keyword? any?))
      (assoc :name "payload"
             :json-schema/type "map"
             :json-schema/description "event payload"
             :json-schema/indexed false)))

The idea would be that this field is a dump of whatever might be useful, but should not be indexed by ES. If we need additional info to be indexed they should be added to the spec. For example in the prototype I moved the state property inside payload: the idea would be that you search by event-type instead. (but I will put it back for backward compatibility, see comment above). I didn't see issues with it in my limited tests,do you foresee some problems with the spec above ?

Interesting suggestion. We will have to check that ES doesn't count the keys in case we set the field as not indexed.

I tested that we can fit a big number of keys in payload and seems to be working when not indexed in ES. 👏

0xbase12 commented 1 year ago

ES doesn't count the keys

What do you mean ? Sorry,my understanding of how ES works is pretty basic..

I mean that ES limit indexed fields to 1000 keys per index by default.

Here is an example of an error if we index the attribute payload and we register more that 1000 keys over the documents: "Limit of total fields [1000] in index [nuvla-event] has been exceeded"

But this is no more relevant, since when the payload attribute is not indexed ES doesn't account payload nested keys as part of that limit.

alebellu commented 1 year ago

ES doesn't count the keys

What do you mean ? Sorry,my understanding of how ES works is pretty basic..

I mean that ES limit indexed fields to 1000 keys per index by default.

Here is an example of an error if we index the attribute payload and we register more that 1000 keys over the documents: "Limit of total fields [1000] in index [nuvla-event] has been exceeded"

But this is no more relevant, since when the payload attribute is not indexed ES doesn't account payload nested keys as part of that limit.

Interesting, thanks.

alebellu commented 1 year ago

[Update 11/08/2023]

After several code iterations I settled on some design decisions that I try to detail below.

For the implementation and latest changes, see the linked PR.

Event types

All events must have an event type:

An event type is a string identifier, and implicitly a specification for a set of event objects that have the same semantic intent and same structure.

Event types must belong to a category, which is a way of grouping sets of event types with similar or related semantic. They can also have a subcategory.

Several public implementation of event systems use event types. Sometimes it is called event name, sometimes it is called event type. ##### Stripe See https://stripe.com/docs/api/events/types
Examples | Event type | Object | Description | |--------------------------------|---------------------------------|-----------------------------------------------------------------------------------------| | account.application.authorized | data.object is an "application" | Occurs whenever a user authorizes an application. Sent to the related application only. | | charge.expired | data.object is a charge | Occurs whenever an uncaptured charge expires. | | charge.failed | data.object is a charge | Occurs whenever a failed charge attempt occurs. | | charge.pending | data.object is a charge | Occurs whenever a pending charge is created. | | charge.refund.updated | data.object is a refund | Occurs whenever a refund is updated, on selected payment methods. | | charge.refunded | data.object is a charge | Occurs whenever a charge is refunded, including partial refunds. |
##### AWS See https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EventTypes.html
Examples ```json { "version": "0", "id": "cd4d811e-ab12-322b-8255-872ce65b1bc8", "detail-type": "event type", "source": "aws.config", "account": "111122223333", "time": "2018-03-22T00:38:11Z", "region": "us-east-1", "resources": [ resources ], "detail": { specific message type } } ``` ```json { "version":"0", "id":"2f8147ab-8c48-47c6-b0b6-3ee23ec8d300", "detail-type":"EMR Auto Scaling Policy State Change", "source":"aws.emr", "account":"123456789012", "time":"2016-12-16T20:42:44Z", "region":"us-east-1", "resources":[], "detail":{ "resourceId":"ig-X2LBMHTGPCBU", "clusterId":"j-1YONHTCP3YZKC", "state":"PENDING", "message":"AutoScaling policy modified by user request", "scalingResourceType":"INSTANCE_GROUP" } } ``` ```json { "version": "0", "id": "ffd8a6fe-32f8-ef66-c85c-111111111111", "detail-type": "Tag Change on Resource", "source": "aws.tag", "account": "123456789012", "time": "2018-09-18T20:41:06Z", "region": "us-east-1", "resources": [ "arn:aws:ec2:us-east-1:123456789012:instance/i-0000000aaaaaaaaaa" ], "detail": { "changed-tag-keys": [ "key2", "key3" ], "service": "ec2", "resource-type": "instance", "version": 5, "tags": { "key4": "value4", "key1": "value1", "key2": "value2" } } } ```
##### GCP See for example: https://cloud.google.com/eventarc/docs/event-types
Examples - `google.cloud.apigateway.api.v1.created` - `google.cloud.apigateway.api.v1.deleted` - `google.cloud.apigateway.api.v1.updated` - `google.cloud.gkebackup.backup.v1.created` - `google.cloud.gkebackup.backup.v1.deleted` - `google.cloud.gkebackup.backup.v1.updated` - `google.cloud.gkebackup.restore.v1.created` - `google.cloud.gkebackup.restore.v1.deleted` - `google.cloud.gkebackup.restore.v1.updated` - `google.cloud.storage.object.v1.archived` - `google.cloud.storage.object.v1.deleted` - `google.cloud.storage.object.v1.finalized` - `google.cloud.storage.object.v1.metadataUpdated` - `google.cloud.dataflow.job.v1beta3.statusChanged`

In api-server, for every crud operation or action we define the following event types:

For example, the successful update of a nuvlabox resource generates 2 events:

If the operation above fails for some reason, it will generate the following events instead:

Event details

Based on the event type (or its category), an event can contain additional details, which are grouped under the key details.

For example, events of category state must have a key new-state nested under details, and optionally a key old-state.

Event parent

In the example above the 2 events generated from the creation of a nuvlabox resource have no apparent link, other than the resource to which they refer to.

To make the link between the events explicit we use the property parent.

In the examples above the events nuvlabox.update.completed and nuvlabox.update.failed would have the event nuvlabox.update as their parent.

Some actions trigger other actions or operations internally, which in turn generate other events: this situation gives rise to a tree of events, where the root of the tree is the event representing the outermost request.

ACL

The acl on events is set with the following algorithm:

Implementation details

Event manager

CRUD operations on events are redirected to an event manager.

The event manager has a generic interface and the actual implementation can be switched (I used db bindings and payments interface as blueprints for this).

This makes it possible to easily switch from an implementation to another. For now there is only one main implementation (DbEventManager) that stores the events in Elastic Search and sends them to Kafka at the same time. There is also a very simple implementation (NoEventsEventManager) that does not store events.

The event manager also offers some wrapper functions for crud operations. The idea is that crud operations can be triggered from code with events enabled or without events enabled, depending on the situation.

CRUD with events facade

New facade functions to invoke crud operations or actions with events generation enabled:

Per-resource type enabling/disabling

Events can also be enabled or disabled per resource type by redefining the multimethod sixsq.nuvla.events.config/events-enabled?

Also add per-event-type and per-category enabling/disabling of events? It can probably be useful, I would just wait for the first use-case before implementing it.

Other multi methods

Failure mode

After further thinking, I am still a bit reluctant about letting an operation proceed when there is an error storing an event. If later on we want to build other functionalities on top of the event system, we must be able to rely on the fact that every operation/action is accompanied by its corresponding events in the system.

I would do the following:

In any case, it is very important for the event creation code to be made simple and free of logical errors as much as possible: to be clear, it should only fail if the events are malformed, or if ES or the network etc. fails.

Event testing

Since operations and actions on any resource will generate events by default, most of the lifecycle tests can be integrated with checks on the generated events.

I started doing this work for the module, the deployment and the nuvlabox0 resources, but it is a long way from being complete. I added a macro sixsq.nuvla.server.resources.lifecycle-test-utils/are-events to make the testing of events easier: it checks the event type, the resource, and the hierarchy of the events.

Plan for next week

Plan for later

0xbase12 commented 1 year ago

Some enhancement ideas:

konstan commented 11 months ago

Reason for Pending: work on Projects is impeding the progress.