umccr / orcabus

 🐋 UMCCR Pipeline & Workflow Orchestration

Review and finalise event schema #257

Closed victorskl closed 1 month ago

victorskl commented 2 months ago

Context

Currently we are following the EventBridge docs to author our event schemas.

We are also empirically studying the various AWS built-in schemas under Amazon Console > EventBridge > Schemas > AWS event schema registry.

We are also choosing the OpenAPI format over JSONSchema, as OpenAPI is observed to be the most used format in the built-in registry.

These schemas have an AWSEvent envelope object that wraps the domain event, such as StepFunctionsExecutionStatusChange or, in our case, SequenceRunStateChange and so on.
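
To make the envelope idea concrete, here is a minimal Python sketch of an EventBridge-style event wrapping a domain event. The envelope keys (version, detail-type, source, resources, detail) follow the standard EventBridge event structure; the source value and the fields inside detail are hypothetical placeholders, not the actual OrcaBus schema, and the class is hand-written rather than a downloaded code binding.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AWSEvent:
    """EventBridge-style envelope; the domain event lives in `detail`."""
    detail_type: str
    source: str
    detail: dict[str, Any]
    version: str = "0"
    resources: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        # Marshal to the wire format, using the hyphenated envelope key.
        return json.dumps({
            "version": self.version,
            "detail-type": self.detail_type,
            "source": self.source,
            "resources": self.resources,
            "detail": self.detail,
        })

    @classmethod
    def from_json(cls, raw: str) -> "AWSEvent":
        # Unmarshal from the wire format back into the envelope object.
        doc = json.loads(raw)
        return cls(
            detail_type=doc["detail-type"],
            source=doc["source"],
            detail=doc["detail"],
            version=doc.get("version", "0"),
            resources=doc.get("resources", []),
        )


# Hypothetical source and payload; names are illustrative only.
event = AWSEvent(
    detail_type="SequenceRunStateChange",
    source="orcabus.example",
    detail={"status": "SUCCEEDED"},
)
round_tripped = AWSEvent.from_json(event.to_json())
```

The point of the sketch is only that the domain payload sits under detail while the envelope carries routing metadata; generated code bindings would replace this hand-rolled class.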

After an event schema is put into the registry, we can download the corresponding code bindings through the EventBridge Console UI.

Forces

There are a few concerns we have observed so far, as follows.

CloudEvent

Study the CloudEvents spec (https://github.com/cloudevents/spec), which seems to be a well-thought-out design.

Its SDKs seem to support more language toolchains. https://github.com/cloudevents/spec?tab=readme-ov-file#sdks

Its primer is a case-study article that analyses the status quo of cloud events. https://github.com/cloudevents/spec/blob/v1.0.2/cloudevents/primer.md

TL;DR example: JSON Event Format

https://github.com/cloudevents/spec/blob/v1.0.2/cloudevents/formats/json-format.md

{
    "specversion" : "1.0",
    "type" : "com.example.someevent",
    "source" : "/mycontext",
    "subject": null,
    "id" : "C234-1234-1234",
    "time" : "2018-04-05T17:31:00Z",
    "comexampleextension1" : "value",
    "comexampleothervalue" : 5,
    "datacontenttype" : "application/json",
    "data" : {
        "appinfoA" : "abc",
        "appinfoB" : 123,
        "appinfoC" : true
    }
}
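
For comparison with the EventBridge envelope, a small Python sketch of checking the context attributes that CloudEvents 1.0 marks as REQUIRED (specversion, id, source, type) on a JSON-format event. The helper name missing_required is made up for illustration; this is not the CloudEvents SDK.

```python
import json

# Context attributes the CloudEvents 1.0 spec marks as REQUIRED.
REQUIRED_ATTRIBUTES = ("specversion", "id", "source", "type")


def missing_required(raw: str) -> list[str]:
    """Return the REQUIRED attributes that are absent or empty in a JSON-format event."""
    event = json.loads(raw)
    return [name for name in REQUIRED_ATTRIBUTES if not event.get(name)]


sample = """
{
    "specversion": "1.0",
    "type": "com.example.someevent",
    "source": "/mycontext",
    "id": "C234-1234-1234",
    "data": {"appinfoA": "abc"}
}
"""
print(missing_required(sample))           # []
print(missing_required('{"type": "x"}'))  # ['specversion', 'id', 'source']
```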

Actions

victorskl commented 2 months ago

Related #225

victorskl commented 1 month ago

As part of this story card activity, the SchemaStack is deployed and our schema registry is ready to use.

For boilerplate code bindings to parse events from the main bus channel, you can also download code bindings from the schema registry through VSCode with the AWS Toolkit extension. When possible, this is the recommended approach: use the code binder / auto-gen to handle marshalling/unmarshalling of events in your code. You'd be either subscribing to events or publishing (emitting) events into the bus, or both, as part of your data processing logic.


victorskl commented 1 month ago

Shared a demonstration of the approach so far with the team - https://umccr.slack.com/archives/C03ABJTSN7J/p1714731324414679

victorskl commented 1 month ago

We are proceeding with the AWS EventBridge style event structure. We won't be doing CloudEvents. FYI @reisingerf

victorskl commented 1 month ago

Reading pointers: articles on designing events and event types:

victorskl commented 1 month ago

(closing remark on this feature story)

With #272 we implemented the WorkflowRunStateChange domain event. This event schema is shared among the services within the Workflow domain world (DDD Bounded Context). The marker that differentiates services on events is the source property of the event envelope.
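
A rough Python sketch of how a subscriber might tell services apart when they share one schema: handlers are keyed by the (detail-type, source) pair. This is an in-process illustration only; on the bus itself, EventBridge rules can match on source and detail-type. The source names orcabus.serviceA and orcabus.serviceB and the detail fields are hypothetical.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]

# Handlers keyed by (detail-type, source).
_handlers: dict[tuple[str, str], list[Handler]] = defaultdict(list)


def subscribe(detail_type: str, source: str, handler: Handler) -> None:
    _handlers[(detail_type, source)].append(handler)


def dispatch(event: dict) -> int:
    """Run handlers matching the event's detail-type and source; return how many ran."""
    matched = _handlers.get((event["detail-type"], event["source"]), [])
    for handler in matched:
        handler(event["detail"])
    return len(matched)


seen: list[dict] = []
# Only events emitted by the (hypothetical) serviceA are of interest here.
subscribe("WorkflowRunStateChange", "orcabus.serviceA", seen.append)

ran_a = dispatch({"detail-type": "WorkflowRunStateChange",
                  "source": "orcabus.serviceA",
                  "detail": {"status": "SUCCEEDED"}})
ran_b = dispatch({"detail-type": "WorkflowRunStateChange",
                  "source": "orcabus.serviceB",
                  "detail": {"status": "FAILED"}})
```

Both events share the same detail-type and payload shape; only the envelope's source decides which subscriber reacts.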


Self note on design:

This is fine as we treat all of the Workflow business as one domain aggregate (DDD bounded context). These are then all internal events of the Workflow bounded context, i.e. events happening within it. We each know what we are dealing with, and the events within this domain.

We can come up with another dedicated schema down the track, when an external observer would like to communicate with the Workflow orchestration world and its events.

Services may emit multiple events for similar state changes, with differing schemas for differing audiences (subscribers): internal/external parties...

victorskl commented 1 month ago

Reopening the issue. Background context as follows.

As of commit ec46d0b, we found that there is a "schema composition" requirement. The "data schema" concept is noted in the #308 PR conversation.

It is also reflected in the OrcaBus schema documentation manifest at commit ec46d0b. https://github.com/umccr/orcabus/tree/ec46d0b/docs/schemas#data-schemas


Follow up:

Actions:

victorskl commented 1 month ago

Related discussion and note about "requirement on message conform to schema versioning" https://umccr.slack.com/archives/C03ABJTSN7J/p1716407855291649

Flo is also reaching out to AWS support through the UoM management account on this topic.

Meanwhile we will keep it as-is for the first cut, i.e. "Pattern 1: Schema Name is a Version itself".

victorskl commented 1 month ago

Reopening the issue for change request: Refactor to leverage JSON Schema

At the 20240524 orcabus meeting, I proposed to the team, and we discussed, switching to JSON Schema to avoid potential confusion over the OpenAPIv3 InfoObject Version property caveat, which is ineffective for us.

Furthermore, we have already leveraged JSON Schema in Portal as a pilot POC with oncoanalyser orchestration. https://umccr.slack.com/archives/CP356DDCH/p1713250274677059

See more in the note orcabus_quick_notes_20240524.txt attached in Slack. https://umccr.slack.com/archives/C03ABJTSN7J/p1716502604511039

victorskl commented 1 month ago

Excerpt from https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-schema-create.html See the underlined highlight below.

When you choose between OpenAPI 3 and JSONSchema Draft4 formats, consider the following differences:

  • JSONSchema format supports additional keywords that aren't supported in OpenAPI, such as $schema, additionalItems.
  • There are minor differences in how keywords are handled, such as type and format.
  • OpenAPI doesn't support JSONSchema Hyper-Schema hyperlinks in JSON documents.
  • Tools for OpenAPI tend to focus on build-time, whereas tools for JSONSchema tend to focus on run-time operations, such as client tools for schema validation.

We recommend using JSONSchema format to implement client-side validation so that events sent to EventBridge conform to the schema. You can use JSONSchema to define a contract for valid JSON documents, and then use a JSON schema validator before sending the associated events.

After you have a new schema, you can download code bindings to help create applications for events with that schema.

Justification:

Switching to JSON Schema also complements the "requirement on message conform to schema versioning" noted above. We want clients of the system (internal or external) to be able to verify a message against the schema (contract) using well-known tools.
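
A minimal Python sketch of the "validate before sending" idea. To stay dependency-free it hand-rolls a checker for only three JSON Schema keywords (type as a single string, required, properties); it is a stand-in for a full JSON Schema validator, which is what one would use in practice. The schema contents, the publish helper and the send callback are all hypothetical and are not the OrcaBus schema or the EventBridge API.

```python
_TYPES = {
    "object": dict, "array": list, "string": str, "boolean": bool,
    "number": (int, float), "integer": int, "null": type(None),
}


def _is_type(value, name):
    # In Python, bool is a subclass of int; JSON Schema treats them as distinct.
    if name in ("number", "integer") and isinstance(value, bool):
        return False
    return isinstance(value, _TYPES[name])


def validate(instance, schema, path="$"):
    """Return a list of error strings; supports only `type`, `required`, `properties`."""
    expected = schema.get("type")
    if expected is not None and not _is_type(instance, expected):
        return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(instance, dict):
        for name in schema.get("required", []):
            if name not in instance:
                errors.append(f"{path}: missing required property '{name}'")
        for name, subschema in schema.get("properties", {}).items():
            if name in instance:
                errors.extend(validate(instance[name], subschema, f"{path}.{name}"))
    return errors


# Hypothetical contract for illustration only.
SCHEMA = {
    "type": "object",
    "required": ["detail-type", "source", "detail"],
    "properties": {
        "detail-type": {"type": "string"},
        "source": {"type": "string"},
        "detail": {
            "type": "object",
            "required": ["status"],
            "properties": {"status": {"type": "string"}},
        },
    },
}


def publish(event, send):
    """Validate against the contract first; only call `send` for conforming events."""
    errors = validate(event, SCHEMA)
    if errors:
        raise ValueError("; ".join(errors))
    send(event)
```

Here send is whatever actually emits the event onto the bus; keeping it as a parameter means the same check can be reused by any publisher, and non-conforming events never leave the client.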