opensearch-project / flow-framework

OpenSearch plugin that enables builders to innovate AI apps on OpenSearch
Apache License 2.0

[META] [FEATURE] Add a WorkflowStep for calling external REST APIs #522

Open dbwiddis opened 5 months ago

dbwiddis commented 5 months ago

Flow Framework allows sequencing of API calls, but these are presently limited to a fixed set of implemented steps. We can empower customers to perform automation and configuration using any OpenSearch or plugin API, or even calling external REST APIs.

Is your feature request related to a problem?

A workflow structure nearly identical to this plugin was proposed in this RFC for the Observability plugin: https://github.com/opensearch-project/observability/issues/1805

This API is designed to orchestrate multiple steps, each corresponding to a separate API call to a core API within OpenSearch. The primary goal of the Workflow API is to facilitate complex data preparation processes involving enrichment, aggregation, reindexing, and other transformations.

By enabling users to define multi-step workflows, this feature aims to simplify the preparation of raw data for visualization, thereby providing deeper insights into system status and health.

An example workflow step for an API call is included in the linked RFC:

"steps": [
  {
    "name": "add_timestamp_field_to_otel_spans_index",
    "method": "PUT",
    "endpoint": "/${index_name}",
    "body": { ... }
  },
  ...
]

What solution would you like?

Add Workflow Steps which enable calling REST APIs.

The code already exists in our integration test classes, which could be refactored into appropriate steps in the main source tree: https://github.com/opensearch-project/flow-framework/blob/13f672e1f210473802b292204bdd558963c9b871/src/test/java/org/opensearch/flowframework/TestHelpers.java#L81-L90

This is called with parameters like this: https://github.com/opensearch-project/flow-framework/blob/13f672e1f210473802b292204bdd558963c9b871/src/test/java/org/opensearch/flowframework/FlowFrameworkRestTestCase.java#L135-L143

I think the best approach here is to write new steps:

One other consideration is the status API. We keep track of provisioned resources using step name, step ID, resource type, and resource ID. The linked RFC suggests an intermediate status output of "complete", whereas we'd really want to identify what was done (that is hard to do generically, so maybe we just report "complete") and, for in-process steps, the running tasks. We could consider that updated provisioning detail in a separate feature request.

We also need supporting code:

What alternatives have you considered?

We could write workflow steps for every OpenSearch API (or at least the most common ones) and use the associated Transport Actions.

Do you have any additional context?

While making REST calls is less efficient than the transport calls, they enable calls outside of OpenSearch as well.

REST client calls may also be needed for future migration to serverless.

While it is relatively easy to do this with an HTTP client, supporting HTTPS and preserving appropriate headers may introduce some security complexities.
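To make the HTTPS and header concern concrete, here is a minimal sketch that turns a step's `method`, `endpoint`, and `body` into a request using the JDK's `java.net.http` API (Java 11+). The class name `RestStepRequest`, the HTTPS-only check, and the header allowlist are assumptions for illustration; none of this is the plugin's code, and it builds the request without sending it.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class RestStepRequest {
    // Hypothetical allowlist: only these caller headers are forwarded.
    private static final Set<String> FORWARDED = Set.of("authorization", "content-type");

    /** Builds (but does not send) an HTTPS request for one workflow step. */
    public static HttpRequest build(String baseUrl, String method, String endpoint,
                                    String body, Map<String, String> callerHeaders) {
        // Assumption for this sketch: plain HTTP targets are rejected outright.
        if (!baseUrl.startsWith("https://")) {
            throw new IllegalArgumentException("Only HTTPS base URLs are accepted");
        }
        HttpRequest.BodyPublisher publisher = body == null
            ? HttpRequest.BodyPublishers.noBody()
            : HttpRequest.BodyPublishers.ofString(body);
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(baseUrl + endpoint))
            .method(method, publisher);
        // Copy only allowlisted headers; everything else is dropped.
        callerHeaders.forEach((name, value) -> {
            if (FORWARDED.contains(name.toLowerCase(Locale.ROOT))) {
                builder.header(name, value);
            }
        });
        return builder.build();
    }
}
```

An allowlist is used here rather than a blocklist so that an unexpected header is dropped by default; which headers actually need preserving is an open question in this issue.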

YANG-DB commented 5 months ago

This looks great 👍 I would also add here for future support a transition condition - a minimal state machine to allow minimal conditions before transitioning from one step to the other... @dbwiddis - does this make sense ?

YANG-DB commented 5 months ago

One more question - can we parameterize the workflow so that it could be invoked with different parameters (similar to the saved search API) ?

For example a step can accept parameters such as the following index_name:

  "parameters": [
    "index_name",
    "rollup_name",
    "current_time",
    "start_time"
  ],
  "steps": [
    {
      "name": "add_timestamp_field_to_otel_spans_index",
      "method": "PUT",
      "endpoint": "/${index_name}",
      "body": { ... }
  ...

dbwiddis commented 5 months ago

This looks great 👍 I would also add here for future support a transition condition - a minimal state machine to allow minimal conditions before transitioning from one step to the other...

Each step uses an ActionFuture, which it completes when the step finishes. If there are cases where we may need to retry, we can use a conditional to define whether that happens. It would be helpful if you gave a specific example of an API call I can consider for that.
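As a rough illustration of a future that is completed once the step finishes, with a conditional deciding whether to retry, here is a minimal sketch. It uses the JDK's `CompletableFuture` as a stand-in for the plugin's future type, and the `RetryingStep` class, its `execute` method, and the `maxAttempts` parameter are hypothetical names, not the plugin's API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Predicate;
import java.util.function.Supplier;

public class RetryingStep {
    /**
     * Runs the call up to maxAttempts times, retrying only while shouldRetry
     * accepts the failure, and completes the returned future with the outcome.
     */
    public static <T> CompletableFuture<T> execute(
            Supplier<T> call, Predicate<Throwable> shouldRetry, int maxAttempts) {
        CompletableFuture<T> future = new CompletableFuture<>();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // Success: complete the future so downstream steps can proceed.
                future.complete(call.get());
                return future;
            } catch (RuntimeException e) {
                // Give up on the last attempt or when the condition says no.
                if (attempt == maxAttempts || !shouldRetry.test(e)) {
                    future.completeExceptionally(e);
                    return future;
                }
            }
        }
        // Reached only when maxAttempts is less than 1.
        future.completeExceptionally(new IllegalArgumentException("maxAttempts must be >= 1"));
        return future;
    }
}
```

The sketch runs the attempts synchronously to keep it short; a real step would likely schedule them on a thread pool and add a delay between attempts.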

dbwiddis commented 5 months ago

One more question - can we parameterize the workflow so that it could be invoked with different parameters (similar to the saved search API) ?

For example a step can accept parameters such as the following index_name:

  "parameters": [
    "index_name",
    "rollup_name",
    "current_time",
    "start_time"
  ],
  "steps": [
    {
      "name": "add_timestamp_field_to_otel_spans_index",
      "method": "PUT",
      "endpoint": "/${index_name}",
      "body": { ... }
  ...

This looks like something @amitgalitz proposed in #213. If so I'll address that request first.
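For reference, the `${index_name}` style of substitution quoted above could be sketched as follows. This is only an illustration under assumptions: the class `TemplateParams`, the `substitute` method, the restriction of parameter names to letters, digits, and underscores, and the sample value `otel-spans` are all hypothetical, and the design discussed in #213 may differ.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemplateParams {
    // Matches ${name} placeholders such as ${index_name}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Za-z0-9_]+)\\}");

    /** Replaces each ${name} in the template with its value; fails on unknown names. */
    public static String substitute(String template, Map<String, String> params) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            String value = params.get(name);
            if (value == null) {
                throw new IllegalArgumentException("Missing workflow parameter: " + name);
            }
            // quoteReplacement keeps '$' and '\' in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical invocation with a sample index name.
        System.out.println(substitute("/${index_name}", Map.of("index_name", "otel-spans")));
    }
}
```

Failing on a missing parameter, rather than leaving the placeholder in place, avoids sending a literal `${index_name}` to an API.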

xinlamzn commented 4 months ago

Adding arbitrary external RESTful calls into Search Pipeline sounds quite risky. It provides better flexibility but may introduce performance, security, and stability uncertainty. Is this really a good tradeoff for a core feature like Search Pipeline? Or could a similar capability be provided through another mechanism, such as the Agent Framework?

dbwiddis commented 4 months ago

As I've been looking into this, I have been trying to envision how to address the security concerns.

One of the main use cases for this was to support https://github.com/opensearch-project/observability/issues/1805 but I think we can still address that with individual workflow steps (update index, reindex, and a rollup step). I'll work together with @YANG-DB to try to implement the specific requirements.

YANG-DB commented 4 months ago

@xinlamzn the general idea is to allow assembly of core API calls so that workflow use cases can emerge. Observability service analysis is a classic workflow that involves calling several core APIs and connecting these calls in a flow... Security-wise, limiting the API calls to core only will remove the risk. In the future we can also think about allowing generic arbitrary remote calls...
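One narrow piece of such a restriction can be sketched: accepting only cluster-relative paths, so that a step can never name an external host. The `EndpointGuard` class and `isClusterRelative` method are hypothetical, and this check alone does not distinguish core APIs from plugin APIs; that would need a separate allowlist.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class EndpointGuard {
    /** Returns true only for cluster-relative paths such as "/my-index/_search". */
    public static boolean isClusterRelative(String endpoint) {
        try {
            URI uri = new URI(endpoint);
            // No scheme and no authority means no external host can be named.
            return uri.getScheme() == null
                && uri.getRawAuthority() == null
                && uri.getPath() != null
                && uri.getPath().startsWith("/");
        } catch (URISyntaxException e) {
            // Unparseable endpoints (including unresolved ${...} placeholders) are rejected.
            return false;
        }
    }
}
```

The check is meant to run after parameter substitution, since a literal `${index_name}` is not a valid URI and is rejected.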

YANG-DB commented 3 months ago

Can we change the definition of this PR and only allow calling of core APIs rather than any API?

In practice, expanding this PR into a general core API workflow. @dbwiddis?

dbwiddis commented 3 months ago

Can we change the definition of this PR and only allow calling of core APIs rather than any API?

In practice, expanding this PR into a general core API workflow. @dbwiddis?

Yes, @YANG-DB that was what I implied with this comment: https://github.com/opensearch-project/flow-framework/issues/522#issuecomment-2019348671

We have a create index step added in 2.13 and plan to have a reindex step in 2.14. What other core APIs do you need? Can you contribute them (we can help, there are plenty of examples and even a tutorial to follow)?

ylwu-amzn commented 2 months ago

This seems to overlap with the connector framework. A connector owns the connections to any external APIs. It has a predict action. We plan to enhance it to support more action types.