Open dbwiddis opened 9 months ago
This looks great 👍 I would also add here, for future support, a transition condition - a minimal state machine to allow minimal conditions before transitioning from one step to the next... @dbwiddis - does this make sense?
One more question - can we parameterize the workflow so that it could be invoked with different parameters (similar to the saved search API)?
For example a step can accept parameters such as the following (e.g. index_name):
"parameters": [
    "index_name",
    "rollup_name",
    "current_time",
    "start_time"
],
"steps": [
    {
        "name": "add_timestamp_field_to_otel_spans_index",
        "method": "PUT",
        "endpoint": "/${index_name}",
        "body": { ... }
...
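As a rough sketch of how such ${...} placeholder substitution could work at provisioning time (the substitute helper and parameter names here are hypothetical, not the plugin's actual implementation):

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParamSubstitution {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    // Replace each ${name} placeholder in a template with the value supplied
    // at invocation time, failing fast on any missing parameter.
    static String substitute(String template, Map<String, String> params) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String value = params.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Missing parameter: " + m.group(1));
            }
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Resolve the endpoint of the example step above.
        System.out.println(substitute("/${index_name}", Map.of("index_name", "otel-spans")));
        // prints "/otel-spans"
    }
}
```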
This looks great 👍 I would also add here, for future support, a transition condition - a minimal state machine to allow minimal conditions before transitioning from one step to the next...
Each step uses an ActionFuture, which it completes when the step finishes. If there are cases where we may need to retry, we can use a conditional to define whether that happens. It would be helpful if you gave a specific example of an API call I can consider for that.
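A minimal stdlib sketch of that pattern, using CompletableFuture in place of OpenSearch's ActionFuture and a caller-supplied condition to decide whether a failed attempt is retried (all names here are illustrative, not the plugin's API):

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Predicate;
import java.util.function.Supplier;

public class RetryStep {
    // Run the step, retrying while shouldRetry accepts the failure,
    // up to maxAttempts; complete the future with the result or the error.
    static <T> CompletableFuture<T> execute(Supplier<T> step, Predicate<Exception> shouldRetry, int maxAttempts) {
        CompletableFuture<T> future = new CompletableFuture<>();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                future.complete(step.get());
                return future;
            } catch (Exception e) {
                if (attempt == maxAttempts || !shouldRetry.test(e)) {
                    future.completeExceptionally(e);
                    return future;
                }
                // otherwise fall through and try again
            }
        }
        return future; // unreachable: loop always returns
    }
}
```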
One more question - can we parameterize the workflow so that it could be invoked with different parameters (similar to the saved search API)?
For example a step can accept parameters such as the following (e.g. index_name):
"parameters": [
    "index_name",
    "rollup_name",
    "current_time",
    "start_time"
],
"steps": [
    {
        "name": "add_timestamp_field_to_otel_spans_index",
        "method": "PUT",
        "endpoint": "/${index_name}",
        "body": { ... }
...
This looks like something @amitgalitz proposed in #213. If so I'll address that request first.
Adding arbitrary external RESTful calls into the Search Pipeline sounds quite risky. It provides better flexibility but may introduce performance, security, and stability uncertainty. Is this really a good tradeoff for a core feature like Search Pipeline? Or could a similar capability be provided through another mechanism, like the Agent Framework?
As I've been looking into this, I have been trying to envision how to address the security concerns.
One of the main use cases to use this was to support https://github.com/opensearch-project/observability/issues/1805 but I think we can still address that with individual workflow steps (update index, reindex, and a rollup step). I'll work together with @YANG-DB to try to implement the specific requirements.
@xinlamzn the general idea is to allow assembly of core API calls so that workflow use cases can emerge. Observability service analysis is a classic workflow that involves calling several core APIs and connecting these calls in a flow... Security-wise, limiting the calls to core APIs only will remove the risk. In the future we can also think about allowing generic arbitrary remote calls...
Can we change the definition of this PR and only allow calling of core APIs rather than any API?
In practice, expanding this PR into a general core API workflow. @dbwiddis?
Yes, @YANG-DB that was what I implied with this comment: https://github.com/opensearch-project/flow-framework/issues/522#issuecomment-2019348671
We have a create index step added in 2.13 and plan to have a reindex step in 2.14. What other core APIs do you need? Can you contribute them (we can help, there's plenty of examples and even a tutorial to follow)?
This seems to overlap with the connector framework. A connector owns the connections to any external APIs and has a predict action. We plan to enhance it to support more action types.
Flow Framework allows sequencing of API calls, but these are presently limited to a fixed set of implemented steps. We can empower customers to perform automation and configuration using any OpenSearch or plugin API, or even calling external REST APIs.
Is your feature request related to a problem?
A workflow structure nearly identical to this plugin was proposed in this RFC for the Observability plugin: https://github.com/opensearch-project/observability/issues/1805
An example workflow step for an API call is included in the linked RFC:
What solution would you like?
Add Workflow Steps which enable calling REST APIs.
The code already exists in our integration test classes, which could be refactored into appropriate steps in the main source tree: https://github.com/opensearch-project/flow-framework/blob/13f672e1f210473802b292204bdd558963c9b871/src/test/java/org/opensearch/flowframework/TestHelpers.java#L81-L90
This is called with parameters like this: https://github.com/opensearch-project/flow-framework/blob/13f672e1f210473802b292204bdd558963c9b871/src/test/java/org/opensearch/flowframework/FlowFrameworkRestTestCase.java#L135-L143
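I can't paste the linked helper here, but a comparable generic call built on the JDK's java.net.http client might look like this (hypothetical names; the actual test helpers use the OpenSearch REST client):

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class RestApiCall {
    // Build a request from the workflow step fields: method, endpoint, body.
    // A null body maps to an empty request (e.g. for GET or DELETE).
    static HttpRequest buildRequest(String baseUrl, String method, String endpoint, String body) {
        return HttpRequest.newBuilder()
            .uri(URI.create(baseUrl + endpoint))
            .method(method, body == null
                ? HttpRequest.BodyPublishers.noBody()
                : HttpRequest.BodyPublishers.ofString(body))
            .header("Content-Type", "application/json")
            .build();
    }
}
```

The request would then be sent with an HttpClient configured by the client-initialization step.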
I think the best approach here is to write new steps:
InitHttpClientStep, which creates or configures an appropriate client() to make the call. This would be set up very similarly to how the ToolStep is created, using the workflow parameters to configure a Client object. https://github.com/opensearch-project/flow-framework/pull/530
RestApiStep, which would reference the client in its previous node inputs, and take other params for the actual REST call.
One other consideration is the status API. We keep track of provisioned resources using step name, step ID, resource type, and resource ID. The linked RFC suggests intermediate status output with "complete", where we'd really want to identify what was done (hard to do generically, so maybe we just put "complete") and, for in-process steps, running tasks. We could consider that updated provisioning detail in a separate feature request.
We also need supporting code:
What alternatives have you considered?
We could write workflow steps for every OpenSearch API (or at least the most common ones) and use the associated Transport Actions.
Do you have any additional context?
While REST calls are less efficient than transport calls, they also enable calls outside of OpenSearch.
REST client calls may also be needed for future migration to serverless.
While it is relatively easy to do this with an HTTP client, supporting HTTPS and preserving appropriate headers may introduce some security complexities.
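For instance, the JDK's HttpClient can be pointed at a custom SSLContext so the step uses a known trust store instead of blindly trusting certificates (a sketch only; the actual plugin would need to wire in OpenSearch's security settings):

```java
import java.net.http.HttpClient;
import java.time.Duration;
import javax.net.ssl.SSLContext;

public class SecureClientFactory {
    // Build an HTTPS-capable client. SSLContext.getDefault() uses the JVM's
    // default trust store; a plugin could instead load its own keystore here.
    static HttpClient create() throws Exception {
        SSLContext ssl = SSLContext.getDefault();
        return HttpClient.newBuilder()
            .sslContext(ssl)
            .connectTimeout(Duration.ofSeconds(10))
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();
    }
}
```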