Closed: ohltyler closed this issue 7 months ago.
But it feels weird to have to run deprovision when it does nothing besides reset the state, since there were no created resources to clean up to begin with.

I don't have a good solution for this off the top of my head, but from a user perspective it's a bit confusing. Worth exploring this edge case from a usability standpoint.
See https://github.com/opensearch-project/flow-framework/issues/537, which requires the same solution.
Additional issue found: the `error` field looks like it is not cleared, even when deprovisioning and reprovisioning:
```json
{
  "workflow_id": "LGKGn44B51-soDq-mga1",
  "error": "org.opensearch.flowframework.exception.WorkflowStepException during step create_ingest_pipeline, restStatus: BAD_REQUEST",
  "state": "COMPLETED",
  "provisioning_progress": "DONE",
  "provision_start_time": 1712074192234,
  "provision_end_time": 1712074192545,
  "resources_created": [
    {
      "resource_type": "pipeline_id",
      "resource_id": "test-pipeline",
      "workflow_step_name": "create_ingest_pipeline",
      "workflow_step_id": "create_ingest_pipeline"
    },
    {
      "resource_type": "index_name",
      "resource_id": "my-knn-index",
      "workflow_step_name": "create_index",
      "workflow_step_id": "create_index"
    }
  ]
}
```
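To make the staleness concrete, here is a toy Java sketch (not flow-framework code; all names are hypothetical) of the underlying pattern: when a state object is reused across provisioning attempts, the `error` field has to be reset explicitly at the start of each run, otherwise a later successful run still carries the old failure message:

```java
// Toy illustration of the stale-field bug; not actual flow-framework code.
public class WorkflowState {
    String state = "NOT_STARTED";
    String error;

    void startProvisioning() {
        state = "PROVISIONING";
        error = null; // without this reset, the previous error string leaks into the new run
    }

    void completeProvisioning() {
        state = "COMPLETED";
    }

    void failProvisioning(String message) {
        state = "FAILED";
        error = message;
    }

    public static void main(String[] args) {
        WorkflowState s = new WorkflowState();
        s.startProvisioning();
        s.failProvisioning("WorkflowStepException during step create_ingest_pipeline");
        s.startProvisioning();    // re-run after fixing the cluster
        s.completeProvisioning();
        System.out.println(s.state + " / error=" + s.error); // COMPLETED / error=null
    }
}
```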
> Additional issue found: the `error` field looks like it is not cleared, even when deprovisioning and reprovisioning:
If you review and approve #635 that problem will go away :-)
Ha! Perfect
On a similar note, should we improve the error message itself to add more details? Right now it reads: `org.opensearch.flowframework.exception.WorkflowStepException during step create_ingest_pipeline, restStatus: BAD_REQUEST`. We use `WorkflowStepException` to let us know the step itself is the reason for the error, but we don't propagate outside the logs the fact that the actual issue is that the `text_embedding` processor isn't available in the cluster.

Today, if a user hits the create ingest pipeline API directly, they get more information without having to look at the logs:
```json
{
  "error": {
    "root_cause": [
      {
        "type": "parse_exception",
        "reason": "No processor type exists with name [text_embeddinssg]",
        "processor_type": "text_embeddinssg"
      }
    ],
    "type": "parse_exception",
    "reason": "No processor type exists with name [text_embeddinssg]",
    "processor_type": "text_embeddinssg"
  },
  "status": 400
}
```
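One possible direction, sketched below: walk the caught exception's cause chain and include the root cause's message in the step error, so the user sees the same detail the direct API returns. This is a minimal, self-contained sketch; the helper and its wiring are hypothetical, not the actual flow-framework fix:

```java
// Hedged sketch: surface the root cause's message in a step error string.
public final class StepErrors {
    private StepErrors() {}

    static String describe(String stepName, Throwable t) {
        // Walk to the deepest cause, guarding against self-referential chains.
        Throwable root = t;
        while (root.getCause() != null && root.getCause() != root) {
            root = root.getCause();
        }
        return stepName + " failed: " + root.getMessage();
    }

    public static void main(String[] args) {
        Exception cause = new IllegalArgumentException("No processor type exists with name [text_embeddinssg]");
        Exception wrapped = new RuntimeException("step failure", cause);
        // Prints: create_ingest_pipeline failed: No processor type exists with name [text_embeddinssg]
        System.out.println(describe("create_ingest_pipeline", wrapped));
    }
}
```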
@dbwiddis @ohltyler @owaiskazi19 @jackiehanyang @joshpalis Can I get some input on this? I feel like it's actually a very big deal, since there are many other examples where users calling the existing APIs directly would see the exception, but going through us they miss it. Another example: if we register a local model in ML without changing settings, we get this back:
```json
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "No eligible node found to execute this request. It's best practice to provision ML nodes to serve your models. You can disable this setting to serve the model on your data node for development purposes by disabling the \"plugins.ml_commons.only_run_on_ml_node\" configuration using the _cluster/setting api"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "No eligible node found to execute this request. It's best practice to provision ML nodes to serve your models. You can disable this setting to serve the model on your data node for development purposes by disabling the \"plugins.ml_commons.only_run_on_ml_node\" configuration using the _cluster/setting api"
  },
  "status": 400
}
```
All of this is hidden in our error message.
Created a new issue for this: https://github.com/opensearch-project/flow-framework/issues/670
This issue was completed in #642. The new issue raised in the reopen comment is being tracked in #670.
If a workflow has a runtime failure when provisioning (e.g., the `text_embedding` processor doesn't exist when running the `create_ingest_pipeline` step on a cluster with no `neural-search` plugin installed), the state is correctly set to `FAILED` with a relevant error message. However, when updating the cluster to address the runtime failure and re-running the provision API, it fails and returns an incorrect message. This should be updated to a more relevant message, along the lines of "The workflow failed provisioning, please deprovision and try again".

Note that after manually running the deprovision API and re-running provision, it works as expected. But it feels weird to have to run deprovision when it does nothing besides reset the state, since there were no created resources to clean up to begin with.
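For reference, the manual workaround is just two REST calls. Below is a minimal sketch using Java's built-in HTTP client, assuming a local cluster on port 9200, the workflow ID from the example above, and the flow-framework workflow endpoints (`_deprovision`, then `_provision`):

```java
// Minimal sketch of the manual workaround: deprovision to reset state, then re-provision.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ReprovisionWorkaround {
    public static void main(String[] args) throws Exception {
        String base = "http://localhost:9200/_plugins/_flow_framework/workflow/";
        String workflowId = "LGKGn44B51-soDq-mga1"; // example ID from this thread
        HttpClient client = HttpClient.newHttpClient();

        for (String action : new String[] { "_deprovision", "_provision" }) {
            HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(base + workflowId + "/" + action))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(action + " -> " + response.statusCode() + ": " + response.body());
        }
    }
}
```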