Closed by ylwu-amzn 1 month ago.
Another issue: `ingest_fields` doesn't work. This configuration fails:
```json
{
  "field_map": {
    "_id": "$.recordId",
    "embedding": "$.modelOutput.embedding"
  },
  "ingest_fields": ["$.modelInput.inputText"]
}
```
This configuration works:
```json
{
  "field_map": {
    "_id": "$.recordId",
    "embedding": "$.modelOutput.embedding",
    "input": "$.modelInput.inputText"
  }
}
```
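The working configuration maps each target field (`_id`, `embedding`, `input`) to a JSONPath into one batch output record. As a rough illustration of that mapping only, here is a hypothetical Java helper that resolves simple dotted paths such as `$.modelOutput.embedding` against a parsed record. `FieldMapResolver`, `resolve`, and `apply` are made-up names, not ml-commons code, and the sketch deliberately ignores array indexes, filters, and the `source[N]` prefix.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper: resolves simple dotted JSONPath expressions such as
 * "$.modelOutput.embedding" against one parsed batch output record.
 * Only "$" followed by ".key" segments is supported (no arrays or filters).
 */
final class FieldMapResolver {

    private FieldMapResolver() {}

    /** Returns the value at the given path, or null if any segment is missing. */
    static Object resolve(Map<String, Object> record, String path) {
        if (!path.startsWith("$.")) {
            throw new IllegalArgumentException("Path must start with \"$.\": " + path);
        }
        Object current = record;
        for (String key : path.substring(2).split("\\.")) {
            if (!(current instanceof Map)) {
                return null; // the path goes deeper than the record does
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    /** Builds the document to ingest by resolving every field_map entry in order. */
    static Map<String, Object> apply(Map<String, Object> record, Map<String, String> fieldMap) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fieldMap.entrySet()) {
            doc.put(e.getKey(), resolve(record, e.getValue()));
        }
        return doc;
    }
}
```

Missing paths resolve to `null` rather than throwing, which keeps the sketch simple; a real ingestion path would probably want to surface that as a per-record error instead.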
Yes, it's still using the TRAIN thread pool. The initial code didn't use this dedicated TRAIN thread pool, so exceptions were caught in the main thread and the ML tasks were updated to "FAILED". After I moved the work onto the TRAIN thread pool, exceptions are thrown in the TRAIN thread and are no longer caught in the main thread. I forgot to move the exception handling from the main thread into the TRAIN thread. After the load tests, I will create a new thread pool dedicated to ingestion.
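The failure mode described here can be reproduced with plain `java.util.concurrent`: a try/catch in the submitting thread does not see exceptions thrown by a task running on another pool, so the catch has to move into the submitted task itself. This is a hedged sketch: `IngestTaskRunner` and its `TaskState` enum are hypothetical stand-ins, not ml-commons' actual task runner or task-state handling, and real code would also persist the state update.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical illustration: once work is handed to a separate pool (like the
 * TRAIN pool), a try/catch around execute() in the caller no longer sees the
 * work's exceptions. The catch must live inside the task so it can mark FAILED.
 */
final class IngestTaskRunner {

    enum TaskState { CREATED, RUNNING, COMPLETED, FAILED }

    // AtomicReference makes the state written on the pool thread visible to readers.
    private final AtomicReference<TaskState> state = new AtomicReference<>(TaskState.CREATED);

    TaskState state() {
        return state.get();
    }

    /** Runs the ingestion work on the given pool; failures are recorded in the task state. */
    void run(ExecutorService pool, Runnable ingestion) {
        pool.execute(() -> {
            try {
                state.set(TaskState.RUNNING);
                ingestion.run();
                state.set(TaskState.COMPLETED);
            } catch (Exception e) {
                // Handled on the pool thread. Without this catch, the task would be
                // left in a non-terminal state and the caller would never know.
                state.set(TaskState.FAILED);
            }
        });
    }
}
```

The same shape applies when the pool is an OpenSearch-managed one: whatever updates the task to a failed state has to run inside the submitted `Runnable`.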
The Bedrock batch inference job returns `jobArn`, like this:

```json
{
  "jobArn": "arn:aws:bedrock:us-east-1:<account_id>:model-invocation-job/cszce2bsex07"
}
```
But the code currently only parses `TransformJobArn` and `id` (https://github.com/opensearch-project/ml-commons/blob/main/plugin/src/main/java/org/opensearch/ml/task/MLPredictTaskRunner.java#L367). Please enhance this part to make the parsing more general.
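One way to make the parsing more general is to check a list of known job-identifier keys instead of two hard-coded ones, so Bedrock's `jobArn` is picked up alongside `TransformJobArn` and `id`. A minimal sketch, assuming `dataAsMap` is the parsed response body as a `Map`; `BatchJobIdParser`, `JOB_ID_KEYS`, `findJobId`, and `lastArnSegment` are hypothetical names, not existing ml-commons code.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical helper: finds the batch job identifier in a remote model
 * response by trying several known keys instead of hard-coding one or two.
 */
final class BatchJobIdParser {

    // Keys mentioned in this thread; extend as new batch providers are supported.
    static final List<String> JOB_ID_KEYS = List.of("TransformJobArn", "id", "jobArn");

    private BatchJobIdParser() {}

    /** Returns the value of the first known key that is present, as a string. */
    static Optional<String> findJobId(Map<String, ?> dataAsMap) {
        if (dataAsMap == null) {
            return Optional.empty();
        }
        for (String key : JOB_ID_KEYS) {
            Object value = dataAsMap.get(key);
            if (value != null) {
                return Optional.of(value.toString());
            }
        }
        return Optional.empty();
    }

    /** For an ARN like ".../model-invocation-job/cszce2bsex07", returns the text after the last '/'. */
    static String lastArnSegment(String arn) {
        int slash = arn.lastIndexOf('/');
        return slash >= 0 ? arn.substring(slash + 1) : arn;
    }
}
```

Keeping the keys in one list means adding a provider is a one-line change, at the cost of relying on key order when a response happens to contain more than one of them.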
I suggest changing this line (https://github.com/opensearch-project/ml-commons/blob/main/plugin/src/main/java/org/opensearch/ml/task/MLPredictTaskRunner.java#L367C44-L367C52):

```java
if (dataAsMap != null
        && (dataAsMap.containsKey("TransformJobArn") || dataAsMap.containsKey("id"))) {
    // ... existing handling of the batch job response
}
```

to:

```java
Integer statusCode = tensorOutput.getMlModelOutputs().get(0).getStatusCode();
if (dataAsMap != null
        && statusCode != null && statusCode >= 200 && statusCode < 300) {
    // ... existing handling of the batch job response
}
```

What do you think?
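The proposed condition can be isolated into a small predicate to make its null handling explicit: the status code is boxed, so it must be null-checked before the range comparison unboxes it. `BatchResponseCheck` and `isSuccessfulBatchResponse` are hypothetical names for illustration; the real check would stay inline in `MLPredictTaskRunner`.

```java
import java.util.Map;

/**
 * Hypothetical helper mirroring the proposed condition: treat the remote
 * response as a successfully created batch job when the data map is present
 * and the HTTP status code is in the 2xx range, whatever key holds the job ID.
 */
final class BatchResponseCheck {

    private BatchResponseCheck() {}

    static boolean isSuccessfulBatchResponse(Map<String, ?> dataAsMap, Integer statusCode) {
        return dataAsMap != null
                && statusCode != null // check before unboxing in the comparisons below
                && statusCode >= 200
                && statusCode < 300;
    }
}
```

One trade-off worth considering: a status-code check no longer depends on provider-specific key names, but it also accepts a 2xx response that carries no recognizable job identifier, so it may be worth combining it with a key lookup.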
Tested with OS 2.17 RC4, using sample data from `my_batch2.jsonl.out`.
It returns task ID `xHk64pEBG9EkCQDLzc-I`, but the task stays in the `CREATED` state forever. I checked the log and found an error. Removing `source[0].` from the `embedding` entry in the field map makes it work.

Suggestion: support the `source[0]` prefix even when there is only one source file.

Is this running on the `opensearch_ml_train` thread pool? Can you confirm whether we have a dedicated thread pool?
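The suggestion to accept the `source[0]` prefix even with a single source file could look roughly like this. It assumes a prefixed field map value is written as `source[N].` followed by the JSONPath (for example `source[0].$.modelOutput.embedding`, an illustrative value rather than the exact failing request), and that an unprefixed path means the first source. `FieldMapSource` and `parse` are hypothetical names; this is not how ml-commons currently parses field maps.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: splits a field_map value such as
 * "source[0].$.modelOutput.embedding" into a source-file index and a JSONPath.
 * An unprefixed value (e.g. "$.recordId") is treated as source 0, so both
 * spellings behave the same when there is only one source file.
 */
final class FieldMapSource {

    // "source[<digits>]." followed by a JSONPath starting with "$".
    private static final Pattern SOURCE_PREFIX = Pattern.compile("source\\[(\\d+)]\\.(\\$.*)");

    final int sourceIndex;
    final String jsonPath;

    private FieldMapSource(int sourceIndex, String jsonPath) {
        this.sourceIndex = sourceIndex;
        this.jsonPath = jsonPath;
    }

    static FieldMapSource parse(String value, int sourceCount) {
        Matcher m = SOURCE_PREFIX.matcher(value);
        if (!m.matches()) {
            return new FieldMapSource(0, value); // no prefix: default to the first source
        }
        int index = Integer.parseInt(m.group(1));
        if (index >= sourceCount) {
            throw new IllegalArgumentException(
                    "source[" + index + "] requested but only " + sourceCount + " source file(s) given");
        }
        return new FieldMapSource(index, m.group(2));
    }
}
```

Rejecting an out-of-range index up front, rather than failing later during ingestion, would also make a bad field map fail fast instead of leaving the task stuck.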