opensearch-project / ml-commons

ml-commons provides a set of common machine learning algorithms, e.g. k-means or linear regression, to help developers build ML-related features within OpenSearch.
Apache License 2.0

[BUG] mapper_parsing_exception: failed to parse field [embeddingVector] of type [knn_vector] in document with id 'xxx'. Preview of field's value: 'NaN' #2298

Open lihuimingxs opened 2 months ago

lihuimingxs commented 2 months ago

What is the bug? A clear and concise description of the bug.

In OpenSearch 2.12.0:

When I use the Java client's Bulk operation with IndexOperation to create or update documents, I get "Preview of field's value: 'NaN'" exceptions when the vectors are computed on GPU nodes. The errors still occur when writing from a single thread. However, writing the failed data again succeeds.

When the cluster computes the vectors on CPU, the problem does not occur. I therefore suspect that the GPU vector computation is unstable, but I cannot confirm this.

Here is my Java code and detailed exception information:

Java Code:

private void sendOpenSearch(List<EntityDoc> docList) {
    try {
        List<BulkOperation> operationList = new ArrayList<>(docList.size());
        for(EntityDoc doc : docList){
            BulkOperation operation = new BulkOperation.Builder()
                    .index(new IndexOperation.Builder<>()
                            .index(opensearchProperty.getRefreshIndex())
                            .id(doc.getId())
                            .document(doc)
                            .build())
                    .build();
            operationList.add(operation);
        }

        BulkRequest bulkRequest = new BulkRequest.Builder()
                .index(opensearchProperty.getRefreshIndex())
                .operations(operationList)
                .build();

        BulkResponse response = openSearchClient.bulk(bulkRequest);
        if(response.errors()){
            response.items().forEach( item ->{
                if(null != item.error() && null != item.error().causedBy()){
                    // One placeholder per argument: the document id, then the failure reason.
                    log.error("Bulk item {} failed, reason: {}", item.id(), item.error().causedBy().reason());
                }
            });
        }
    } catch (IOException e) {
        log.error("OpenSearch IO Exception",e);
    }
}

Exception:

2024-04-01 09:31:51,454 [org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#3-2] [] ERROR c.c.c.t.a.c.OpenSearchTalentConsumer - Failed to save data to OpenSearch
org.opensearch.client.opensearch._types.OpenSearchException: Request failed: [mapper_parsing_exception] failed to parse field [embeddingVector] of type [knn_vector] in document with id 'xxx'. Preview of field's value: 'NaN'
        at org.opensearch.client.transport.rest_client.RestClientTransport.getHighLevelResponse(RestClientTransport.java:270)
        at org.opensearch.client.transport.rest_client.RestClientTransport.performRequest(RestClientTransport.java:143)
        at org.opensearch.client.opensearch.OpenSearchClient.update(OpenSearchClient.java:1578)
        at com.ci.application.consumer.OpenSearchTalentConsumer.reSendOpensearch(OpenSearchTalentConsumer.java:97)
        at com.ci.application.consumer.OpenSearchTalentConsumer.sendOpensearch(OpenSearchTalentConsumer.java:85)
        at com.ci.application.consumer.OpenSearchTalentConsumer.consume(OpenSearchTalentConsumer.java:55)
        at jdk.internal.reflect.GeneratedMethodAccessor1421.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.springframework.messaging.handler.invocation.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:171)
        at org.springframework.messaging.handler.invocation.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:120)
        at org.springframework.amqp.rabbit.listener.adapter.HandlerAdapter.invoke(HandlerAdapter.java:49)
        at org.springframework.amqp.rabbit.listener.adapter.MessagingMessageListenerAdapter.invokeHandler(MessagingMessageListenerAdapter.java:190)
        at org.springframework.amqp.rabbit.listener.adapter.MessagingMessageListenerAdapter.onMessage(MessagingMessageListenerAdapter.java:127)
        at org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:1552)
        at org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.actualInvokeListener(AbstractMessageListenerContainer.java:1478)
        at jdk.internal.reflect.GeneratedMethodAccessor949.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:343)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
        at org.springframework.retry.interceptor.RetryOperationsInterceptor$1.doWithRetry(RetryOperationsInterceptor.java:91)
        at org.springframework.retry.support.RetryTemplate.doExecute(RetryTemplate.java:287)
        at org.springframework.retry.support.RetryTemplate.execute(RetryTemplate.java:180)
        at org.springframework.retry.interceptor.RetryOperationsInterceptor.invoke(RetryOperationsInterceptor.java:115)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:212)
        at org.springframework.amqp.rabbit.listener.$Proxy354.invokeListener(Unknown Source)
        at org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:1466)
        at org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:1461)
        at org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.executeListener(AbstractMessageListenerContainer.java:1410)
        at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.doReceiveAndExecute(SimpleMessageListenerContainer.java:870)
        at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.receiveAndExecute(SimpleMessageListenerContainer.java:854)
        at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer.access$1600(SimpleMessageListenerContainer.java:78)
        at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.mainLoop(SimpleMessageListenerContainer.java:1137)
        at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.run(SimpleMessageListenerContainer.java:1043)
        at java.base/java.lang.Thread.run(Thread.java:829)

dhrubo-os commented 2 months ago

Are you using any ml-commons feature to generate this embedding? Can you give more details on how to reproduce this issue?

If you aren't using any models through ml-commons, maybe we can move this issue to the k-NN plugin?

lihuimingxs commented 2 months ago

> Are you using any ml-commons feature to generate this embedding? Can you give more details on how to reproduce this issue?
>
> If you aren't using any models through ml-commons, maybe we can move this issue to the k-NN plugin?

I used my own custom model.

What other information should I provide?

ylwu-amzn commented 2 months ago

failed to parse field [embeddingVector] of type [knn_vector] in document with id 'xxx'. Preview of field's value: 'NaN'

From the error, are you trying to save 'NaN' to the knn_vector field?
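One way to check this on the client side is sketched below. It assumes the embedding is available to the client as a float[] before the document is sent; if the vector is produced server-side (for example by an ingest pipeline), this check does not apply as written. VectorGuard and firstNonFiniteIndex are hypothetical names used only for illustration, not part of OpenSearch or its Java client.

```java
import java.util.OptionalInt;

/** Hypothetical helper for illustration; not part of OpenSearch or opensearch-java. */
final class VectorGuard {
    private VectorGuard() {}

    /**
     * Returns the index of the first NaN or infinite component,
     * or an empty OptionalInt when every component is finite.
     */
    static OptionalInt firstNonFiniteIndex(float[] vector) {
        for (int i = 0; i < vector.length; i++) {
            // Float.isFinite is false for NaN, +Infinity and -Infinity.
            if (!Float.isFinite(vector[i])) {
                return OptionalInt.of(i);
            }
        }
        return OptionalInt.empty();
    }
}
```

Calling VectorGuard.firstNonFiniteIndex(vector) before building the IndexOperation would show whether a NaN is already present when the document leaves the client, and at which position.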

lihuimingxs commented 2 months ago

> failed to parse field [embeddingVector] of type [knn_vector] in document with id 'xxx'. Preview of field's value: 'NaN'
>
> From the error, are you trying to save 'NaN' to the knn_vector field?

No. My embeddingContent actually contains data, not NaN. However, the vector value that was computed from it came out as NaN, which caused the error. After I increased the number of client retries, the same data, without any modification, was saved successfully.
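The retry workaround described above could be made explicit along the following lines. This is only a sketch under assumptions: it presumes the embedding can be recomputed from client code through some callable (modeled here as a Supplier<float[]>), which may not match a setup where the model runs inside the cluster. EmbeddingRetry and computeFiniteEmbedding are hypothetical names.

```java
import java.util.function.Supplier;

/** Hypothetical retry wrapper for illustration; not an OpenSearch API. */
final class EmbeddingRetry {
    private EmbeddingRetry() {}

    /**
     * Calls the embedder up to maxAttempts times and returns the first vector
     * whose components are all finite. Throws if no attempt yields one.
     */
    static float[] computeFiniteEmbedding(Supplier<float[]> embedder, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            float[] vector = embedder.get();
            if (vector != null && allFinite(vector)) {
                return vector;
            }
        }
        throw new IllegalStateException(
                "No finite embedding after " + maxAttempts + " attempts");
    }

    /** True when no component is NaN or infinite. */
    private static boolean allFinite(float[] vector) {
        for (float component : vector) {
            if (!Float.isFinite(component)) {
                return false;
            }
        }
        return true;
    }
}
```

Retrying only hides the symptom. Logging each rejected attempt together with the node that produced it would help confirm whether the NaN values really come from the GPU nodes.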

dblock commented 1 week ago

Catch All Triage