opendistro-for-elasticsearch / k-NN

🆕 A machine learning plugin which supports an approximate k-NN search algorithm for Open Distro.
https://opendistro.github.io/
Apache License 2.0

Fatal error during k-NN search #220

Closed: williamdias closed this issue 4 years ago

williamdias commented 4 years ago
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGILL (0x4) at pc=0x00007fc8997e6f89, pid=1, tid=343
#
# JRE version: OpenJDK Runtime Environment AdoptOpenJDK (14.0.1+7) (build 14.0.1+7)
# Java VM: OpenJDK 64-Bit Server VM AdoptOpenJDK (14.0.1+7, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C  [libKNNIndexV1_7_3_6.so+0x1a1f89]  float std::generate_canonical<float, 24ul, std::mersenne_twister_engine<unsigned long, 32ul, 624ul, 397ul, 31ul, 2567483615ul, 11ul, 4294967295ul, 7ul, 2636928640ul, 15ul, 4022730752ul, 18ul, 1812433253ul> >(std::mersenne_twister_engine<unsigned long, 32ul, 624ul, 397ul, 31ul, 2567483615ul, 11ul, 4294967295ul, 7ul, 2636928640ul, 15ul, 4022730752ul, 18ul, 1812433253ul>&)+0x1c9
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# logs/hs_err_pid1.log
#
# If you would like to submit a bug report, please visit:
#   https://github.com/AdoptOpenJDK/openjdk-support/issues
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

I just ran the stock image:

docker run -d -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" amazon/opendistro-for-elasticsearch:1.9.0

This seems to be related to #95 and https://discuss.opendistrocommunity.dev/t/es-crashes-when-indexing-knn-vectors/3682

Index settings and mappings (Python dict fragment):

"settings": {
    "index.knn": True,
    "knn.space_type": "cosinesimil",
    "number_of_shards": 1,
    "number_of_replicas": 0
},
"mappings": {
    "properties": {
        "pid": {
            "type": "keyword",
            "index": True
        },
        "vec": {
            "type": "knn_vector",
            "dimension": 256
        }
    }
}

jmazanec15 commented 4 years ago

Hi @williamdias

Yes, I believe it is related to that issue.

When we build the Docker image, the native library is compiled on the build machine, so it targets that machine's CPU. That machine supports optimized instructions. I am guessing your machine does not, which would make the library crash (the SIGILL in your log means the CPU hit an illegal instruction).
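
One quick way to compare machines is to look at the CPU feature flags that Linux exposes in /proc/cpuinfo. The snippet below is only a sketch: the thread does not say which instruction sets the prebuilt library relies on, so the flags checked here (sse4_2, avx, avx2) are assumed candidates, and has_cpu_flag is a hypothetical helper name.

```shell
#!/bin/sh
# Report whether a CPU feature flag is listed in a cpuinfo-style file.
# Usage: has_cpu_flag FLAG [FILE]   (FILE defaults to /proc/cpuinfo)
has_cpu_flag() {
    flag="$1"
    file="${2:-/proc/cpuinfo}"
    # Take the first "flags" line, then match the flag as a whole word
    # so that "avx" does not match "avx2".
    grep -m1 '^flags' "$file" 2>/dev/null | grep -qw -- "$flag"
}

# Assumed candidate flags; adjust to whatever the library was built with.
for f in sse4_2 avx avx2; do
    if has_cpu_flag "$f"; then
        echo "$f: supported"
    else
        echo "$f: NOT supported"
    fi
done
```

A container sees the host's /proc/cpuinfo, so this can be run either on the Docker host or inside the container. A flag reported as NOT supported on the machine that crashes would be consistent with the explanation above, though it does not prove it.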

In this PR, we disable optimizations during the library build for the Docker image.

However, as a workaround, it is probably best to build the library from source so that it is compatible with your machine. The steps would be as follows:

docker pull centos:7

docker run -it centos:7 /bin/bash -c "
  yum install git cmake gcc-c++ make -y;
  curl -O https://download.java.net/java/GA/jdk14/076bab302c7b4508975440c56f6cc26a/36/GPL/openjdk-14_linux-x64_bin.tar.gz;
  tar xvf openjdk-14_linux-x64_bin.tar.gz;
  export JAVA_HOME=/jdk-14;
  git clone https://github.com/opendistro-for-elasticsearch/k-NN.git;
  cd k-NN;
  git checkout v1.9.0.0;
  git submodule update --init -- jni/external/nmslib;
  cd jni;
  cmake .;
  make;
"

docker cp <containerID>:/k-NN/jni/release/libKNNIndexV1_7_3_6.so `pwd`

docker run -d -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -v `pwd`/libKNNIndexV1_7_3_6.so:/usr/lib/libKNNIndexV1_7_3_6.so amazon/opendistro-for-elasticsearch:1.9.0 
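
After re-running the k-NN request that triggered the crash, the container log can be scanned for the JVM fatal-error banner quoted at the top of this issue. This is a minimal sketch: has_jvm_crash is a hypothetical helper, CONTAINER is a placeholder for your container ID or name, and it assumes the banner ends up in the output of docker logs.

```shell
#!/bin/sh
# Succeeds (exit 0) if the text on stdin contains the JVM fatal-error
# banner shown in the crash report above.
has_jvm_crash() {
    grep -q 'A fatal error has been detected by the Java Runtime Environment'
}

# Hypothetical usage; replace CONTAINER with the ID or name from `docker ps`:
#   if docker logs "$CONTAINER" 2>&1 | has_jvm_crash; then
#       echo "JVM crash banner found"
#   else
#       echo "no JVM crash banner in the log"
#   fi
```

An empty result only shows that the banner has not been logged so far, so it is worth checking again after indexing and searching some vectors.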

Could you try that and let us know if it works for you?

Jack

williamdias commented 4 years ago

Hi @jmazanec15

The proposed solution worked.

Thank you!