opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0

Release Version 2.4.0 #576

Closed: peterzhuamazon closed this issue 1 year ago

peterzhuamazon commented 1 year ago

Release Version 2.4.0

This is a component issue for 2.4.0, coming from opensearch-build#2649. Please follow the checklist below and refer to the DATES / CAMPAIGNS in that post.

How to use this issue

## This Component Release Issue

This issue captures the state of the OpenSearch release at the component/plugin level; its assignee is responsible for driving the release. Please contact them or @mention them on this issue for help. Any release-related work can be linked to this issue or added as comments to create visibility into the release status.

## Release Steps

There are several steps to the release process; these steps are completed as the whole component release, and components that are behind present risk to the release. The component owner resolves the tasks in this issue and communicates with the overall release owner to make sure each component is moving along as expected.

Steps have completion dates for coordinating efforts between the components of a release; components can start as soon as they are ready, far in advance of a future release. The most current set of dates is on the overall release issue linked at the top of this issue.

## The Overall Release Issue

Linked at the top of this issue, the overall release issue captures the state of the entire OpenSearch release, including references to this issue. The release owner, who is the assignee of that issue, is responsible for communicating the release status broadly. Please contact them or @mention them on that issue for help.

## What should I do if my plugin isn't making any changes?

If including changes in this release, increment the version on `2.0` branch to `2.4.0` for Min/Core, and `2.4.0.0` for components. Otherwise, keep the version number unchanged for both.

Preparation

CI/CD

Pre-Release

Release Testing

Release

Post Release

heemin32 commented 1 year ago
// Run cluster
export VERSION=2.4.0
export REPO=opensearchstaging
docker pull $REPO/opensearch:$VERSION \
&& docker run -it -p 9200:9200 \
-e "discovery.type=single-node" \
$REPO/opensearch:$VERSION

// Run integration test
heemin@b0f1d870326d k-NN % ./gradlew :integTestRemote \
-Dtests.rest.cluster=localhost:9200 \
-Dtests.cluster=localhost:9200 \
-Dtests.clustername="integTest-0" \
-Dhttps=true \
-Duser=admin \
-Dpassword=admin
Starting a Gradle Daemon (subsequent builds will be faster)
=======================================
OpenSearch Build Hamster says Hello!
  Gradle Version        : 7.5
  OS Info               : Mac OS X 12.6.1 (aarch64)
  JDK Version           : 17 (Amazon Corretto JDK)
  JAVA_HOME             : /Users/heemin/Library/Java/JavaVirtualMachines/corretto-17.0.4.1/Contents/Home
  Random Testing Seed   : C28FBE122534FF6C
  In FIPS 140 mode      : false
=======================================

> Task :compileJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.

> Task :compileTestFixturesJava
Note: /Users/heemin/workplace/OS/k-NN/src/testFixtures/java/org/opensearch/knn/KNNRestTestCase.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

> Task :compileTestJava
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: /Users/heemin/workplace/OS/k-NN/src/test/java/org/opensearch/knn/index/OpenSearchIT.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

> Task :integTestRemote
OpenJDK 64-Bit Server VM warning: Ignoring option --illegal-access=warn; support was removed in 17.0

BUILD SUCCESSFUL in 5m 59s
naveentatikonda commented 1 year ago

Testing on Windows

Sanity Testing with Security

Administrator@EC2AMAZ-06374IJ MINGW64 ~/Documents/windows-perf-testing/k-NN (2.4)
$ ./gradlew.bat :integTestRemote -Dtests.rest.cluster=localhost:9200 -Dtests.cluster=localhost:9200 -Dtests.clustername="integTest-0" -Dhttps=true -Duser=admin -Dpassword=admin
=======================================
OpenSearch Build Hamster says Hello!
  Gradle Version        : 7.5
  OS Info               : Windows Server 2016 10.0 (amd64)
  JDK Version           : 11 (Eclipse Temurin JDK)
  JAVA_HOME             : C:\Users\Administrator\scoop\apps\temurin11-jdk\current
  Random Testing Seed   : 1D3FEA71DF680A52
  In FIPS 140 mode      : false
=======================================

BUILD SUCCESSFUL in 14m 20s
13 actionable tasks: 4 executed, 9 up-to-date

Sanity Testing without Security

Administrator@EC2AMAZ-06374IJ MINGW64 ~/Documents/windows-perf-testing/k-NN (2.4)
$ ./gradlew.bat :integTestRemote -Dtests.rest.cluster=localhost:9200 -Dtests.cluster=localhost:9200 -Dtests.clustername="integTest-0" -Dhttps=false -PnumNodes=1
=======================================
OpenSearch Build Hamster says Hello!
  Gradle Version        : 7.5
  OS Info               : Windows Server 2016 10.0 (amd64)
  JDK Version           : 11 (Eclipse Temurin JDK)
  JAVA_HOME             : C:\Users\Administrator\scoop\apps\temurin11-jdk\current
  Random Testing Seed   : AD3D01272A870D33
  In FIPS 140 mode      : false
=======================================

BUILD SUCCESSFUL in 12m 3s
13 actionable tasks: 4 executed, 9 up-to-date
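
The two sanity runs above differ only in their security-related flags. As a rough sketch (not part of the k-NN build itself), the helper below assembles the same `:integTestRemote` argument list for either mode. The function name `integ_test_remote_command` and its parameters are hypothetical; the flags are copied from the commands in this thread, and `-PnumNodes=1` is added in the non-security case only because that is how the run above was invoked.

```python
import shlex
from typing import List


def integ_test_remote_command(
    host: str = "localhost",
    port: int = 9200,
    cluster_name: str = "integTest-0",
    security: bool = True,
    user: str = "admin",
    password: str = "admin",
    gradlew: str = "./gradlew",
) -> List[str]:
    """Build the gradle argument list used for the remote integration tests.

    Hypothetical helper: it only mirrors the flags seen in the commands
    posted in this issue and does not run anything.
    """
    endpoint = f"{host}:{port}"
    cmd = [
        gradlew,
        ":integTestRemote",
        f"-Dtests.rest.cluster={endpoint}",
        f"-Dtests.cluster={endpoint}",
        f"-Dtests.clustername={cluster_name}",
        f"-Dhttps={'true' if security else 'false'}",
    ]
    if security:
        # With the security plugin enabled, credentials are passed along.
        cmd += [f"-Duser={user}", f"-Dpassword={password}"]
    else:
        # Mirrors the non-security run above, which passed -PnumNodes=1.
        cmd.append("-PnumNodes=1")
    return cmd


if __name__ == "__main__":
    # Print both variants as copy-pasteable shell lines.
    print(shlex.join(integ_test_remote_command(security=True)))
    print(shlex.join(integ_test_remote_command(security=False, gradlew="./gradlew.bat")))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` from the k-NN checkout if an automated run is wanted.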

Performance Test Results

The performance tests were triggered by manually spinning up a cluster on Windows (a single cluster without any autoscaling). As a result, the query latencies, ingestion time, and refresh intervals are higher than in the routine test setup, but the recall values are almost the same.

Data set: sift-128-euclidean.hdf5

NMSLIB HNSW

{
 "metadata": {
 "test_name": "index-workflow",
 "test_id": "Index workflow",
 "date": "11/11/2022 17:40:41",
 "python_version": "3.7.10 (default, Jun 3 2021, 00:02:01) \n[GCC 7.3.1 20180712 (Red Hat 7.3.1-13)]",
 "os_version": "Linux-5.10.144-127.601.amzn2.x86_64-x86_64-with-glibc2.2.5",
 "processor": "x86_64, 36 cores",
 "memory": "409739264 (used) / 72543625216 (available) / 73638051840 (total)"
 },
 "results": {
 "test_took": 408957.2112394183,
 "delete_index_took_total": 462.122771400027,
 "create_index_took_total": 1475.9138787048869,
 "ingest_took_total": 77012.53456410486,
 "refresh_index_store_kb_total": 1493502.6111328125,
 "refresh_index_took_total": 148082.94002520852,
 "query_took_total": 181923.7,
 "query_took_p50": 18.3,
 "query_took_p90": 20.9,
 "query_took_p99": 22.9,
 "query_memory_kb_total": 652889.8,
 "query_recall@K_total": 0.9998790000000002,
 "query_recall@1_total": 1.0
 }
}

FAISS IVF

{
 "metadata": {
 "test_name": "index-workflow",
 "test_id": "index workflow",
 "date": "11/11/2022 07:03:04",
 "python_version": "3.7.10 (default, Jun 3 2021, 00:02:01) \n[GCC 7.3.1 20180712 (Red Hat 7.3.1-13)]",
 "os_version": "Linux-5.10.144-127.601.amzn2.x86_64-x86_64-with-glibc2.2.5",
 "processor": "x86_64, 36 cores",
 "memory": "408100864 (used) / 72545366016 (available) / 73638051840 (total)"
 },
 "results": {
 "test_took": 652188.0653867477,
 "delete_model_took_total": 341.6857195843477,
 "delete_index_took_total": 685.7768759888131,
 "train_model_took_total": 2536.0742158896755,
 "create_index_took_total": 1688.5965134948492,
 "ingest_took_total": 75003.37028029608,
 "refresh_index_store_kb_total": 1375142.7145507813,
 "refresh_index_took_total": 1549.061781493947,
 "query_took_total": 570383.5,
 "query_took_p50": 56.9,
 "query_took_p90": 57.5,
 "query_took_p99": 66.9,
 "query_memory_kb_total": 540157.7,
 "query_recall@K_total": 0.947138,
 "query_recall@1_total": 1.0
 }
}

FAISS IVFPQ

{
 "metadata": {
 "test_name": "index-workflow",
 "test_id": "index workflow",
 "date": "11/11/2022 10:28:16",
 "python_version": "3.7.10 (default, Jun 3 2021, 00:02:01) \n[GCC 7.3.1 20180712 (Red Hat 7.3.1-13)]",
 "os_version": "Linux-5.10.144-127.601.amzn2.x86_64-x86_64-with-glibc2.2.5",
 "processor": "x86_64, 36 cores",
 "memory": "402718720 (used) / 72550666240 (available) / 73638051840 (total)"
 },
 "results": {
 "test_took": 866206.681305513,
 "delete_model_took_total": 319.0529257117305,
 "delete_index_took_total": 402.75771630113013,
 "train_model_took_total": 462223.1749842001,
 "create_index_took_total": 2207.049490499776,
 "ingest_took_total": 74747.44621790014,
 "refresh_index_store_kb_total": 893477.081640625,
 "refresh_index_took_total": 5310.499970900128,
 "query_took_total": 320996.7,
 "query_took_p50": 31.7,
 "query_took_p90": 32.5,
 "query_took_p99": 37.7,
 "query_memory_kb_total": 63611.0,
 "query_recall@K_total": 0.6136619999999999,
 "query_recall@1_total": 0.9944
 }
}
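
To make the three result blocks above easier to compare, here is a minimal sketch that puts the query latency percentiles, query memory, and recall side by side. The numbers are copied by hand from the JSON above; the names `RESULTS`, `format_table`, and `best_by` are illustrative and are not part of the perf tool.

```python
# Query metrics copied from the three result blocks above
# (data set: sift-128-euclidean.hdf5).
RESULTS = {
    "NMSLIB HNSW": {
        "query_took_p50": 18.3,
        "query_took_p90": 20.9,
        "query_took_p99": 22.9,
        "query_memory_kb_total": 652889.8,
        "query_recall@K_total": 0.9998790000000002,
        "query_recall@1_total": 1.0,
    },
    "FAISS IVF": {
        "query_took_p50": 56.9,
        "query_took_p90": 57.5,
        "query_took_p99": 66.9,
        "query_memory_kb_total": 540157.7,
        "query_recall@K_total": 0.947138,
        "query_recall@1_total": 1.0,
    },
    "FAISS IVFPQ": {
        "query_took_p50": 31.7,
        "query_took_p90": 32.5,
        "query_took_p99": 37.7,
        "query_memory_kb_total": 63611.0,
        "query_recall@K_total": 0.6136619999999999,
        "query_recall@1_total": 0.9944,
    },
}

COLUMNS = [
    "query_took_p50",
    "query_took_p90",
    "query_took_p99",
    "query_memory_kb_total",
    "query_recall@K_total",
    "query_recall@1_total",
]


def format_table(results):
    """Render one header row plus one row per engine as aligned text."""
    rows = [f"{'engine':<12}" + "".join(f"{c:>24}" for c in COLUMNS)]
    for name, metrics in results.items():
        rows.append(f"{name:<12}" + "".join(f"{metrics[c]:>24.4f}" for c in COLUMNS))
    return "\n".join(rows)


def best_by(results, metric, lower_is_better):
    """Return the engine name with the best value for the given metric."""
    pick = min if lower_is_better else max
    return pick(results, key=lambda name: results[name][metric])


if __name__ == "__main__":
    print(format_table(RESULTS))
    print("lowest p99 latency:", best_by(RESULTS, "query_took_p99", True))
    print("lowest query memory:", best_by(RESULTS, "query_memory_kb_total", True))
    print("highest recall@K:", best_by(RESULTS, "query_recall@K_total", False))
```

In this run, NMSLIB HNSW has both the lowest p99 query latency and the highest recall@K, while FAISS IVFPQ uses the least query memory at the cost of a much lower recall@K.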
heemin32 commented 1 year ago

Perf test result for linux

[Screenshot: perf test results for Linux, taken 2022-11-11 (image not included)]

Legend

 * ingest_took_total: total time it took to ingest all test data, in seconds
 * query_took_p90: p90 query time, in milliseconds
 * query_took_p99: p99 query time, in milliseconds

Configuration

 * Client
   * OS: Amazon Linux
   * Instance Type: c5.9xlarge
   * Count: 1
   * EBS Size: 800G
   * AZ: us-east-1
 * Leader node
   * OS: Amazon Linux
   * Instance Type: c5.xlarge
   * Count: 3
   * EBS Size: 20G
   * AZ: us-east-1
 * Data node
   * OS: Amazon Linux
   * Instance Type: r5.4xlarge
   * Count: 3
   * EBS Size: 500G
   * AZ: us-east-1