opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0
156 stars 123 forks source link

Add quantization state reader and writer #1997

Closed ryanbogan closed 2 months ago

ryanbogan commented 2 months ago

Description

Adds a quantization state writer and reader for native engines

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

ryanbogan commented 2 months ago

I still need to write unit tests, but raising now for initial feedback

ryanbogan commented 2 months ago

I'll add end to end integration tests to fully test the quantization reader and writer once the integration is complete with indexing and query flow. Additionally, I'll add another unit test for the second read method with the PR for integration to the query flow, since the method is currently incomplete.

ryanbogan commented 2 months ago

I see that writer is only added for flush and not for merge. Will there be separate PR for that?

I thought that the merge method eventually uses flush based on https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/codec/KNN990Codec/NativeEngines990KnnVectorsWriter.java#L90

If that is not the case, I can add in separate PR

navneet1v commented 2 months ago

I see that writer is only added for flush and not for merge. Will there be separate PR for that?

I thought that the merge method eventually uses flush based on https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/codec/KNN990Codec/NativeEngines990KnnVectorsWriter.java#L90

If that is not the case, I can add in separate PR

No it doesn't use same method. I would not recommend creating another PR. Please add it in 1 PR as this PR is very small and I would like to see both flush and merge for QS writer in 1 shot.

Vikasht34 commented 2 months ago

I see that writer is only added for flush and not for merge. Will there be separate PR for that?

I thought that the merge method eventually uses flush based on https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/codec/KNN990Codec/NativeEngines990KnnVectorsWriter.java#L90 If that is not the case, I can add in separate PR

No it doesn't use same method. I would not recommend creating another PR. Please add it in 1 PR as this PR is very small and I would like to see both flush and merge for QS writer in 1 shot.

With My CR lots of things has been changed with Flush and Merge , We don't have to save Quantization State at two places , it will be in one Function , We should even hold on Flush also .

ryanbogan commented 2 months ago

With My CR lots of things has been changed with Flush and Merge , We don't have to save Quantization State at two places , it will be in one Function , We should even hold on Flush also .

I'll be putting the write state in your trainAndIndex method, but we'll still need to write the header and footer in the flush/merge methods.

ryanbogan commented 2 months ago

Still need a unit test for the second read method now that it's fully implemented.

ryanbogan commented 2 months ago

BWC failures are not due to this PR

ryanbogan commented 2 months ago

Adding skip-changelog because entry is in the release notes instead

navneet1v commented 2 months ago

Merging the PR to unblock others. The test which is failing will be fixed in next iteration

discussed with @jmazanec15 and @ryanbogan

opensearch-trigger-bot[bot] commented 2 months ago

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-1997-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 a58a9dcc7b7c264693dbf25e2ad37b0807c2f724
# Push it to GitHub
git push --set-upstream origin backport/backport-1997-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-1997-to-2.x.

opensearch-trigger-bot[bot] commented 2 months ago

The backport to 2.17 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.17 2.17
# Navigate to the new working tree
cd .worktrees/backport-2.17
# Create a new branch
git switch --create backport/backport-1997-to-2.17
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 a58a9dcc7b7c264693dbf25e2ad37b0807c2f724
# Push it to GitHub
git push --set-upstream origin backport/backport-1997-to-2.17
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.17

Then, create a pull request where the base branch is 2.17 and the compare/head branch is backport/backport-1997-to-2.17.