Closed navneet1v closed 1 month ago
It seems the min distribution has not yet been updated with the new Lucene version, so the builds are failing. I will check on Jenkins.
Ref: https://build.ci.opensearch.org/view/all/job/distribution-build-opensearch/
Conversation happening with the build team on this thread: https://opensearch.slack.com/archives/C04UTNM338A/p1728341414662579
After checking further and talking to the build team, we found that the Lucene upgrade was merged into OpenSearch within the last 12 hours. That merge updated the Maven repo (which happens on every merge to the main branch), but the min distribution build runs only once every 24 hours, so the min distribution has not been updated yet.
Thanks @gaiksaya for helping here. A new build has been triggered, ref: https://build.ci.opensearch.org/blue/organizations/jenkins/publish-opensearch-min-snapshots/detail/publish-opensearch-min-snapshots/1818/pipeline/ Once it completes, I will re-run the GH actions.
After a deep dive into the failed ITs, I found that Lucene has changed the way it uses the FlatFieldVectorWriter. Since the change is in the indexing path, more code changes are needed to get the tests passing. I am working on the fix and will try to raise a PR today.
Ref: https://github.com/apache/lucene/pull/13538
Earlier, Lucene99FlatVectorsWriter.FieldWriter was calling the KNNFieldWriter as a delegate. It no longer does so, hence we need to call Lucene99FlatVectorsWriter.FieldWriter.addValue from our NativeEngineFieldsVectorWriter.addValue.
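To illustrate the shape of this change, here is a minimal sketch of the explicit forwarding. All classes below (`FlatFieldWriterSketch`, `NativeFieldWriterSketch`, `DelegationSketch`) are hypothetical stand-ins written for this example; they are not the Lucene or k-NN classes and do not mirror their real signatures. The only point shown is that the native field writer's `addValue` now has to pass each vector on to the flat field writer itself.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: every class here is a hypothetical stand-in,
// not actual Lucene or k-NN code.
class DelegationSketch {

    // Stand-in for the flat vectors field writer (the role played by
    // Lucene99FlatVectorsWriter.FieldWriter in the comment above).
    static class FlatFieldWriterSketch {
        private final List<float[]> vectors = new ArrayList<>();

        void addValue(int docId, float[] vector) {
            vectors.add(vector);
        }

        int size() {
            return vectors.size();
        }
    }

    // Stand-in for the native engine field writer.
    static class NativeFieldWriterSketch {
        private final FlatFieldWriterSketch flatFieldWriter;
        private final List<Integer> docIds = new ArrayList<>();

        NativeFieldWriterSketch(FlatFieldWriterSketch flatFieldWriter) {
            this.flatFieldWriter = flatFieldWriter;
        }

        void addValue(int docId, float[] vector) {
            // Nothing calls us as a delegate anymore, so we forward the
            // vector to the flat writer explicitly before tracking the doc.
            flatFieldWriter.addValue(docId, vector);
            docIds.add(docId);
        }

        int size() {
            return docIds.size();
        }
    }

    public static void main(String[] args) {
        FlatFieldWriterSketch flat = new FlatFieldWriterSketch();
        NativeFieldWriterSketch nativeWriter = new NativeFieldWriterSketch(flat);
        nativeWriter.addValue(0, new float[] {1.0f, 2.0f});
        nativeWriter.addValue(1, new float[] {3.0f, 4.0f});
        // Both writers should have seen the same number of vectors.
        System.out.println("native=" + nativeWriter.size() + " flat=" + flat.size());
    }
}
```

In this sketch the flat writer is injected through the constructor so a test can check that both writers received every vector; whether the real fix wires things the same way is an assumption.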
From the NativeEngineWriter perspective, this looks good overall. For the isCompress flag, please rely on approval from other maintainers.
- We can look at whether there is a way to leverage FlatVectorFieldsWriter to get the vectors, maybe as a follow-up to this. I know NativeEngineWriter uses Map and FlatVectorFieldsWriter uses Map, but it might be worth it if we can leverage it to reduce RAM.
- I see use of any() in unit tests; those matchers don't make for tight tests. Try to verify with exact values if possible.
The use of any() remains only where we are completely mocking the NativeEngineFieldVectorsWriter, in the flush and merge tests. I have already removed it from the other places.
We can look at whether there is a way to leverage FlatVectorFieldsWriter to get the vectors, maybe as a follow-up to this. I know NativeEngineWriter uses Map and FlatVectorFieldsWriter uses Map, but it might be worth it if we can leverage it to reduce RAM.
Yes, this is a good suggestion. I have it in mind, but the problem is that if I do it right now, the scope of the PR will be huge. It already includes items that came in as part of the interface changes.
Adding backport label as core has backported the lucene upgrade PR to 2.x branch: https://github.com/opensearch-project/OpenSearch/pull/16211
- We can look at whether there is a way to leverage FlatVectorFieldsWriter to get the vectors, maybe as a follow-up to this. I know NativeEngineWriter uses Map and FlatVectorFieldsWriter uses List, but it might be worth it if we can leverage it to reduce RAM usage, similar to Lucene.
Created a GH issue for the fix: https://github.com/opensearch-project/k-NN/issues/2207
Description
Fix the Lucene codec after the Lucene version was bumped to 9.12. ~Currently the version bump has happened only for main branch hence we are not doing any backport here.~
2.x port done for core: https://github.com/opensearch-project/OpenSearch/pull/16211
This change includes:
Related Issues
Resolves https://github.com/opensearch-project/k-NN/issues/2193
Check List
--signoff
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.