opendistro-for-elasticsearch / k-NN

🆕 A machine learning plugin which supports an approximate k-NN search algorithm for Open Distro.
https://opendistro.github.io/
Apache License 2.0
277 stars 55 forks source link

Fix PostingsFormat in KNN Codec #236

Closed vamshin closed 4 years ago

vamshin commented 4 years ago

Issue #, if available: https://github.com/opendistro-for-elasticsearch/k-NN/issues/227

Description of changes: Elasticsearch initiates PostingsFormat based on the MapperService. For KNNCodec, we do not pass MapperService and the PostingsFormat deviates from underlying format for the current Lucene version. Because of this some of the terms that use PostingsFormat could have incompatibility. For example https://github.com/opendistro-for-elasticsearch/k-NN/issues/227. Updated KNNCodec to rely on the right PostingsFormat used by underlying Codec of Elasticsearch.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

codecov[bot] commented 4 years ago

Codecov Report

:exclamation: No coverage uploaded for pull request base (master@76bdfaa). Click here to learn what that means. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master     #236   +/-   ##
=========================================
  Coverage          ?   78.77%           
  Complexity        ?      339           
=========================================
  Files             ?       53           
  Lines             ?     1343           
  Branches          ?      122           
=========================================
  Hits              ?     1058           
  Misses            ?      232           
  Partials          ?       53           
Impacted Files Coverage Δ Complexity Δ
...csearch/knn/index/codec/KNN86Codec/KNN86Codec.java 96.00% <100.00%> (ø) 15.00 <3.00> (?)
...roforelasticsearch/knn/plugin/KNNCodecService.java 75.00% <100.00%> (ø) 3.00 <1.00> (?)
...oforelasticsearch/knn/plugin/KNNEngineFactory.java 100.00% <100.00%> (ø) 3.00 <2.00> (?)
...elasticsearch/knn/plugin/stats/KNNStatsConfig.java 95.65% <0.00%> (ø) 1.00% <0.00%> (?%)
...search/knn/plugin/script/KNNVectorScoreScript.java 77.33% <0.00%> (ø) 6.00% <0.00%> (?%)
...forelasticsearch/knn/index/util/NmsLibVersion.java 100.00% <0.00%> (ø) 2.00% <0.00%> (?%)
...oforelasticsearch/knn/index/KNNCircuitBreaker.java 58.97% <0.00%> (ø) 6.00% <0.00%> (?%)
...forelasticsearch/knn/index/codec/KNNCodecUtil.java 83.33% <0.00%> (ø) 2.00% <0.00%> (?%)
...roforelasticsearch/knn/plugin/stats/StatNames.java 85.18% <0.00%> (ø) 3.00% <0.00%> (?%)
...csearch/knn/index/codec/KNN84Codec/KNN84Codec.java 30.00% <0.00%> (ø) 2.00% <0.00%> (?%)
... and 46 more

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 76bdfaa...8bd8382. Read the comment docs.

vamshin commented 4 years ago

Could we add test case confirming the fix works? Other than the suggests field, what other functionality does this bug break?

"Elasticsearch initiates PostingsFormat based on the MapperService. For KNNCodec, we do not pass MapperService"

Do you have code link for where Elasticsearch initiates PostingsFormat based on MapperService? Also, why can't we pass MapperService to KNNCodec?

  1. Manually verified the fix. I will add test case part of this PR https://github.com/opendistro-for-elasticsearch/k-NN/issues/245

  2. Sure. This code is part of CodecService. Please refer this link https://github.com/elastic/elasticsearch/blob/v7.9.1/server/src/main/java/org/elasticsearch/index/codec/CodecService.java#L49-L56