Open dreamer-89 opened 1 year ago
@dreamer-89 can you provide some details like what is the expected solution?
@dreamer-89 can you provide some details like what is the expected solution?
Thanks @navneet1v for checking on this. The issue description wasn't correct (actually copied from core issue). I updated the issue to make to add tests which is actionable on this repo.
Opened a PR to add basic integ test: https://github.com/opensearch-project/k-NN/pull/927
I see by default, tests use one node cluster which is not useful for segment replication. Thus, need some opinion from repo owners around structuring this into a separate test class which tied to a different gradle task or identify if newly added test can be run in multi-node setup. Tagging @navneet1v @naveentatikonda for visibility.
Recently, knn plugin integration was found broken for 2.7.0 version (issue on core https://github.com/opensearch-project/OpenSearch/issues/7781). This is due to bugs on core where 1) codec names were used before allowing replica to copy files from primary. 2) using default codec name on replica which is current version of Lucene codec e.g. Lucene95 today
Due to 2) above, the replication events for kNN indices were blocked from primary because it's lucene codec name (KNNXXXCodec) differs from replica (LuceneXXX). The end result was replica shard remained unassigned forever.
As part of this issue, we need to add tests so that these issues be caught before release.
Background on core issue
Segment replicaion event fails for plugins using custom codecs (e.g. kNN). The failure prevents replica shard from allocation. The end result is replica shard remains unassigned and will remain forever.
During peer recovery, for segment replication enabled indices a force segment sync is performed to keep the shard upto date from primary. Recently, we added a fix where to prevent segment replication events b/w primary and replica when they are using a different codec implementations. This is problematic as we fetched replica shard default codec rather than the one on the engine config.