opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0

Remove FileWatcher from KNN. #2182

Closed 0ctopus13prime closed 1 month ago

0ctopus13prime commented 1 month ago

Description

Design doc : https://github.com/opensearch-project/k-NN/issues/1885#issuecomment-2377956971

This PR removes FileWatcher from KNN. FileWatcher periodically monitors a vector index file and cleans up its resources once the file is deleted. However, as outlined in the doc above, releasing allocated native memory is already handled by the 80DV Producer, so we can safely remove FileWatcher. Since the newest version now uses the vector format, I've added the same cleanup logic there as well. A sketch of the resulting cleanup model follows below.
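For readers who skip the design doc, here is a minimal, hypothetical Java sketch of the lifecycle this change moves to. The class and method names are illustrative, not the plugin's actual code: the point is that native memory is freed when the per-segment producer that owns it is closed, rather than by a background FileWatcher noticing the index file was deleted.

```java
import java.io.Closeable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only. Cleanup is tied to the producer's close(),
// i.e. the moment Lucene drops the segment, instead of a polling FileWatcher.
final class NativeMemoryCache implements Closeable {

    // Hypothetical registry: segment file name -> native address of the loaded index.
    private final Map<String, Long> loadedIndices = new ConcurrentHashMap<>();

    long load(String fileName) {
        // In the real plugin this would JNI-load the FAISS/NMSLIB index;
        // here we just fabricate an address for illustration.
        return loadedIndices.computeIfAbsent(fileName, f -> System.nanoTime());
    }

    // Invoked from the owning producer's close(), so no file polling is needed.
    @Override
    public void close() {
        loadedIndices.forEach((file, address) -> freeNative(address));
        loadedIndices.clear();
    }

    private void freeNative(long address) {
        // JNI free in the real implementation; no-op in this sketch.
    }
}
```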

Related Issues

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

0ctopus13prime commented 1 month ago

@jmazanec15 Please note that this is the initial PR for code sharing. Will add integration tests + unit tests shortly.

0ctopus13prime commented 1 month ago

Memory usage monitoring

During the benchmark, the memory usage stayed pretty much the same as the baseline.

Candidate

[screenshot: candidate memory usage during benchmark, 2024-10-15]

Baseline

[screenshot: baseline memory usage during benchmark (kill-file-watcher-baseline)]

0ctopus13prime commented 1 month ago

> Do verify the latencies as well since you are doing the benchmarking

Yeah, that's the plan. I was stuck fixing subtle issues, but I think it's now good to go for benchmarking. I'm not expecting a severe performance regression, but will post the gap in this issue.

0ctopus13prime commented 1 month ago

Please note that NativeIndexWriter still has an FSDirectory dependency, which will be dealt with in the next PR as part of introducing a writing layer in native engines. The sketch below shows roughly what that dependency looks like.
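For context, a rough sketch of why the dependency exists: the native engines write through a raw filesystem path, and only an FSDirectory-backed store can expose one. `FSDirectory`, `FilterDirectory.unwrap`, and `getDirectory()` are real Lucene APIs; the resolver class itself is hypothetical, not the plugin's actual code.

```java
import java.nio.file.Path;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.FilterDirectory;

// Illustrative sketch of the FSDirectory coupling a writing layer would remove.
final class NativePathResolver {

    static Path resolve(Directory directory, String fileName) {
        // Strip any wrapping directories (e.g. tracking/rate-limiting wrappers).
        Directory unwrapped = FilterDirectory.unwrap(directory);
        if (unwrapped instanceof FSDirectory) {
            // FSDirectory backs files with a real on-disk directory, so we can
            // hand the native (JNI) writer an absolute path.
            return ((FSDirectory) unwrapped).getDirectory().resolve(fileName);
        }
        throw new IllegalArgumentException(
            "Native engines currently require an FSDirectory-backed store");
    }
}
```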

0ctopus13prime commented 1 month ago

Found a few test cases that were not updated according to this change. Will fix them and re-raise the PR.

0ctopus13prime commented 1 month ago

Performance Benchmark Comparison

Conclusion: I could not find any indication that this change degrades performance.


| Metric | Task | Baseline-Value | Candidate-Value | (Candidate - Baseline) / Baseline | Unit |
| -- | -- | -- | -- | -- | -- |
| Cumulative indexing time of primary shards | | 30.4969 | 30.3151 | -0.005961262 | min |
| Min cumulative indexing time across primary shards | | 0.00015 | 0.000133333 | -0.111113333 | min |
| Median cumulative indexing time across primary shards | | 15.2484 | 15.1575 | -0.005961281 | min |
| Max cumulative indexing time across primary shards | | 30.4967 | 30.315 | -0.005958022 | min |
| Cumulative indexing throttle time of primary shards | | 0 | 0 | 0 | min |
| Min cumulative indexing throttle time across primary shards | | 0 | 0 | 0 | min |
| Median cumulative indexing throttle time across primary shards | | 0 | 0 | 0 | min |
| Max cumulative indexing throttle time across primary shards | | 0 | 0 | 0 | min |
| Cumulative merge time of primary shards | | 175.918 | 172.859 | -0.017388783 | min |
| Cumulative merge count of primary shards | | 117 | 116 | -0.008547009 | |
| Min cumulative merge time across primary shards | | 0 | 0 | 0 | min |
| Median cumulative merge time across primary shards | | 87.9589 | 86.4296 | -0.017386529 | min |
| Max cumulative merge time across primary shards | | 175.918 | 172.859 | -0.017388783 | min |
| Cumulative merge throttle time of primary shards | | 0.8007 | 0.89955 | 0.123454477 | min |
| Min cumulative merge throttle time across primary shards | | 0 | 0 | 0 | min |
| Median cumulative merge throttle time across primary shards | | 0.40035 | 0.449775 | 0.123454477 | min |
| Max cumulative merge throttle time across primary shards | | 0.8007 | 0.89955 | 0.123454477 | min |
| Cumulative refresh time of primary shards | | 1.2548 | 1.44825 | 0.154167995 | min |
| Cumulative refresh count of primary shards | | 90 | 88 | -0.022222222 | |
| Min cumulative refresh time across primary shards | | 0.0003 | 0.0003 | 0 | min |
| Median cumulative refresh time across primary shards | | 0.6274 | 0.724125 | 0.154167995 | min |
| Max cumulative refresh time across primary shards | | 1.2545 | 1.44795 | 0.154204862 | min |
| Cumulative flush time of primary shards | | 12.2934 | 11.8517 | -0.035929849 | min |
| Cumulative flush count of primary shards | | 58 | 58 | 0 | |
| Min cumulative flush time across primary shards | | 0 | 0 | 0 | min |
| Median cumulative flush time across primary shards | | 6.14672 | 5.92586 | -0.035931359 | min |
| Max cumulative flush time across primary shards | | 12.2934 | 11.8517 | -0.035929849 | min |
| Total Young Gen GC time | | 0.332 | 0.413 | 0.243975904 | s |
| Total Young Gen GC count | | 18 | 18 | 0 | |
| Total Old Gen GC time | | 0 | 0 | 0 | s |
| Total Old Gen GC count | | 0 | 0 | 0 | |
| Store size | | 29.8184 | 29.8183 | -3.35363E-06 | GB |
| Translog size | | 5.82E-07 | 5.82E-07 | 0 | GB |
| Heap used for segments | | 0 | 0 | 0 | MB |
| Heap used for doc values | | 0 | 0 | 0 | MB |
| Heap used for terms | | 0 | 0 | 0 | MB |
| Heap used for norms | | 0 | 0 | 0 | MB |
| Heap used for points | | 0 | 0 | 0 | MB |
| Heap used for stored fields | | 0 | 0 | 0 | MB |
| Segment count | | 2 | 2 | 0 | |
| Min Throughput | custom-vector-bulk | 7290.25 | 3034.92 | -0.583701519 | docs/s |
| Mean Throughput | custom-vector-bulk | 11204.8 | 11096 | -0.009710124 | docs/s |
| Median Throughput | custom-vector-bulk | 10741.7 | 10421.3 | -0.029827681 | docs/s |
| Max Throughput | custom-vector-bulk | 21616.1 | 20570.6 | -0.048366727 | docs/s |
| 50th percentile latency | custom-vector-bulk | 80.3333 | 80.4247 | 0.00113776 | ms |
| 90th percentile latency | custom-vector-bulk | 166.626 | 164.697 | -0.011576825 | ms |
| 99th percentile latency | custom-vector-bulk | 315.914 | 319.73 | 0.012079237 | ms |
| 99.9th percentile latency | custom-vector-bulk | 1505.69 | 1567.09 | 0.040778646 | ms |
| 99.99th percentile latency | custom-vector-bulk | 4159.22 | 4244.81 | 0.020578378 | ms |
| 100th percentile latency | custom-vector-bulk | 6280.65 | 7164.91 | 0.14079116 | ms |
| 50th percentile service time | custom-vector-bulk | 80.3333 | 80.4247 | 0.00113776 | ms |
| 90th percentile service time | custom-vector-bulk | 166.626 | 164.697 | -0.011576825 | ms |
| 99th percentile service time | custom-vector-bulk | 315.914 | 319.73 | 0.012079237 | ms |
| 99.9th percentile service time | custom-vector-bulk | 1505.69 | 1567.09 | 0.040778646 | ms |
| 99.99th percentile service time | custom-vector-bulk | 4159.22 | 4244.81 | 0.020578378 | ms |
| 100th percentile service time | custom-vector-bulk | 6280.65 | 7164.91 | 0.14079116 | ms |
| error rate | custom-vector-bulk | 0 | 0 | 0 | % |
| Min Throughput | force-merge-segments | 0 | 0 | 0 | ops/s |
| Mean Throughput | force-merge-segments | 0 | 0 | 0 | ops/s |
| Median Throughput | force-merge-segments | 0 | 0 | 0 | ops/s |
| Max Throughput | force-merge-segments | 0 | 0 | 0 | ops/s |
| 100th percentile latency | force-merge-segments | 6.17E+06 | 6.10E+06 | -0.011397476 | ms |
| 100th percentile service time | force-merge-segments | 6.17E+06 | 6.10E+06 | -0.011397476 | ms |
| error rate | force-merge-segments | 0 | 0 | 0 | % |
| Min Throughput | warmup-indices | 0.23 | 0.23 | 0 | ops/s |
| Mean Throughput | warmup-indices | 0.23 | 0.23 | 0 | ops/s |
| Median Throughput | warmup-indices | 0.23 | 0.23 | 0 | ops/s |
| Max Throughput | warmup-indices | 0.23 | 0.23 | 0 | ops/s |
| 100th percentile latency | warmup-indices | 4423.45 | 4382.95 | -0.009155749 | ms |
| 100th percentile service time | warmup-indices | 4423.45 | 4382.95 | -0.009155749 | ms |
| error rate | warmup-indices | 0 | 0 | 0 | % |
| Min Throughput | prod-queries | 0.67 | 0.67 | 0 | ops/s |
| Mean Throughput | prod-queries | 0.67 | 1 | 0.492537313 | ops/s |
| Median Throughput | prod-queries | 0.67 | 0.67 | 0 | ops/s |
| Max Throughput | prod-queries | 0.67 | 1.66 | 1.47761194 | ops/s |
| 50th percentile latency | prod-queries | 8.23135 | 8.26458 | 0.004037005 | ms |
| 90th percentile latency | prod-queries | 9.44003 | 9.03913 | -0.042468085 | ms |
| 99th percentile latency | prod-queries | 26.661 | 26.0028 | -0.024687746 | ms |
| 100th percentile latency | prod-queries | 1499.85 | 1482.8 | -0.011367803 | ms |
| 50th percentile service time | prod-queries | 8.23135 | 8.26458 | 0.004037005 | ms |
| 90th percentile service time | prod-queries | 9.44003 | 9.03913 | -0.042468085 | ms |
| 99th percentile service time | prod-queries | 26.661 | 26.0028 | -0.024687746 | ms |
| 100th percentile service time | prod-queries | 1499.85 | 1482.8 | -0.011367803 | ms |
| error rate | prod-queries | 0 | 0 | 0 | % |
| Mean recall@k | prod-queries | 0.3 | 0.29 | -0.033333333 | |
| Mean recall@1 | prod-queries | 0.37 | 0.28 | -0.243243243 | |

opensearch-trigger-bot[bot] commented 1 month ago

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-2182-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 e5599aa151cf43700d0ea327ad536cc106e5bb61
# Push it to GitHub
git push --set-upstream origin backport/backport-2182-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-2182-to-2.x.

0ctopus13prime commented 1 month ago

Looks like auto-backporting failed. Will raise a backport PR shortly.

0ctopus13prime commented 1 month ago

Backport PR : https://github.com/opensearch-project/k-NN/pull/2225