opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0

[FEATURE] Separate refresh interval for KNN segment merge #1577

Open Dharin-shah opened 6 months ago

Dharin-shah commented 6 months ago

Is your feature request related to a problem? Yes. One of our use cases is a large JSON document in an index, with text and other fields used for filtering and aggregation; OpenSearch is our source of truth for the data we serve via an API. The same index also has a vector field, and we index the whole document every time any field changes, which causes a segment to be created after a refresh and eventually a segment merge. I have seen quite a few hot threads taking 100% of the CPU during Lucene segment merges and during reindexing to rebuild the HNSW graph. My proposal is that we don't need to refresh the graph as often as we refresh the JSON fields alongside it.

Why we have it all in the same index is another question, but primarily we use hybrid search combining k-NN and BM25, and we also do pre- and post-filtering and aggregations on the other JSON fields. Keeping everything together is faster since we don't have to make multiple requests to achieve the same result.

  "dynamic_templates": [
    {
      "knn_vector_fields_500": {
        "match_mapping_type": "*",
        "match_pattern": "regex",
        "match": "tdp_500.*",
        "mapping": {
          "type": "knn_vector",
          "dimension": 500,
          "method": {
            "name": "hnsw",
            "space_type": "innerproduct",
            "engine": "faiss",
            "parameters": {
              "ef_construction": 200,
              "m": 35
            }
          },
          "index_options": {
            "type": "int8_hnsw"
          }
        }
      }
    }
  ]
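
For illustration, a minimal sketch of the kind of hybrid query this mapping serves; the field names (tdp_500_embedding, title, category) and the 3-element vector are placeholders, not our real 500-dimensional schema:

  POST /my-index/_search
  {
    "query": {
      "bool": {
        "filter": [
          { "term": { "category": "tours" } }
        ],
        "should": [
          { "match": { "title": "city walking tour" } },
          {
            "knn": {
              "tdp_500_embedding": {
                "vector": [0.12, -0.4, 1.2],
                "k": 10
              }
            }
          }
        ]
      }
    }
  }

One of the hot-threads traces from such a merge spike:
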
100.7% (503.4ms out of 500ms) cpu usage by thread 'opensearch[7835d6139be7922149a306404451262b][[gyg_activity_option_no_6][0]: Lucene Merge Thread #7659]'
     10/10 snapshots sharing following 21 elements
       org.opensearch.knn.jni.FaissService.createIndex(Native Method)
       org.opensearch.knn.jni.JNIService.createIndex(JNIService.java:41)
       org.opensearch.knn.index.codec.KNN80Codec.KNN80DocValuesConsumer.lambda$createKNNIndexFromScratch$4(KNN80DocValuesConsumer.java:190)
       org.opensearch.knn.index.codec.KNN80Codec.KNN80DocValuesConsumer$$Lambda$9857/0x000000010240fa80.run(Unknown Source)
       java.base@17.0.6/java.security.AccessController.executePrivileged(AccessController.java:776)
       java.base@17.0.6/java.security.AccessController.doPrivileged(AccessController.java:318)
       org.opensearch.knn.index.codec.KNN80Codec.KNN80DocValuesConsumer.createKNNIndexFromScratch(KNN80DocValuesConsumer.java:189)
       org.opensearch.knn.index.codec.KNN80Codec.KNN80DocValuesConsumer.lambda$addKNNBinaryField$2(KNN80DocValuesConsumer.java:134)
       org.opensearch.knn.index.codec.KNN80Codec.KNN80DocValuesConsumer$$Lambda$9856/0x000000010240f630.createIndex(Unknown Source)
       org.opensearch.knn.index.codec.KNN80Codec.KNN80DocValuesConsumer.addKNNBinaryField(KNN80DocValuesConsumer.java:141)
       org.opensearch.knn.index.codec.KNN80Codec.KNN80DocValuesConsumer.merge(KNN80DocValuesConsumer.java:209)
       app//org.apache.lucene.index.SegmentMerger.mergeDocValues(SegmentMerger.java:178)
       app//org.apache.lucene.index.SegmentMerger$$Lambda$6908/0x0000000101c87d20.merge(Unknown Source)
       app//org.apache.lucene.index.SegmentMerger.mergeWithLogging(SegmentMerger.java:298)
       app//org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:140)
       app//org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5158)
       app//org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4698)
       app//org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:6450)
       app//org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:639)
       app//org.opensearch.index.engine.OpenSearchConcurrentMergeScheduler.doMerge(OpenSearchConcurrentMergeScheduler.java:120)
       app//org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:700)
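
Traces like the one above can be captured with the standard nodes hot threads API:

  GET /_nodes/hot_threads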

What solution would you like? Can we have a separate refresh interval for k-NN, which by default behaves as it does now, but can be configured so the graph is only refreshed periodically based on that parameter? Reindexing the graph essentially means regenerating and rebuilding the vertices for each node. Since we have a write-heavy use case, we want to reduce how often that happens, and we are okay with a less accurate "semantic" search for a period.
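
A purely hypothetical sketch of what the proposal could look like; index.knn.refresh_interval does not exist today and the name is only illustrative:

  # hypothetical setting, for illustration only - not part of the k-NN plugin
  PUT /my-index/_settings
  {
    "index": {
      "refresh_interval": "30s",
      "knn.refresh_interval": "10m"
    }
  }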

What alternatives have you considered? We have built a delta layer during writes that skips the write if no field has changed, but since we have a lot of fields, even a small change in one of them triggers a reindex of the document, and essentially a reindex of the node in the graph, even if the vector hasn't changed.

Do you have any additional context? Added the logs above.

Dharin-shah commented 6 months ago

Any support and direction here would be appreciated; I can pick up the change after discussion :)

jmazanec15 commented 6 months ago

Hey @Dharin-shah, sorry for the delay. I think overall something can be done, but what exactly may be a little tricky and will require some brainstorming. Separating the refresh interval for vector fields within the same index will be pretty tricky; I haven't looked into it, but decoupling by field will not be straightforward.

I wonder, though, if keeping multiple graphs (based on size) in a single segment might be a way to solve this write-heavy use case. For instance, instead of building one graph on merge, we could keep X graphs from the previous segments and search them all, while applying deletes. I don't believe this is supported in Lucene, but it is something we could potentially discuss.

The other potential idea is having a specialized read-only index - in other words, build one big graph.

@navneet1v might have some ideas as well.

Dharin-shah commented 6 months ago

Thanks @jmazanec15. Maybe something around the DocValuesConsumer?

I am not well versed in Lucene's API, but I can have a look for a simple yet good solution.

navneet1v commented 6 months ago

@Dharin-shah as Jack mentioned, right now there is no mechanism in the downstream system (Lucene) to create data structures like graphs and inverted indexes for only some fields on their own schedule.

One thing we need to understand here is that Lucene segments are immutable, so even if we delay graph creation, segments written to disk without their graphs would lead to more catastrophic issues.

Some suggestions from my side:

  1. I would probably move toward a smaller refresh interval (if you are currently using a higher one), combined with a periodic force merge. This ensures that a large number of documents do not accumulate before their graphs are built.
  2. Another option is to move to the Lucene engine. With the 2.13 release of OpenSearch, Lucene will support dot product and the Java vector APIs. What the Lucene engine brings is incremental graph builds: rather than creating the graph index at refresh time, it builds the graph during indexing, which reduces the time a refresh takes (see the mapping sketch after this list).
  3. We could also implement incremental graph creation for the Faiss engine. There is big heavy lifting to be done here; let me add a GH issue around this.
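
A minimal sketch of option 2, assuming a hypothetical index and field name; whether innerproduct is available on the Lucene engine depends on the dot-product support mentioned above for 2.13:

  PUT /my-index
  {
    "settings": { "index.knn": true },
    "mappings": {
      "properties": {
        "tdp_500_embedding": {
          "type": "knn_vector",
          "dimension": 500,
          "method": {
            "name": "hnsw",
            "space_type": "innerproduct",
            "engine": "lucene",
            "parameters": { "ef_construction": 200, "m": 16 }
          }
        }
      }
    }
  }
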
Dharin-shah commented 6 months ago

thanks @navneet1v

I would probably move toward a smaller refresh interval (if you are currently using a higher one), combined with a periodic force merge. This ensures that a large number of documents do not accumulate before their graphs are built.

The current problem is that we already do quite frequent merges and segment creation due to a 30s refresh interval. We have a write-heavy workload.
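
For reference, that interval is just the standard dynamic index setting (index name illustrative):

  PUT /my-index/_settings
  {
    "index": { "refresh_interval": "30s" }
  }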

Lucene segments are immutable, so even if we delay graph creation, segments written to disk without their graphs would lead to more catastrophic issues.

Of course, yeah. Since we use Faiss, is the HNSW graph not indexed separately? We would be fine if the graph is not reindexed or rebuilt for some time.

Dharin-shah commented 6 months ago

I guess the fundamental problem is that graph reindexing takes up a lot of CPU, so perhaps addressing that directly might be better than a complex process.

navneet1v commented 6 months ago

Of course, yeah. Since we use Faiss, is the HNSW graph not indexed separately?

No, it's not.

We would be fine if the graph is not reindexed or rebuilt for some time.

Yeah, this separation cannot be done, because the graph is stored as a segment file. Separating the graphs from the segments would change the core fundamentals.

navneet1v commented 3 months ago

@Dharin-shah I was thinking about the feature you have been asking for. One way we might achieve something similar is a setting that says: if the number of vectors in a segment is greater than a certain value, we create the graph; otherwise we do not.

Let's say the limit is 1,000 vectors. If a segment has more than 1K vectors we create the HNSW graph; otherwise we do not. During search, if an HNSW graph is present we do graph-based search; otherwise we do an exact search on the stored vectors. As segments merge during indexing, HNSW graphs get created once the vector count crosses the 1K threshold.

The only problem I can see is that background merges will still cause CPU spikes, but that would be the same without this feature too. So for write-heavy workloads I think this can help.
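
A purely hypothetical sketch of such a setting; the name index.knn.advanced.build_graph_min_vectors is illustrative only and not an existing option:

  # hypothetical setting, for illustration only
  PUT /my-index/_settings
  {
    "index": {
      "knn.advanced.build_graph_min_vectors": 1000
    }
  }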

Let me know your thoughts.

cc: @jmazanec15

I hinted at this idea in this GH issue: https://github.com/opensearch-project/k-NN/issues/1599, in the section "Create vector search data structures creation greedily". Although the idea in that issue is very generic, I do believe we can build the feature as I suggested in this comment.

vamshin commented 3 months ago

Like the idea, @navneet1v. One question: if we avoid graph creation for smaller segments, do you think this will be effective? Creating graphs for smaller segments may not be computationally expensive, so the overall impact might be low.

navneet1v commented 3 months ago

if we avoid graph creation for smaller segments, do you think this will be effective? Creating graphs for smaller segments may not be computationally expensive, so the overall impact might be low.

@vamshin

Yes, creating small graphs is computationally inexpensive, but this is a case of heavy write traffic: the graphs get created and then thrown away as segments merge. Since the traffic is mainly writes, I could set a high minimum number of vectors required to create a graph and achieve good write throughput. We will not avoid all the CPU spikes, but we can increase write throughput, with an occasional dip in throughput when the graph-creation condition is met.

The other extreme of this solution is to completely stop graph creation while you are indexing, enable it afterwards, and hit the force-merge API to recreate the segments and graphs. But that extreme is not going to work for the use case @Dharin-shah described, hence I proposed an intermediate solution that still aligns with creating graphs during force merges.
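
For reference, the force-merge step in that extreme case would just be the standard API (index name illustrative):

  POST /my-index/_forcemerge?max_num_segments=1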

heemin32 commented 3 months ago

@Dharin-shah Are you trying to avoid 100% CPU usage, or to reduce how often the HNSW graph is constructed? Increasing the refresh time only helps with the second goal. If your goal is to avoid high CPU usage, we could focus on 1. incremental graph construction during ingestion (Lucene supports this) and 2. slowing down the graph construction process to prevent CPU exhaustion. For the latter, we could also reuse the existing graph during merge (Lucene supports this).
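
One generic knob for the slowing-down angle is the merge scheduler's thread cap - this is not k-NN specific, and whether it helps enough here is an assumption:

  PUT /my-index/_settings
  {
    "index": {
      "merge.scheduler.max_thread_count": 1
    }
  }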

Could you switch to the Lucene engine and see how it works for your case?

Dharin-shah commented 3 months ago

Gotcha, yep, that makes sense. Will try this out and report back. Thanks @heemin32