Closed by wrkaiser 3 years ago
Hi @wrkaiser,
Merge is an operation that we are looking to improve in the future. Currently, how it works is that we take the raw vectors from 2 segments and then build a new graph that contains all of them. This process is expensive and can take a long time.
At the moment, there are a few strategies you can follow to reduce the time:
1. Disable refresh by setting index.refresh_interval = -1, and then re-enable it after indexing finishes. I see that you already increased refresh_interval to 5 minutes, but going further and disabling refresh entirely may help.
2. Lower the ef_construction parameter. This will impact recall, but if you are able to lower it while still meeting your requirements, graph building will be faster.

Got it, thank you very much.
vector_dimension=256, docs_count=1M, index.refresh_interval="5m" (five minutes)
When I call the "xxx/_forcemerge?max_num_segments=1&flush=true" HTTP endpoint, the segment merge succeeds but takes 1.5 hours.
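
For reference, the refresh-interval suggestion and the force-merge call discussed in this thread can be sketched as plain HTTP requests. This is only a minimal illustration: the cluster address and the index name (my-vector-index) are hypothetical placeholders, the "5m" restore value is taken from the settings quoted above, and the requests are only built and printed, not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: address of a local cluster
INDEX = "my-vector-index"           # hypothetical index name


def refresh_settings_request(interval):
    """Build a PUT <index>/_settings request that sets index.refresh_interval."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{INDEX}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def force_merge_request(max_num_segments=1):
    """Build the POST <index>/_forcemerge request quoted in the thread."""
    url = (
        f"{BASE_URL}/{INDEX}/_forcemerge"
        f"?max_num_segments={max_num_segments}&flush=true"
    )
    return urllib.request.Request(url=url, method="POST")


if __name__ == "__main__":
    steps = [
        refresh_settings_request("-1"),  # 1. disable refresh before bulk indexing
        # ... bulk indexing of the vectors would happen here ...
        refresh_settings_request("5m"),  # 2. restore the previous interval
        force_merge_request(1),          # 3. merge down to a single segment
    ]
    for req in steps:
        # To actually send a request: urllib.request.urlopen(req)
        print(req.get_method(), req.full_url, req.data)
```

Lowering ef_construction is not shown here because it is configured when the index is created, and the exact field depends on the engine and version in use.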