Hi, I am currently working on a project that needs millions, sometimes even billions, of vectors inserted to build up a graph. I followed example.py in https://github.com/nmslib/hnswlib/tree/master with 4000K (4 million) vectors, like the code below:
p = hnswlib.Index('l2', dim)
print("before build ", datetime.datetime.now())
p.init_index(max_elements = num_elements, ef_construction = 128, M = 16)
p.add_items(vectorNP, ids)
p.save_index("/Users/XXX/Projects/builder/hnsw-embedding-test/python_test/combined.bin")
It took around 2 minutes to finish.
But when I use libhnswlib-jna-x86-64 with 16 cores, via:
val hnswIndex = new ConcurrentIndex(SpaceName.L2, dimension)
hnswIndex.initialize(3890521, 16, 128, 42)
val embeddingRecordsPar = parquet4sReader.toList.par
embeddingRecordsPar.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(16))
embeddingRecordsPar.foreach { eb =>
  val ba = eb.vectors.head
  if (ba.length > 0) {
    val vector = RawEmbedding.toVector(RichByteArray(ba).asByteBuffer, dimension, "float16")
    hnswIndex.addNormalizedItem(vector, i) // i is a shared var, incremented from all 16 threads
    i = i + 1
  }
}
it takes around 15-16 minutes (the same time whether I use ConcurrentIndex, Index, or Index.synchronizedIndex). Both snippets run on my local machine. Is there a function like add_items in this hnswlib-jna, or any other way to speed up building the graph?
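For what it's worth: Python's add_items parallelizes internally (it takes a num_threads argument, defaulting to all cores), so the 2-minute figure already reflects multithreaded insertion. Separately, one thing worth checking in the Scala loop above is the shared var i: it is read and incremented from 16 threads without synchronization, so ids can collide or be skipped. A minimal sketch of thread-safe id assignment, where the ConcurrentHashMap.put stands in for hnswIndex.addNormalizedItem(vector, id) (the executor, map, and item counts here are only for the sketch, not part of hnswlib-jna):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SafeIds {
    // Inserts numItems vectors from numThreads threads, assigning each a
    // unique id via AtomicInteger instead of an unsynchronized counter.
    // Returns the number of distinct ids that ended up in the "index".
    static int run(int numItems, int numThreads) {
        ConcurrentHashMap<Integer, float[]> index = new ConcurrentHashMap<>();
        AtomicInteger nextId = new AtomicInteger(0); // replaces the shared `var i`
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        for (int n = 0; n < numItems; n++) {
            float[] vector = {1.0f, 2.0f};
            // getAndIncrement is atomic: no two tasks can observe the same id.
            // index.put stands in for hnswIndex.addNormalizedItem(vector, id).
            pool.execute(() -> index.put(nextId.getAndIncrement(), vector));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return index.size();
    }

    public static void main(String[] args) {
        // Every insert gets a distinct id, so all 10000 entries survive.
        System.out.println(run(10000, 16)); // prints 10000
    }
}
```

With a plain var counter under contention, two threads can read the same i before either increments it, so the second put silently overwrites the first; with the atomic counter, the count always matches the number of inserts.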