snap-stanford / stark

STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases (https://stark.stanford.edu/)
MIT License
270 stars, 33 forks

Embedding generation is slow due to inefficient search #6

Open Bhuvan-21 opened 3 weeks ago

Bhuvan-21 commented 3 weeks ago

In emb_generate.py, the program checks, for every index, whether that index is already in the list of existing indices:

if idx in exisiting_indices:
    continue

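This membership test is O(n), where n is the length of the existing indices list, and it runs once per index, so the cost grows with the product of the two sizes. It can easily be avoided by converting the list into a set before the loop starts, which makes each lookup O(1) on average. This is a minor change; hopefully the authors can take care of it in the next commit.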

exisiting_indices = set(exisiting_indices)
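To illustrate the difference, here is a minimal, self-contained sketch. It is not the actual code from emb_generate.py: the helper names `pending_with_list` and `pending_with_set` and the sample sizes are made up for illustration, and only the membership check mirrors the snippet above.

```python
import time


def pending_with_list(all_indices, existing_indices):
    """Skip already-processed indices using a list lookup (O(n) per check)."""
    existing = list(existing_indices)
    return [idx for idx in all_indices if idx not in existing]


def pending_with_set(all_indices, existing_indices):
    """Same filter, but with a one-time set conversion (O(1) average per check)."""
    existing = set(existing_indices)
    return [idx for idx in all_indices if idx not in existing]


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to make the gap visible.
    all_indices = list(range(20_000))
    existing_indices = list(range(0, 20_000, 2))

    start = time.perf_counter()
    slow = pending_with_list(all_indices, existing_indices)
    list_seconds = time.perf_counter() - start

    start = time.perf_counter()
    fast = pending_with_set(all_indices, existing_indices)
    set_seconds = time.perf_counter() - start

    # Both variants select the same indices; only the lookup cost differs.
    assert slow == fast
    print(f"list lookup: {list_seconds:.4f}s, set lookup: {set_seconds:.4f}s")
```

Both functions return the same result in the same order, so the set conversion should be a drop-in change as long as the indices are hashable (plain integers are).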