Open ChaofanTao opened 8 months ago
The faiss OIVFBBS code is more general than what in-context pretraining requires (see https://github.com/facebookresearch/faiss/tree/main/demos/offline_ivf); for example, the query vectors can come from a different dataset than the database vectors. In this case you are right: we only use it to search the document embeddings against themselves, so I have removed the remark from the README to avoid confusion.
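To illustrate the difference between searching a database against itself and searching it with a separate query set, here is a minimal sketch in plain Python. It is a brute-force stand-in, not the faiss or OIVFBBS API; the function names `squared_l2` and `knn_search` and the toy vectors are hypothetical and only mirror the roles of the `--xb` (database) and `--xq` (query) datasets.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_search(xq: Sequence[Vector], xb: Sequence[Vector], k: int) -> List[List[Tuple[int, float]]]:
    """Brute-force k-nearest-neighbour search (hypothetical helper, not faiss).

    For every query vector in xq, return the k closest database vectors
    in xb as (database_index, squared_distance) pairs, nearest first.
    """
    results = []
    for q in xq:
        # Score every database vector, then keep the k smallest distances.
        scored = sorted((squared_l2(q, b), i) for i, b in enumerate(xb))
        results.append([(i, d) for d, i in scored[:k]])
    return results


# Toy "document embeddings" playing the role of the database (xb).
xb = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]]

# Self-search: the database is also the query set, so each vector's
# nearest hit is itself and the second hit is its closest other document.
self_hits = knn_search(xb, xb, k=2)

# Cross-search: a separate query set (xq) searched against the same database.
xq = [[0.9, 0.1]]
cross_hits = knn_search(xq, xb, k=1)
```

In the self-search case the first hit for every vector is the vector itself, which is why a neighbour list built this way usually skips or filters that first entry.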
Hi, thanks for your time!
The README says 'We provide an example corpus in data/b3g to demonstrate our pipeline.' Where can I find the files in 'data/b3g'?
In addition, step 4, "Run the search distributed job", lists two commands.
Command 1: python run.py --command search --config configs/config_test.yaml --xb ccnet_new --cluster_run --partition learnlab
Command 2: python run.py --command search --config configs/config_test.yaml --xb ccnet_new --xq edouard_val
I am confused by the remark. If I have just one database containing multiple documents, should I run both commands in sequence, or only command 1?