Open kevinningthu opened 9 months ago
I do not understand what the figures stand for.
I'm running into this error too.
I'm also getting this error when trying to add documents to an index created with RAGatouille (colbert-ir/colbertv2.0):
[Oct 11, 14:25:17] #> Optimizing IVF to store map from centroids to list of pids..
[Oct 11, 14:25:17] #> Building the emb2pid mapping..
Traceback (most recent call last):
File "/home/username/repo_name/.venv/lib/python3.10/site-packages/ragatouille/RAGPretrainedModel.py", line 258, in add_to_index
self.model.add_to_index(
File "/home/username/repo_name/.venv/lib/python3.10/site-packages/ragatouille/models/colbert.py", line 179, in add_to_index
self.model_index.add(
File "/home/username/repo_name/.venv/lib/python3.10/site-packages/ragatouille/models/index.py", line 398, in add
self.build(
File "/home/username/repo_name/.venv/lib/python3.10/site-packages/ragatouille/models/index.py", line 243, in build
indexer.index(name=index_name, collection=collection, overwrite=overwrite)
File "/home/username/repo_name/.venv/lib/python3.10/site-packages/colbert/indexer.py", line 80, in index
self.__launch(collection)
File "/home/username/repo_name/.venv/lib/python3.10/site-packages/colbert/indexer.py", line 89, in __launch
launcher.launch_without_fork(self.config, collection, shared_lists, shared_queues, self.verbose)
File "/home/username/repo_name/.venv/lib/python3.10/site-packages/colbert/infra/launcher.py", line 93, in launch_without_fork
return_val = run_process_without_mp(self.callee, new_config, *args)
File "/home/username/repo_name/.venv/lib/python3.10/site-packages/colbert/infra/launcher.py", line 109, in run_process_without_mp
return_val = callee(config, *args)
File "/home/username/repo_name/.venv/lib/python3.10/site-packages/colbert/indexing/collection_indexer.py", line 33, in encode
encoder.run(shared_lists)
File "/home/username/repo_name/.venv/lib/python3.10/site-packages/colbert/indexing/collection_indexer.py", line 76, in run
self.finalize() # Builds metadata and centroid to passage mapping
File "/home/username/repo_name/.venv/lib/python3.10/site-packages/colbert/indexing/collection_indexer.py", line 396, in finalize
self._build_ivf()
File "/home/username/repo_name/.venv/lib/python3.10/site-packages/colbert/indexing/collection_indexer.py", line 482, in _build_ivf
_, _ = optimize_ivf(ivf, ivf_lengths, self.config.index_path_)
File "/home/username/repo_name/.venv/lib/python3.10/site-packages/colbert/indexing/utils.py", line 13, in optimize_ivf
all_doclens = load_doclens(index_path, flatten=False)
File "/home/username/repo_name/.venv/lib/python3.10/site-packages/colbert/indexing/loaders.py", line 32, in load_doclens
all_doclens = [ujson.load(open(filename)) for filename in doclens_filenames]
File "/home/username/repo_name/.venv/lib/python3.10/site-packages/colbert/indexing/loaders.py", line 32, in <listcomp>
all_doclens = [ujson.load(open(filename)) for filename in doclens_filenames]
ujson.JSONDecodeError: Expected object or value
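
In case it helps anyone debugging this: the traceback ends in `load_doclens`, which parses every doclens file in the index directory, so one empty or truncated file is enough to raise this error. Below is a minimal sketch for finding such a file. It assumes the files sit directly in the index directory and are named `doclens.<chunk>.json`; the helper name `find_unreadable_doclens` is made up for illustration, and it uses the standard `json` module rather than `ujson`, so the error wording will differ.

```python
import json
from pathlib import Path


def find_unreadable_doclens(index_path):
    """Return (filename, reason) pairs for doclens files that fail to parse.

    Hypothetical helper, not part of ColBERT or RAGatouille.
    """
    problems = []
    # Assumption: doclens files are named doclens.<chunk>.json and live
    # directly inside the index directory.
    for path in sorted(Path(index_path).glob("doclens.*.json")):
        try:
            with open(path) as f:
                json.load(f)
        except json.JSONDecodeError as exc:
            # An empty (0-byte) file also ends up here.
            problems.append((path.name, f"invalid JSON: {exc.msg}"))
    return problems
```

Calling `find_unreadable_doclens(...)` with your own index path returns an empty list if every file parses. Any file it reports is a likely candidate for what `load_doclens` is choking on, though this only locates the bad file and does not explain how it got that way.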
It seems that triples="/path/to/MSMARCO/triples.train.small.tsv" (qid, pid+, pid-) is no longer supported for training.
The triples argument should instead look like this: triples = '/path/to/examples.64.json'.