stanford-futuredata / ColBERT

ColBERT: state-of-the-art neural search (SIGIR'20, TACL'21, NeurIPS'21, NAACL'22, CIKM'22, ACL'23, EMNLP'23)
MIT License

Basic Training (ColBERTv1-style) -> ujson.JSONDecodeError: Expected object or value #311

Open kevinningthu opened 9 months ago

kevinningthu commented 9 months ago

It seems that TSV triples such as triples="/path/to/MSMARCO/triples.train.small.tsv" (qid, pid+, pid-) are no longer supported for training.

Instead, the file passed as triples = '/path/to/examples.64.json' should look like this: [image]
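A possible workaround is to convert the old TSV triples into a JSON-lines file. This is a minimal sketch, assuming the trainer reads one JSON array per line and accepts `[qid, pid+, pid-]` rows; that would explain the error, since a tab-separated row is not valid JSON, but the thread does not confirm it. The function name `tsv_triples_to_jsonl` is hypothetical and not part of ColBERT.

```python
import csv
import json


def tsv_triples_to_jsonl(tsv_path, jsonl_path):
    """Convert (qid, pid+, pid-) TSV rows into one JSON array per line.

    Hypothetical helper, not part of ColBERT. Returns the number of rows written.
    """
    count = 0
    with open(tsv_path, newline="") as src, open(jsonl_path, "w") as dst:
        for row in csv.reader(src, delimiter="\t"):
            if not row:
                continue  # skip blank lines
            # Assumption: every field is an integer ID.
            dst.write(json.dumps([int(field) for field in row]) + "\n")
            count += 1
    return count
```

The output path can then be passed as `triples` in place of the `.tsv` file. If the IDs in your data are not integers, drop the `int(...)` conversion.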

GhostCR323 commented 9 months ago

I do not understand what the figures stand for.

GhostCR323 commented 9 months ago

I'm running into this error too.

matheusft commented 1 month ago

I'm also hitting this error when trying to add documents to an index created with RAGatouille (colbert-ir/colbertv2.0):

[Oct 11, 14:25:17] #> Optimizing IVF to store map from centroids to list of pids..
[Oct 11, 14:25:17] #> Building the emb2pid mapping..
Traceback (most recent call last):
  File "/home/username/repo_name/.venv/lib/python3.10/site-packages/ragatouille/RAGPretrainedModel.py", line 258, in add_to_index
    self.model.add_to_index(
  File "/home/username/repo_name/.venv/lib/python3.10/site-packages/ragatouille/models/colbert.py", line 179, in add_to_index
    self.model_index.add(
  File "/home/username/repo_name/.venv/lib/python3.10/site-packages/ragatouille/models/index.py", line 398, in add
    self.build(
  File "/home/username/repo_name/.venv/lib/python3.10/site-packages/ragatouille/models/index.py", line 243, in build
    indexer.index(name=index_name, collection=collection, overwrite=overwrite)
  File "/home/username/repo_name/.venv/lib/python3.10/site-packages/colbert/indexer.py", line 80, in index
    self.__launch(collection)
  File "/home/username/repo_name/.venv/lib/python3.10/site-packages/colbert/indexer.py", line 89, in __launch
    launcher.launch_without_fork(self.config, collection, shared_lists, shared_queues, self.verbose)
  File "/home/username/repo_name/.venv/lib/python3.10/site-packages/colbert/infra/launcher.py", line 93, in launch_without_fork
    return_val = run_process_without_mp(self.callee, new_config, *args)
  File "/home/username/repo_name/.venv/lib/python3.10/site-packages/colbert/infra/launcher.py", line 109, in run_process_without_mp
    return_val = callee(config, *args)
  File "/home/username/repo_name/.venv/lib/python3.10/site-packages/colbert/indexing/collection_indexer.py", line 33, in encode
    encoder.run(shared_lists)
  File "/home/username/repo_name/.venv/lib/python3.10/site-packages/colbert/indexing/collection_indexer.py", line 76, in run
    self.finalize() # Builds metadata and centroid to passage mapping
  File "/home/username/repo_name/.venv/lib/python3.10/site-packages/colbert/indexing/collection_indexer.py", line 396, in finalize
    self._build_ivf()
  File "/home/username/repo_name/.venv/lib/python3.10/site-packages/colbert/indexing/collection_indexer.py", line 482, in _build_ivf
    _, _ = optimize_ivf(ivf, ivf_lengths, self.config.index_path_)
  File "/home/username/repo_name/.venv/lib/python3.10/site-packages/colbert/indexing/utils.py", line 13, in optimize_ivf
    all_doclens = load_doclens(index_path, flatten=False)
  File "/home/username/repo_name/.venv/lib/python3.10/site-packages/colbert/indexing/loaders.py", line 32, in load_doclens
    all_doclens = [ujson.load(open(filename)) for filename in doclens_filenames]
  File "/home/username/repo_name/.venv/lib/python3.10/site-packages/colbert/indexing/loaders.py", line 32, in <listcomp>
    all_doclens = [ujson.load(open(filename)) for filename in doclens_filenames]
ujson.JSONDecodeError: Expected object or value
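
The traceback ends in `load_doclens`, which calls `ujson.load` on each doclens file in the index directory, so one of those files is probably empty or truncated. A minimal diagnostic sketch follows, assuming the files are named `doclens.<chunk number>.json`; the naming pattern is an assumption and the helper `find_bad_doclens` is hypothetical. It uses the standard `json` module rather than `ujson`, so the error messages will differ from the one above.

```python
import json
import os
import re


def find_bad_doclens(index_path):
    """Return (filename, reason) pairs for doclens files that fail to parse.

    Hypothetical helper, not part of ColBERT or RAGatouille.
    """
    bad = []
    for name in sorted(os.listdir(index_path)):
        # Assumed naming scheme: doclens.<chunk number>.json
        if not re.fullmatch(r"doclens\.\d+\.json", name):
            continue
        path = os.path.join(index_path, name)
        if os.path.getsize(path) == 0:
            bad.append((name, "empty file"))
            continue
        try:
            with open(path) as f:
                json.load(f)
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            bad.append((name, f"invalid JSON: {exc}"))
    return bad
```

Running `find_bad_doclens("/path/to/index")` on the index directory should list any file that cannot be parsed. An empty result would suggest the problem lies elsewhere.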