terrierteam / pyterrier_colbert


Retrieval Issue #42

Open wpertsch opened 2 years ago

wpertsch commented 2 years ago

Hello,

I have already built a fairly large index and am now trying to run some test retrieval. This is the code I tried first:

```python
import faiss
assert faiss.get_num_gpus() > 0

import pyterrier as pt
pt.init()

import torch
print('torch version', torch.__version__)
x = torch.rand(5, 3)
print(x)

checkpoint="/home/s2003857/javaIndex/checkpoint/colbert-10000.dnn"

from pyterrier_colbert.indexing import ColBERTIndexer
from pyterrier_colbert.ranking import ColBERTFactory

pyterrier_colbert_factory = pyterrier_colbert.ranking.ColBERTFactory(checkpoint, "/home/s2003857/javaIndex/indextest", "colbert_java_index")
#pyterrier_colbert_factory = indexer.ranking_factory()

colbert_e2e = pyterrier_colbert_factory.end_to_end()
(colbert_e2e % 10).search("chemical reactions")

print("retrieval done")
```

This gave me the following error:

```
Traceback (most recent call last):
  File "smallretrival.py", line 17, in <module>
    pyterrier_colbert_factory = pyterrier_colbert.ranking.ColBERTFactory(checkpoint, "/home/s2003857/javaIndex/indextest", "colbert_java_index")
NameError: name 'pyterrier_colbert' is not defined
```
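For readers hitting the same NameError: `from pyterrier_colbert.ranking import ColBERTFactory` binds only the name `ColBERTFactory`, not the package name `pyterrier_colbert`. A minimal, standard-library-only sketch of the same behaviour, using `os.path` as a stand-in for `pyterrier_colbert.ranking` (the helper `run_snippet` is hypothetical, for illustration only):

```python
def run_snippet(source: str) -> str:
    """Execute `source` in a fresh, empty namespace.

    Returns "ok" on success, or "NameError: <message>" if a name is unbound.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)
    except NameError as exc:
        return f"NameError: {exc}"
    return "ok"

# Mirrors the failing script: only `join` is bound, so `os` is undefined.
broken = "from os.path import join\nos.path.join('a', 'b')"
# Mirrors the fix used in the thread: call the imported name directly.
fixed = "from os.path import join\njoin('a', 'b')"
# Alternative fix: import the module itself, which binds the top-level name.
also_fixed = "import os.path\nos.path.join('a', 'b')"

for label, src in [("broken", broken), ("fixed", fixed), ("also_fixed", also_fixed)]:
    print(label, "->", run_snippet(src))
```

By analogy, adding `import pyterrier_colbert.ranking` would also have made the original line resolve.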

I then changed the code to call ColBERTFactory directly:

```python
import faiss
assert faiss.get_num_gpus() > 0

import pyterrier as pt
pt.init()

import torch
print('torch version', torch.__version__)
x = torch.rand(5, 3)
print(x)

checkpoint="/home/s2003857/javaIndex/checkpoint/colbert-10000.dnn"

from pyterrier_colbert.indexing import ColBERTIndexer
from pyterrier_colbert.ranking import ColBERTFactory

pyterrier_colbert_factory = ColBERTFactory(checkpoint, "/home/s2003857/javaIndex/indextest", "colbert_java_index")
#pyterrier_colbert_factory = indexer.ranking_factory()

colbert_e2e = pyterrier_colbert_factory.end_to_end()
results = (colbert_e2e % 10).search("chemical reactions")  # top-10 results as a DataFrame
print(results)

print("retrieval done")
```
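A note on `% 10` in the script: in PyTerrier, `transformer % k` applies a rank cutoff, keeping only the top k results per query. The following is a plain-Python sketch of that idea, not PyTerrier's implementation; the list-of-dicts input (with `qid`, `docno` and `score` keys) is an assumed stand-in for the DataFrame that PyTerrier actually passes between transformers, and `rank_cutoff` is a hypothetical name.

```python
from collections import defaultdict

def rank_cutoff(results, k):
    """Keep the k highest-scoring rows for each query id (qid).

    `results` is a list of dicts with at least 'qid', 'docno' and 'score'
    keys, loosely mirroring the columns of a PyTerrier result frame.
    """
    by_query = defaultdict(list)
    for row in results:
        by_query[row["qid"]].append(row)
    kept = []
    for qid in by_query:  # dicts preserve first-seen query order
        ranked = sorted(by_query[qid], key=lambda r: r["score"], reverse=True)
        kept.extend(ranked[:k])
    return kept

example = [
    {"qid": "1", "docno": "d1", "score": 0.2},
    {"qid": "1", "docno": "d2", "score": 0.9},
    {"qid": "1", "docno": "d3", "score": 0.5},
    {"qid": "2", "docno": "d4", "score": 0.1},
]
print([row["docno"] for row in rank_cutoff(example, 2)])
```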

The full error I am now getting is:

```
PyTerrier 0.8.0 has loaded Terrier 5.6 (built by craigmacdonald on 2021-09-17 13:27)

No etc/terrier.properties, using terrier.default.properties for bootstrap configuration.
torch version 1.10.1+cu113
tensor([[0.3226, 0.8167, 0.1429],
        [0.7141, 0.0719, 0.4174],
        [0.6066, 0.5820, 0.4509],
        [0.7547, 0.5944, 0.4332],
        [0.8414, 0.6289, 0.9862]])
1.10.1+cu113
Some weights of ColBERT were not initialized from the model checkpoint at microsoft/codebert-base and are newly initialized: ['linear.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[May 23, 21:38:52] #> Loading model checkpoint.
[May 23, 21:38:52] #> Loading checkpoint /home/s2003857/javaIndex/checkpoint/colbert-10000.dnn
[May 23, 21:38:53] #> checkpoint['epoch'] = 0
[May 23, 21:38:53] #> checkpoint['batch'] = 10000
Traceback (most recent call last):
  File "smallretrival.py", line 20, in <module>
    colbert_e2e = pyterrier_colbert_factory.end_to_end()
  File "/home/s2003857/javaIndex/venv/lib/python3.8/site-packages/pyterrier_colbert/ranking.py", line 773, in end_to_end
    return self.set_retrieve() >> self.index_scorer(query_encoded=True)
  File "/home/s2003857/javaIndex/venv/lib/python3.8/site-packages/pyterrier_colbert/ranking.py", line 607, in set_retrieve
    faiss_index = self._faiss_index()
  File "/home/s2003857/javaIndex/venv/lib/python3.8/site-packages/pyterrier_colbert/ranking.py", line 586, in _faiss_index
    self.faiss_index = FaissIndex(self.index_path, faiss_index_path, self.args.nprobe, self.args.part_range, mmap=self.faisstype == 'mmap')
TypeError: __init__() got an unexpected keyword argument 'mmap'
```

Is this some kind of installation issue? Thanks in advance, and kind regards, Wilhelm.

cmacdonald commented 2 years ago

I don't think you have installed our fork of ColBERT, i.e. https://github.com/cmacdonald/ColBERT/tree/v0.2/

If you pip install pyterrier_colbert, this should have been installed correctly.
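One way to check which ColBERT checkout a virtualenv is actually importing is to ask Python where it would load the package from. A minimal sketch using the standard-library `importlib.util.find_spec`; it assumes the ColBERT code is importable as the top-level package `colbert`, and `module_location` is a hypothetical helper name:

```python
import importlib.util
from typing import Optional

def module_location(name: str) -> Optional[str]:
    """Return the file Python would import `name` from, or None if it is not importable."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Missing parent package, or a module in sys.modules without a spec.
        return None
    if spec is None:
        return None
    return spec.origin

if __name__ == "__main__":
    # Assumption: the ColBERT fork installs a top-level package named "colbert".
    print("colbert would be imported from:", module_location("colbert"))
```

If the reported path points at a clone of stanford-futuredata/ColBERT rather than the cmacdonald/ColBERT v0.2 fork, that would be consistent with the `mmap` TypeError above.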

seanmacavaney commented 2 years ago

Maybe we should have a way to detect whether the correct fork is being used, and throw an error if it's not?
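A minimal sketch of such a check, assuming it is done by inspecting whether `FaissIndex.__init__` accepts the `mmap` keyword that pyterrier_colbert passes (see the traceback above). The import path `colbert.ranking.faiss_index`, the helper names and the error message are assumptions for illustration, not part of pyterrier_colbert; the two stand-in classes are hypothetical and only model the failing call from the traceback.

```python
import inspect

def accepts_keyword(func, name: str) -> bool:
    """Return True if `func` can be called with the keyword argument `name`."""
    for param in inspect.signature(func).parameters.values():
        if param.kind is inspect.Parameter.VAR_KEYWORD:
            return True  # a **kwargs catch-all accepts any keyword
        if param.name == name and param.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            return True
    return False

# Hypothetical stand-ins modelled on the FaissIndex(...) call in the traceback.
class UpstreamFaissIndex:
    def __init__(self, index_path, faiss_index_path, nprobe, part_range=None):
        pass

class ForkFaissIndex:
    def __init__(self, index_path, faiss_index_path, nprobe, part_range=None, mmap=False):
        self.mmap = mmap

def check_colbert_fork():
    """Hypothetical guard: fail early if the installed ColBERT lacks FaissIndex(mmap=...)."""
    from colbert.ranking.faiss_index import FaissIndex  # assumed import path
    if not accepts_keyword(FaissIndex.__init__, "mmap"):
        raise ImportError(
            "Installed ColBERT does not support FaissIndex(mmap=...); install the "
            "fork from https://github.com/cmacdonald/ColBERT/tree/v0.2/"
        )

print(accepts_keyword(UpstreamFaissIndex.__init__, "mmap"))  # False
print(accepts_keyword(ForkFaissIndex.__init__, "mmap"))      # True
```

Calling a guard like this before constructing the index could turn the opaque `unexpected keyword argument 'mmap'` into an explicit installation message.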

wpertsch commented 2 years ago

Ok, thank you!

We had accidentally forked our ColBERT version directly from stanford-futuredata. It's fixed now!

@cmacdonald, what do you mean by `pip install pyterrier_colbert`? I could not find it on PyPI.

Best regards, Wilhelm

cmacdonald commented 2 years ago

Sorry, I meant `pip install -q git+https://github.com/terrierteam/pyterrier_colbert.git`

as per https://github.com/terrierteam/pyterrier_colbert#installation