raphaelsty / neural-cherche

Neural Search
https://raphaelsty.github.io/neural-cherche/
MIT License

[bug] no query_mode argument for sentence transformers encode #15

Closed: snewcomer closed this issue 9 months ago

snewcomer commented 9 months ago

I haven't been able to find any documentation on the query_mode argument to encode. I believe this is the right fix.

raphaelsty commented 9 months ago

Hi @snewcomer, the query_mode parameter adds a distinct prefix token to the input: '[Q]' for queries and '[D]' for documents. I think you are initializing the ColBERT model with a Sentence Transformer instance, but as you can see in the docstring, you need to pass the Sentence Transformer checkpoint name rather than the Sentence Transformer model itself:

import torch

from neural_cherche import models

model = models.ColBERT(
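    # Pass the checkpoint name (a string), not an instantiated SentenceTransformer object.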
    model_name_or_path="sentence-transformers/all-mpnet-base-v2",
    device="cuda" if torch.cuda.is_available() else "cpu"
)
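
For reference, here is a minimal sketch of how the flag is then used, assuming the model's encode method accepts the query_mode flag discussed in this issue; the example texts are only illustrative:

import torch

from neural_cherche import models

model = models.ColBERT(
    model_name_or_path="sentence-transformers/all-mpnet-base-v2",
    device="cuda" if torch.cuda.is_available() else "cpu",
)

# query_mode=True adds the '[Q]' prefix token used for queries.
queries_embeddings = model.encode(
    ["what is neural search?"],
    query_mode=True,
)

# query_mode=False adds the '[D]' prefix token used for documents.
documents_embeddings = model.encode(
    ["Neural-Cherche is a library for neural search models such as ColBERT."],
    query_mode=False,
)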

If you share the code where you encounter the error, it may be easier to spot the issue.

snewcomer commented 9 months ago

Thanks Raphael! You are right, this is clearly the wrong fix. I thought I was hitting an error with query_mode being passed via kwargs to either the tokenizer or the Sentence Transformers encode, but I'm not able to reproduce it anymore. Closing!