Objective
Evaluate the efficacy of ColBERTv2.0 (available on Hugging Face) against the embedding methods currently used in our project. This is an initial experiment to understand how ColBERTv2.0 compares in terms of search accuracy, retrieval speed, and storage requirements.
Background
Currently, our project uses standard single-vector embeddings, which do not capture the token-level representations offered by models like ColBERTv2.0. ColBERT (Contextualized Late Interaction over BERT) generates multiple vectors per document, encoding token-level (or segment-level) semantic information, and scores query-document pairs through late interaction between those vectors.
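To make the late-interaction idea concrete, here is a minimal sketch of ColBERT-style MaxSim scoring in NumPy, using made-up toy vectors (not the actual ColBERTv2.0 embeddings): each query token vector is compared against every document token vector, and the per-query-token maxima are summed.

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """ColBERT-style late interaction: for each query token vector,
    take the max cosine similarity over all document token vectors,
    then sum those maxima over the query tokens."""
    # L2-normalize rows so dot products become cosine similarities
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sim = q @ d.T                        # shape: (n_query_tokens, n_doc_tokens)
    return float(sim.max(axis=1).sum())  # MaxSim, summed over query tokens

# Toy example: 2 query tokens, 3 document tokens, 4-dim embeddings
rng = np.random.default_rng(0)
query = rng.normal(size=(2, 4))
doc = rng.normal(size=(3, 4))
print(maxsim_score(query, doc))
```

Note that, unlike a single-vector dot product, this score requires keeping every token vector of the document at index time, which is the source of ColBERT's larger storage footprint.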
Plan
Setup: Install and configure ColBERTv2.0 from Hugging Face.
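A minimal sketch of the setup step, assuming a Python environment with pip and the public `colbert-ir/colbertv2.0` checkpoint on the Hugging Face Hub (exact checkpoint name and loading path to be confirmed during the experiment):

```shell
# Install dependencies (versions are illustrative, not pinned requirements).
pip install torch transformers

# Smoke test: fetch the checkpoint from the Hub. AutoModel loads it as its
# underlying BERT encoder; a dedicated ColBERT package (e.g. colbert-ai)
# may be preferable for indexing and retrieval.
python - <<'PY'
from transformers import AutoTokenizer, AutoModel
tok = AutoTokenizer.from_pretrained("colbert-ir/colbertv2.0")
model = AutoModel.from_pretrained("colbert-ir/colbertv2.0")
print(model.config.hidden_size)
PY
```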
Expected Outcome
Additional Notes