illuin-tech / colpali

The code used to train and run inference with the ColPali architecture.
https://huggingface.co/vidore
MIT License

Different embedding for same image when model reloads #30

Closed paniabhisek closed 2 months ago

paniabhisek commented 2 months ago

Every time I load the model, the embedding of an image changes. Can you explain why?

Code used to get the embeddings:

import torch
from torch.utils.data import DataLoader
from tqdm import tqdm
from transformers import AutoProcessor
from PIL import Image

from colpali_engine.models.paligemma_colbert_architecture import ColPali
from colpali_engine.trainer.retrieval_evaluator import CustomEvaluator
from colpali_engine.utils.colpali_processing_utils import process_images, process_queries
from colpali_engine.utils.image_from_page_utils import load_from_dataset
from pdf2image import convert_from_path

# Load model
model_name = "vidore/colpali"
model = ColPali.from_pretrained("google/paligemma-3b-mix-448", torch_dtype=torch.bfloat16, device_map="cuda:2", temperature=0).eval()
#model.load_adapter(model_name)
processor = AutoProcessor.from_pretrained(model_name, device_map="cuda:2", temperature=0)

# select images -> load_from_pdf(<pdf_path>),  load_from_image_urls(["<url_1>"]), load_from_dataset(<path>)
images = convert_from_path('/path/to/filename')
queries = ["My question here"]

# run inference over the document images
dataloader = DataLoader(
    images,
    batch_size=4,
    shuffle=False,
    collate_fn=lambda x: process_images(processor, x),
)
ds = []
for batch_doc in tqdm(dataloader):
    with torch.no_grad():
        batch_doc = {k: v.to(model.device) for k, v in batch_doc.items()}
        embeddings_doc = model(**batch_doc)
    ds.extend(list(torch.unbind(embeddings_doc.to("cpu"))))

# second pass with the same loaded model, to check that the embeddings match
ds1 = []
for batch_doc in tqdm(dataloader):
    with torch.no_grad():
        batch_doc = {k: v.to(model.device) for k, v in batch_doc.items()}
        embeddings_doc = model(**batch_doc)
    ds1.extend(list(torch.unbind(embeddings_doc.to("cpu"))))

If I use the same loaded model, the embeddings are the same. But if I unload the model and load it again, they change.

How can I make this deterministic so that I get the same embeddings every time?
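
For reference, a check along these lines (not part of the original script; the file name is just a placeholder) can be used to confirm whether two runs produce matching embeddings:

import torch

# after the first run, save the document embeddings to disk
torch.save(ds, "embeddings_run1.pt")

# in a fresh process, reload the model, recompute `ds`, then compare
reference = torch.load("embeddings_run1.pt")
identical = all(
    torch.allclose(ref.float(), new.float())
    for ref, new in zip(reference, ds)
)
print("embeddings match across loads:", identical)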

ManuelFay commented 2 months ago

Hey, this issue was fixed between v0.1.1 and v0.2.0 (it also works with the previous version, but those models were not trained optimally for it).

Essentially, you want to use our fixed base model vidore/colpaligemma-3b-pt-448-base and make sure to call .eval() after loading the adapter. Your best bet is the v1.2 model, but this will also be deterministic with the original colpali.

model_name = "vidore/colpali-v1.2"
# load the fixed base model, then attach the retrieval adapter on top of it
model = ColPali.from_pretrained("vidore/colpaligemma-3b-pt-448-base", torch_dtype=torch.bfloat16, device_map="cuda").eval()
model.load_adapter(model_name)
model = model.eval()
processor = AutoProcessor.from_pretrained(model_name)
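
For completeness, a minimal sketch (not from the thread) of checking that two independent loads of the fixed base model plus adapter give identical embeddings, reusing only the helpers shown above; the blank image is a placeholder for an actual document page:

import torch
from transformers import AutoProcessor
from PIL import Image

from colpali_engine.models.paligemma_colbert_architecture import ColPali
from colpali_engine.utils.colpali_processing_utils import process_images

def embed_image(image):
    # load the base model and adapter from scratch, then embed a single image
    model = ColPali.from_pretrained("vidore/colpaligemma-3b-pt-448-base", torch_dtype=torch.bfloat16, device_map="cuda").eval()
    model.load_adapter("vidore/colpali-v1.2")
    model = model.eval()
    processor = AutoProcessor.from_pretrained("vidore/colpali-v1.2")
    batch = process_images(processor, [image])
    batch = {k: v.to(model.device) for k, v in batch.items()}
    with torch.no_grad():
        return model(**batch).to("cpu")

image = Image.new("RGB", (448, 448), (255, 255, 255))  # placeholder page image
emb1 = embed_image(image)
emb2 = embed_image(image)  # second, independent load of the same weights
print(torch.equal(emb1, emb2))  # expected: True with the fixed base model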