Hello! Thanks for the catch! For PaliGemma, the tokenizer should be set with `padding_side="right"`. I am pushing an update to force that behaviour and will push updated checkpoints. In practice, as is, with a mock image, it just introduces extra noise but should still work!
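For reference, a minimal sketch of what forcing right padding looks like (this is just an illustration using the standard Hugging Face processor API, not the actual colpali_engine patch, and the checkpoint name is only an example):

```python
# Illustration only (not the actual colpali_engine patch): force the PaliGemma
# tokenizer to pad on the right so pad tokens are appended after the text.
from transformers import AutoProcessor

# Example checkpoint name; substitute the base model you actually use.
processor = AutoProcessor.from_pretrained("google/paligemma-3b-mix-448")
processor.tokenizer.padding_side = "right"

batch = processor.tokenizer(
    ["What is shown in the figure?", "Summarize the table in the image"],
    padding="longest",
    return_tensors="pt",
)
# Shorter queries now end in pad tokens instead of starting with them, so the
# image placeholder tokens prepended by the processor stay at a fixed position
# at the front of every row.
```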
Does the model require that extra noise? If this was done during training as well, the padding tokens might lead to a degradation in performance, since the model has fewer tokens available to represent the text and the attention mask does not exclude the image tokens in colpali_engine/models/paligemma_colbert_architecture.py, line 83.
In practice, the tokens corresponding to the input_ids that should be replaced by the image soft tokens currently get included in the query. Since such a token is associated with nothing (and was never trained, because it is otherwise replaced), it acts as a learned padding token (but one that is attended to). It should not hurt performance much and might even act as extra "buffer tokens".
I am retraining checkpoints with the fix (since it happens during training as well), which I will release once I push the update, along with the benchmark results!
So everything should be fixed!
Would be awesome if you can confirm using this new model: https://huggingface.co/vidore/colpali-v1.1
and the code in branch: https://github.com/illuin-tech/colpali/tree/hard-negs
The base model version is fixed and the padding side is set to right, so the issue should be resolved @hannah348
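A quick way to sanity-check the fix (this assumes the checkpoint, or its base model, ships the processor/tokenizer files; otherwise load them from the base model repo instead):

```python
# Sanity check (assumption: the repo ships tokenizer/processor files).
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("vidore/colpali-v1.1")
print(processor.tokenizer.padding_side)  # should print "right" after the fix
```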
I was using the `process_queries` method and I realized that the processor uses left padding. As a result, the line `batch_query["input_ids"] = batch_query["input_ids"][..., processor.image_seq_length:]` cuts off the padding tokens rather than the image tokens, and if there were padding tokens in the sequence, the end of the image sequence ends up at the beginning of the query. Is that intentional? If so, what is the reasoning behind that?
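To make the behaviour concrete, here is a toy illustration (made-up token ids, with image_seq_length shortened to 4) of what that slice does under left vs. right padding:

```python
import torch

image_seq_length = 4            # stands in for processor.image_seq_length
IMG, PAD = 99, 0                # toy ids for the image placeholder and pad token
short_query, long_query = [11, 12], [11, 12, 13, 14]

# Left padding: pads are prepended, so the short row is
# [PAD, PAD, IMG, IMG, IMG, IMG, 11, 12].
left = torch.tensor([
    [PAD, PAD] + [IMG] * image_seq_length + short_query,
    [IMG] * image_seq_length + long_query,
])
print(left[..., image_seq_length:])
# -> tensor([[99, 99, 11, 12], [11, 12, 13, 14]])
# The slice removed the pads plus only part of the image sequence, so the tail
# of the image tokens ends up at the start of the "query-only" ids.

# Right padding: pads are appended, so the short row is
# [IMG, IMG, IMG, IMG, 11, 12, PAD, PAD].
right = torch.tensor([
    [IMG] * image_seq_length + short_query + [PAD, PAD],
    [IMG] * image_seq_length + long_query,
])
print(right[..., image_seq_length:])
# -> tensor([[11, 12,  0,  0], [11, 12, 13, 14]])
# Only query tokens (plus trailing pads) remain, which is the intended behaviour.
```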