texttron / tevatron

Tevatron - A flexible toolkit for neural retrieval research and development.
http://tevatron.ai
Apache License 2.0
435 stars, 87 forks

Improve retriever encoding code #126

Closed. ArvinZhuang closed this 1 month ago

ArvinZhuang commented 1 month ago

This patch makes the following improvements:

  1. Changes the padding token from unk to eos, since some models, such as Llama3, do not have an unk token.
  2. Enables automatic device mapping so that large models can be loaded across multiple GPUs.
  3. Allows bf16 encoding.
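
For readers who want a feel for what these three changes amount to, here is a minimal sketch. It is not the code from this patch: the helper names `ensure_pad_token` and `build_load_kwargs` are hypothetical, and the tokenizer is a plain stand-in object so the example runs without downloading a model. It assumes a Hugging Face style tokenizer exposing `pad_token`, `pad_token_id`, `eos_token`, and `eos_token_id`.

```python
from types import SimpleNamespace


def ensure_pad_token(tokenizer):
    """Fall back to the EOS token when the tokenizer defines no pad token.

    Hypothetical helper: models such as Llama3 ship without an unk token,
    so eos is used as the padding token instead.
    """
    if tokenizer.pad_token_id is None:
        tokenizer.pad_token = tokenizer.eos_token
        tokenizer.pad_token_id = tokenizer.eos_token_id
    return tokenizer


def build_load_kwargs(bf16=False, multi_gpu=False):
    """Collect keyword arguments intended for a from_pretrained-style loader.

    Hypothetical helper: the dtype is kept as a string here so the sketch
    does not need torch; real code would typically pass torch.bfloat16.
    """
    kwargs = {}
    if bf16:
        kwargs["torch_dtype"] = "bfloat16"
    if multi_gpu:
        # Lets the loader place model layers across the available GPUs.
        kwargs["device_map"] = "auto"
    return kwargs


# Stand-in for a tokenizer that has an EOS token but no pad token.
tokenizer = SimpleNamespace(
    pad_token=None,
    pad_token_id=None,
    eos_token="</s>",
    eos_token_id=2,
)
ensure_pad_token(tokenizer)
load_kwargs = build_load_kwargs(bf16=True, multi_gpu=True)
```

With the `transformers` library, the resulting dictionary would be passed along the lines of `AutoModel.from_pretrained(model_name, **load_kwargs)`. Note that `device_map="auto"` requires the `accelerate` package to be installed, and that passing the dtype as a string rather than `torch.bfloat16` is an assumption about the installed `transformers` version.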