about the length of tokens

jingtaozhan / DRhard

SIGIR'21: Optimizing DR with hard negatives and achieving SOTA first-stage retrieval performance on TREC DL Track.

BSD 3-Clause "New" or "Revised" License


about the length of tokens #6

Closed kaishxu closed 3 years ago

kaishxu commented 3 years ago

Hello,

I have read your paper and am quite interested in your work! I have a question about the tokens. I notice you truncate the passages to 120 tokens for MSMARCO Passage Retrieval, whereas the original ANCE paper uses 512 tokens. Does the number of tokens have an impact on accuracy?

jingtaozhan commented 3 years ago

No. We just don't have enough advanced GPUs to afford very long inputs :( BTW, the average passage length is about 70 tokens, so truncating to 120 shouldn't be a problem.
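
For anyone who wants to check this on their own data, here is a minimal sketch of how a length cap could be evaluated. It is not code from this repo: the function names `truncate` and `truncation_report` and the sample token counts are made up for illustration, and the real lengths would come from whatever tokenizer you use.

```python
from typing import Dict, List


def truncate(token_ids: List[int], max_len: int = 120) -> List[int]:
    """Keep at most max_len tokens from the start of the sequence."""
    return token_ids[:max_len]


def truncation_report(lengths: List[int], max_len: int = 120) -> Dict[str, float]:
    """Summarize how many sequences a cap would cut and how many tokens survive."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    kept = sum(min(n, max_len) for n in lengths)
    cut = sum(1 for n in lengths if n > max_len)
    return {
        "mean_length": total / len(lengths),
        "fraction_truncated": cut / len(lengths),
        "fraction_tokens_kept": kept / total,
    }


# Hypothetical per-passage token counts, NOT measured on MSMARCO.
lengths = [45, 60, 70, 70, 85, 90, 130, 200]
report = truncation_report(lengths, max_len=120)
print(report)
```

If `fraction_truncated` is small on your corpus, a shorter cap loses little text while cutting the cost of each forward pass.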

kaishxu commented 3 years ago

hhhhhhhhha!

Thanks a lot for your reply! I don't have multiple GPUs either! Lol! I agree with your settings!