xlang-ai / instructor-embedding

[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Apache License 2.0

Why use DataCollatorForSeq2Seq to collect data? #47

Closed by tsotfsk 11 months ago

tsotfsk commented 1 year ago

I think this is a text retrieval task. Why use the seq2seq approach to construct the data?

hongjin-su commented 1 year ago

Hi, thanks for your interest in INSTRUCTOR!

INSTRUCTOR is a general-purpose embedding model, and it performs quite well on retrieval tasks. Could you provide more context about which part of the data collection you are referring to?

hongjin-su commented 11 months ago

Please re-open the issue if you have any questions or comments!