xlang-ai / instructor-embedding

[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Apache License 2.0

Instruction for keyword retrieval #90

Closed: Bobolx00 closed this issue 6 months ago

Bobolx00 commented 9 months ago

Hi! I'm currently using the instructor-xl model, with phrases as queries to retrieve relevant keywords. I noticed that performance degrades when I apply an instruction (maybe because, in the training set, the query instruction usually introduces a string that is shorter than the string introduced by the document/passage instruction).

Any suggestions for an instruction pair?

Thanks in advance!

hongjin-su commented 9 months ago

Hi, thanks a lot for your interest in INSTRUCTOR!

You may try to follow the instruction template: Represent {domain} {text_type} for {task_objective}:. For example, if the query is a news article, and we use it to retrieve relevant keywords, we may write instructions like {'query': 'Represent a news article for retrieving relevant keywords:', 'document': 'Represent a keyword for retrieval:'}
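As a rough illustration of how such an instruction pair could be wired into a retrieval loop, here is a minimal sketch. It assumes the `InstructorEmbedding` package and the `hkunlp/instructor-xl` checkpoint, where `model.encode` takes a list of `[instruction, text]` pairs. The helper names (`build_pairs`, `rank_keywords`, `load_instructor_encoder`) are hypothetical and only for illustration, and the instructions are the example ones from above, not tuned values.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Example instruction pair from the reply above (not tuned values).
QUERY_INSTRUCTION = "Represent a news article for retrieving relevant keywords:"
DOC_INSTRUCTION = "Represent a keyword for retrieval:"


def build_pairs(instruction: str, texts: Sequence[str]) -> List[List[str]]:
    """Pair every text with its instruction: [[instruction, text], ...]."""
    return [[instruction, text] for text in texts]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(float(x) * float(y) for x, y in zip(a, b))
    norm_a = math.sqrt(sum(float(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(float(y) ** 2 for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_keywords(
    encode: Callable[[List[List[str]]], Sequence[Sequence[float]]],
    query: str,
    keywords: Sequence[str],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Embed the query and the keywords with their own instructions,
    then return the top_k keywords by cosine similarity."""
    query_vec = encode(build_pairs(QUERY_INSTRUCTION, [query]))[0]
    keyword_vecs = encode(build_pairs(DOC_INSTRUCTION, keywords))
    scored = [(kw, cosine(query_vec, vec)) for kw, vec in zip(keywords, keyword_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


def load_instructor_encoder(model_name: str = "hkunlp/instructor-xl"):
    """Hypothetical helper: load the model and return its encode method.
    Assumes `pip install InstructorEmbedding`; downloads weights on first use."""
    from InstructorEmbedding import INSTRUCTOR

    return INSTRUCTOR(model_name).encode
```

Usage would look something like `rank_keywords(load_instructor_encoder(), article_text, candidate_keywords)`. Passing the encoder in as an argument makes it easy to compare the ranking with and without instructions (for example by swapping in empty instruction strings), which may help diagnose the degradation described in the question.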

Hope this helps!

hongjin-su commented 6 months ago

Feel free to re-open the issue if you have any questions or comments!