xlang-ai / instructor-embedding

[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Apache License 2.0
1.79k stars 132 forks source link

docs: instructions syntax #48

Closed greenpau closed 1 year ago

greenpau commented 1 year ago

Here is what I have noted for myself.

The syntax to write instructions: "Represent the domain text_type for task_objective: ", where:

The questions:

  1. what is the exhaustive list of possible values for domain?
  2. what is the exhaustive list of possible values for text_type?
  3. what is the exhaustive list of possible values for task_objective?
greenpau commented 1 year ago

Awesome work! 👍 Thank you for the models!

hongjin-su commented 1 year ago

Hi, Thanks a lot for your interest in the INSTRUCTOR!

Although it may be hard to exhaustively list all possible values of domain, test type and task objective, we provide some examples in the table 4 of our paper.

Feel free to add more questions or comments!

greenpau commented 1 year ago

@hongjin-su , thank you!

The info I was looking for.

domain = [
  'wikipedia',
  'news',
  'medicine',
  'biology',
  'reddit',
  'stackoverflow',
  'science',
  'quora',
  'coronavirus',
  'math',
  'physics'
]

text_type = [
  'question',
  'query',
  'answer',
  'summary',
  'sentence',
  'review',
  'post',
  'comment',
  'statement',
  'paragraph',
  'passage',
  'document'
] 

task_objective = [
  'classify the sentence as positive or negative',
  'retrieve a duplicate sentence',
  'retrieve the supporting document'
]