Open debraj135 opened 1 year ago
Wondering if I'm missing a detail. Did anyone else also come across this?
Thanks a lot for your interest in the INSTRUCTOR!
Like other LLMs, INSTRUCTOR is sensitive to the wording of its instructions, and its small size may make this worse. I would say all of your proposed instructions follow the basic template, but it may take more trials or heuristics to find the best instruction.
Thank you. I had a few follow-up questions:
; or a colon : at the end of the instruction?
Following back on this.
Sorry for the late reply!
In our training and evaluation, we may not have been very strict about punctuation. We would be glad to make it more consistent in future versions!
I noticed that the instructions in the training data end with ; and no whitespace after that. For example, 'Represent the Science sentence;' instead of 'Represent the Science sentence: '.
Whereas in the readme, the proposed format seems to be 'Represent the Science sentence: ' sometimes and 'Represent the Science sentence:' in other places.
All three of these seem to result in different embeddings and hence different similarity numbers. Can you please let us know what the right instruction template is?
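For anyone who wants to measure this effect themselves, here is a minimal sketch of a comparison harness. It scores the same pair of texts under each of the three instruction variants quoted above. The two example sentences, the helper names (`cosine`, `similarity_per_instruction`, `toy_encode`), and the character-count encoder are all hypothetical stand-ins so the snippet runs offline; the toy encoder is not the INSTRUCTOR model and its numbers mean nothing beyond showing the mechanics.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Encoder = Callable[[List[List[str]]], Sequence[Vector]]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def similarity_per_instruction(
    encode: Encoder, instructions: List[str], text_a: str, text_b: str
) -> Dict[str, float]:
    """Embed both texts under each instruction and return the cosine score per instruction."""
    scores = {}
    for instruction in instructions:
        emb_a, emb_b = encode([[instruction, text_a], [instruction, text_b]])
        scores[instruction] = cosine(emb_a, emb_b)
    return scores


def toy_encode(pairs: List[List[str]]) -> List[List[float]]:
    """Offline stand-in (assumption, NOT the real model): character-count vectors."""
    vectors = []
    for instruction, text in pairs:
        vec = [0.0] * 128
        for ch in instruction + text:
            vec[ord(ch) % 128] += 1.0
        vectors.append(vec)
    return vectors


# The three variants discussed in this thread.
VARIANTS = [
    "Represent the Science sentence;",
    "Represent the Science sentence: ",
    "Represent the Science sentence:",
]

# Illustrative sentences, not taken from the training data.
scores = similarity_per_instruction(
    toy_encode,
    VARIANTS,
    "Photosynthesis converts light into chemical energy.",
    "Plants use sunlight to make sugar.",
)
for instruction, score in scores.items():
    print(repr(instruction), round(score, 4))
```

To try it against the real model, one could pass the model's own `encode` method in place of `toy_encode`, assuming (as the readme examples suggest) that it accepts a list of `[instruction, text]` pairs and returns one vector per pair. Printing with `repr` makes the trailing whitespace visible, which is easy to miss otherwise.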