xlang-ai / instructor-embedding

[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Apache License 2.0

Can SetFit be incorporated with InstructorEmbedding? #30

Open jackiexue1993 opened 1 year ago

jackiexue1993 commented 1 year ago

@tomaarsen Hi Tom. I am wondering whether SetFit can be combined with InstructorEmbedding. I would like to see a comparison between SetFit and InstructorEmbedding for topic classification.

tomaarsen commented 1 year ago

(For those unaware, SetFit is an easy-to-use library for few-shot text classification, relying on sentence-transformers)

In theory, I think this is very feasible. My understanding is that InstructorEmbedding subclasses Sentence Transformers, meaning that adding support should be relatively straightforward. The question that remains is what the prompt(s) should be, and whether users should have a say in determining them. I'll need to figure out how exactly this would work best. That said, the Instructor embeddings seem quite strong compared to "default" sentence transformers, so I'm hopeful that "InstructSetFit" would be very competitive for few-shot text classification.
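To make the open question about prompts concrete, here is a rough sketch of how an instruction could be threaded through a few-shot classifier. It is not SetFit's actual training procedure (SetFit fine-tunes the embedding model contrastively and then fits a classification head); a plain nearest-centroid classifier is swapped in to keep the example small. The names `FewShotCentroidClassifier`, `build_instructor_inputs`, `DEFAULT_INSTRUCTION`, and `toy_encode` are all hypothetical, and the instruction text is an assumption. The only thing taken from Instructor is the input shape: as far as I understand the InstructorEmbedding README, `encode` takes a list of `[instruction, text]` pairs.

```python
from collections import defaultdict
from math import sqrt
from typing import Callable, Dict, List, Sequence

Vector = List[float]
# Any callable mapping [[instruction, text], ...] to one vector per pair.
Encoder = Callable[[List[List[str]]], List[Vector]]

# Hypothetical default; users can override it per use case.
DEFAULT_INSTRUCTION = "Represent the sentence for topic classification:"


def build_instructor_inputs(instruction: str, texts: Sequence[str]) -> List[List[str]]:
    """Pair every text with the same instruction: [[instruction, text], ...]."""
    return [[instruction, text] for text in texts]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class FewShotCentroidClassifier:
    """Nearest-centroid classifier over instruction-conditioned embeddings."""

    def __init__(self, encode: Encoder, instruction: str = DEFAULT_INSTRUCTION):
        self.encode = encode
        self.instruction = instruction
        self.centroids: Dict[str, Vector] = {}

    def fit(self, texts: Sequence[str], labels: Sequence[str]) -> "FewShotCentroidClassifier":
        vectors = self.encode(build_instructor_inputs(self.instruction, texts))
        grouped = defaultdict(list)
        for vec, label in zip(vectors, labels):
            grouped[label].append(vec)
        # The centroid of a label is the element-wise mean of its example vectors.
        self.centroids = {
            label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in grouped.items()
        }
        return self

    def predict(self, texts: Sequence[str]) -> List[str]:
        vectors = self.encode(build_instructor_inputs(self.instruction, texts))
        return [
            max(self.centroids, key=lambda label: cosine(vec, self.centroids[label]))
            for vec in vectors
        ]


def toy_encode(pairs: List[List[str]]) -> List[Vector]:
    """Stand-in encoder for the demo: keyword counts, instruction ignored.

    Assumption: with the real library one would pass something like
    INSTRUCTOR("hkunlp/instructor-large").encode instead (not run here).
    """
    vocab = ["goal", "match", "election", "vote"]
    return [[float(text.lower().split().count(w)) for w in vocab] for _, text in pairs]


clf = FewShotCentroidClassifier(toy_encode, instruction="Represent the news headline:")
clf.fit(["late goal wins the match", "the election vote count"], ["sports", "politics"])
predictions = clf.predict(["a vote was held", "what a goal"])
```

Keeping `instruction` as a constructor argument with a default is one way to give users a say in the prompt without forcing them to choose one.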

I'm rather busy at the moment, but I'll try to find some time for this.

Harry-hash commented 1 year ago

Thanks a lot for your comments @tomaarsen @jackiexue1993

The instruction may be important for specific use cases. Users with diverse needs will have a better experience if the text embeddings can be customized to their scenarios.