instructlab / instructlab-bot

GitHub bot to assist with the taxonomy contribution workflow
Apache License 2.0

Automate the 'precheck' validation step using semantic similarity scores #354

Open harshit-sh opened 4 months ago

harshit-sh commented 4 months ago

This task involves automating the 'precheck' stage, which currently requires a human triager to validate whether the student model already knows the information a user is trying to teach it.

Similar to the steps used in a standard RAG workflow, the sentences could be converted to vectors using an embedding model and then compared using metrics such as cosine similarity. Based on these scores, the 'precheck' stage can be marked as either a 'success' (✅) or a 'failure' (❎).

This can be included with the precheck call to the @instructlab-bot GH bot.
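As a rough sketch of the scoring step, the bot could compare the embedding of the user's proposed answer against the embedding of the student model's answer and apply a similarity threshold. In practice the embeddings would come from a sentence-similarity model such as those in the SBERT references below; the pure-Python cosine function and the `0.8` threshold here are placeholder assumptions, not tuned values, and the mapping from score to ✅/❎ would be a project policy decision.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def model_already_knows(user_answer_emb, model_answer_emb, threshold=0.8):
    """Placeholder precheck rule: a high similarity score suggests the
    student model already produces the information the user is teaching.
    The threshold is an illustrative assumption, not a tuned parameter."""
    return cosine_similarity(user_answer_emb, model_answer_emb) >= threshold
```

With a real embedding model, `user_answer_emb` and `model_answer_emb` would each be the encoded vector of the respective answer text; the bot would then post ✅ or ❎ on the PR based on the boolean result.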

References:

  1. https://huggingface.co/tasks/sentence-similarity
  2. https://www.sbert.net/docs/usage/semantic_textual_similarity.html
Gregory-Pereira commented 4 months ago

Let me know if I am off the mark here, but this seems similar to the work I'm doing in #356; it's just that the suggested implementation is different. This issue suggests using a RAG-style vector/embedding similarity score, versus my implementation, which is a one-shot model evaluation via a natural-language prompt (with the teacher model used for precheck and the trained model used for evaluation). cc @mingxzhao, thoughts on this evaluation method?