h2oai/sql-sidekick
Experiment on QnA tabular data using LLMs and SQL
Apache License 2.0
Explore ability to improve generation using Self-Rewarding
#76
Open
pramitchoudhary opened 7 months ago
pramitchoudhary commented 7 months ago
From my understanding, there are two parts to it:
1. LLM-as-a-Judge, to generate a reward score. As a first step, evaluate whether this strategy can help during re-generation requests with exhaustive beam search.
2. Self-training/modification on preference pairs.
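A rough sketch of how the two parts could fit together, not the sql-sidekick implementation: a judge scores each candidate SQL string produced by beam search, the candidates are re-ranked by that score, and the best and worst candidates form a preference pair for later self-training. All names here (`ScoredCandidate`, `rank_candidates`, `build_preference_pair`, `toy_judge`) are hypothetical, and `toy_judge` is only a keyword-based stand-in for a real LLM-as-a-Judge call.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class ScoredCandidate:
    sql: str
    score: float


# A judge maps (question, candidate SQL) to a reward score.
Judge = Callable[[str, str], float]


def rank_candidates(question: str, candidates: List[str], judge: Judge) -> List[ScoredCandidate]:
    """Score every beam candidate with the judge and sort best-first."""
    scored = [ScoredCandidate(sql, judge(question, sql)) for sql in candidates]
    return sorted(scored, key=lambda c: c.score, reverse=True)


def build_preference_pair(ranked: List[ScoredCandidate]) -> Optional[Tuple[str, str]]:
    """Return (chosen, rejected) from the best and worst candidates.

    Returns None when there is no score gap, since such a pair
    carries no preference signal.
    """
    if len(ranked) < 2 or ranked[0].score == ranked[-1].score:
        return None
    return ranked[0].sql, ranked[-1].sql


def toy_judge(question: str, sql: str) -> float:
    """Placeholder judge (assumption): replace with an actual LLM scoring prompt."""
    lowered = sql.lower()
    score = 1.0
    if "select" in lowered:
        score += 2.0
    if "where" in lowered:
        score += 2.0
    return score


if __name__ == "__main__":
    question = "Which regions had sales in 2023?"
    beams = [
        "SELECT * FROM sales",
        "SELECT region FROM sales WHERE year = 2023",
        "DROP TABLE sales",
    ]
    ranked = rank_candidates(question, beams, toy_judge)
    print("best candidate:", ranked[0].sql)
    print("preference pair:", build_preference_pair(ranked))
```

For the first step described above, only `rank_candidates` is needed: on a re-generation request, return the top-ranked beam instead of the default one. The preference pairs only matter once self-training is attempted.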
Reference:
https://arxiv.org/abs/2401.10020