h2oai / sql-sidekick

Experiment on QnA tabular data using LLMs and SQL
Apache License 2.0

Explore ability to improve generation using Self-Rewarding #76

Status: Open · opened by pramitchoudhary 7 months ago

pramitchoudhary commented 7 months ago

From my understanding, there are two parts to it:

  1. LLM-as-a-Judge, used to generate a reward score. As a first step, evaluate whether this strategy can help during a regeneration request when combined with exhaustive beam search.
     [Screenshot: Screen Shot 2024-01-22 at 1 27 14 PM]
  2. Self-training (fine-tuning the model itself) on preference pairs.
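A minimal sketch of how both parts could fit together for SQL generation, under stated assumptions. The candidate queries are assumed to come from the regeneration step (e.g. the returned beams of a beam search), and `judge` is a hypothetical callable that asks an LLM to grade a `(question, sql)` pair and to end its answer with a line like `Score: <n>`, loosely modeled on the paper's additive scoring prompt. Part 1 uses the parsed scores to pick the best candidate; part 2 turns the highest- and lowest-scoring candidates into a preference pair, skipping ties. The `prompt`/`chosen`/`rejected` record layout is an illustrative assumption, not a confirmed format for this repo.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical judge contract: the judge's text ends with "Score: <n>".
SCORE_PATTERN = re.compile(r"Score:\s*(\d+(?:\.\d+)?)")

Judge = Callable[[str, str], str]  # (question, sql) -> judge's raw text


def parse_score(judge_output: str) -> Optional[float]:
    """Return the last 'Score: n' value in the judge's text, or None if absent."""
    matches = SCORE_PATTERN.findall(judge_output)
    return float(matches[-1]) if matches else None


def score_candidates(question: str, candidates: List[str], judge: Judge) -> List[Tuple[str, float]]:
    """Part 1 (LLM-as-a-Judge): score every candidate, dropping unparseable verdicts."""
    scored = []
    for sql in candidates:
        score = parse_score(judge(question, sql))
        if score is not None:
            scored.append((sql, score))
    return scored


def pick_best(scored: List[Tuple[str, float]]) -> Optional[str]:
    """Regeneration: return the highest-scoring candidate (first one wins ties)."""
    if not scored:
        return None
    return max(scored, key=lambda pair: pair[1])[0]


def make_preference_pair(question: str, scored: List[Tuple[str, float]]) -> Optional[Dict[str, str]]:
    """Part 2: build a (chosen, rejected) pair from the best and worst candidates.

    Returns None when there are fewer than two scored candidates or when all
    scores are equal, since a tie carries no preference signal.
    """
    if len(scored) < 2:
        return None
    best = max(scored, key=lambda pair: pair[1])
    worst = min(scored, key=lambda pair: pair[1])
    if best[1] == worst[1]:
        return None
    return {"prompt": question, "chosen": best[0], "rejected": worst[0]}


if __name__ == "__main__":
    # Toy judge with canned verdicts, standing in for a real LLM call.
    canned = {
        "SELECT * FROM t": "Returns extra columns.\nScore: 2",
        "SELECT id FROM t WHERE x > 1": "Correct and minimal.\nScore: 5",
        "DROP TABLE t": "Destructive.\nScore: 0",
    }

    def toy_judge(question: str, sql: str) -> str:
        return canned[sql]

    question = "Which ids have x > 1?"
    scored = score_candidates(question, list(canned), toy_judge)
    print(pick_best(scored))
    print(make_preference_pair(question, scored))
```

Keeping scoring, selection, and pair construction as separate functions means part 1 can be evaluated on its own during regeneration before any self-training (part 2) is attempted.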

Reference: Self-Rewarding Language Models, https://arxiv.org/abs/2401.10020