Closed michaelwechner closed 10 months ago
In order to fine-tune an LLM with DPO, we need a preference dataset; see for example
https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac https://huggingface.co/datasets/Anthropic/hh-rlhf?row=0
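As a rough sketch (not part of the issue itself), a DPO preference dataset pairs each prompt with a preferred ("chosen") and a dispreferred ("rejected") completion. The snippet below loads the referenced Anthropic/hh-rlhf dataset for inspection and shows a hand-built record in the prompt/chosen/rejected layout commonly used with TRL's DPOTrainer; the example prompt and answers are made up for illustration.

```python
# Minimal sketch of a DPO preference dataset (assumption: TRL-style
# prompt/chosen/rejected fields; the example record below is invented).
from datasets import load_dataset

# Anthropic/hh-rlhf stores "chosen" and "rejected" as full conversation
# strings; here we only peek at one record.
dataset = load_dataset("Anthropic/hh-rlhf", split="train")
print(dataset[0]["chosen"][:200])
print(dataset[0]["rejected"][:200])

# A hand-built preference record in the prompt/chosen/rejected format:
preference_example = {
    "prompt": "What is the capital of Switzerland?",
    "chosen": "The capital of Switzerland is Bern.",
    "rejected": "The capital of Switzerland is Zurich.",
}
```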
Branch GH-8_preference_dataset created
Implemented in branch GH-8_preference_dataset and merged into main