hwchase17 / langchain-hub


Task Fine-Tuning - Datasets, Examples, etc #19

Open Glavin001 opened 1 year ago

Glavin001 commented 1 year ago

🎯 Goal: Help a developer go from idea to production-ready custom large-language model in record time!

Problem

In the LLM landscape, LangChain has support for:

There remains a gap in fine-tuning support: education, tooling, and usable examples (like the Prompts in the Hub).

When to use fine-tuning?

I found @daveshap's YouTube video *OpenAI Q&A: Finetuning GPT-3 vs Semantic Search - which to use, when, and why?* incredibly informative, especially this comparison:

| Fine-tuning | Semantic Search/Embeddings |
| --- | --- |
| Slow, difficult, expensive | Fast, easy, cheap |
| Prone to confabulation | Recalls exact information |
| Teaches a new task, not new information | Adding new information is a cinch |
| Requires constant retraining | Adding new vectors is easy |
| Not scalable | Infinitely scalable |
| Does not work for Question-Answering | Solves half of Question-Answering |
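
For contrast, the semantic search/embeddings column is the path LangChain already covers well. A minimal sketch of that workflow, assuming the 2023-era `langchain` API with OpenAI embeddings and a FAISS store (the document texts and query are made up):

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

# Made-up documents; in practice these would come from a loader + text splitter.
# Requires OPENAI_API_KEY and the faiss-cpu package.
texts = [
    "LangChain Hub ships reusable prompt templates.",
    "Fine-tuning teaches a model a new task or output pattern.",
]

# Index the texts; adding new information later is just adding more vectors.
db = FAISS.from_texts(texts, OpenAIEmbeddings())

# Recall exact information at query time.
docs = db.similarity_search("Where do reusable prompts live?", k=1)
print(docs[0].page_content)
```

Adding new information is just indexing more vectors, which is the "infinitely scalable" point in the table.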

🎯 There is still a purpose to fine-tuning: when you want to teach a new task/pattern.

For example, patterns which fine-tuning helps with:

I think LangChain and the community have an opportunity to build tools that make dataset generation for fine-tuning easier, to provide educational examples, and to offer ready-made datasets for bootstrapping production-ready applications. A rough sketch of what such a dataset might look like is below.
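
To make the dataset-generation idea concrete, here is a minimal sketch of what such tooling might emit, assuming the prompt/completion JSONL format that OpenAI's fine-tuning endpoint expected at the time; the task, example pairs, and file name are purely illustrative:

```python
import json

# Hypothetical examples for an "extract the cities" task; in practice these
# might be collected from LangChain chain runs or written by hand.
examples = [
    {"prompt": "Extract the cities: 'Flights from Toronto to Lisbon'\n\n###\n\n",
     "completion": " Toronto, Lisbon END"},
    {"prompt": "Extract the cities: 'Trains between Kyoto and Osaka'\n\n###\n\n",
     "completion": " Kyoto, Osaka END"},
]

# OpenAI's fine-tuning endpoint (at the time) expected one JSON object per line,
# with "prompt" and "completion" keys and consistent separator/stop tokens.
with open("extract_cities.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```

Ready-made files in this shape, one per task pattern, are the kind of artifact the "ready-made datasets" idea above points at.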

Proposal

Glavin001 commented 1 year ago

@daveshap: What do you think about this idea? I've been inspired by your YouTube videos recently while using LangChain. I think it would be a huge win for the community to combine our efforts toward building incredible products with LLMs!

dhruv-anand-aintech commented 1 year ago

Isn't this a matter of integrating SetFit (https://huggingface.co/blog/setfit) into LangChain?
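
For reference, the few-shot fine-tuning loop from that blog post is only a handful of lines. A rough sketch following the Hugging Face announcement example (not an actual LangChain integration; hyperparameters are the blog's illustrative values):

```python
from datasets import load_dataset
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import SetFitModel, SetFitTrainer

# Load a small labelled dataset and simulate the few-shot regime
# (8 examples per class), as in the Hugging Face announcement.
dataset = load_dataset("SetFit/SentEval-CR")
train_ds = dataset["train"].shuffle(seed=42).select(range(8 * 2))
test_ds = dataset["test"]

# Start from a pretrained sentence-transformer checkpoint.
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

trainer = SetFitTrainer(
    model=model,
    train_dataset=train_ds,
    eval_dataset=test_ds,
    loss_class=CosineSimilarityLoss,
    batch_size=16,
    num_iterations=20,  # text pairs generated per example for contrastive learning
    num_epochs=1,
)
trainer.train()
print(trainer.evaluate())
```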

rreddy-flowinc commented 1 year ago

Would love to see this integration ^