hwchase17 / langchain-hub


Task Fine-Tuning - Datasets, Examples, etc #19

Open Glavin001 opened 1 year ago

Glavin001 commented 1 year ago

🎯 Goal: Help a developer go from idea to production-ready custom large-language model in record time!

Problem

In the LLM landscape, LangChain has support for:

There remains a gap in fine-tuning support: education, tooling, and usable examples (like the Prompts in the Hub).

When to use fine-tuning?

I found @daveshap's YouTube video *OpenAI Q&A: Finetuning GPT-3 vs Semantic Search - which to use, when, and why?* incredibly informative, especially this comparison:

| Fine-tuning | Semantic Search/Embeddings |
| --- | --- |
| Slow, difficult, expensive | Fast, easy, cheap |
| Prone to confabulation | Recalls exact information |
| Teaches a new task, not new information | Adding new information is a cinch |
| Requires constant retraining | Adding new vectors is easy |
| Not scalable | Infinitely scalable |
| Does not work for Question-Answering | Solves half of Question-Answering |
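
For contrast, the semantic search/embeddings column is the path LangChain already covers well. A minimal sketch of that workflow, assuming the 2023-era `langchain` API with OpenAI embeddings and a FAISS store (the document texts and query are made up):

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

# Made-up documents; in practice these would come from a loader + text splitter.
# Requires OPENAI_API_KEY and the faiss-cpu package.
texts = [
    "LangChain Hub ships reusable prompt templates.",
    "Fine-tuning teaches a model a new task or output pattern.",
]

# Index the texts; adding new information later is just adding more vectors.
db = FAISS.from_texts(texts, OpenAIEmbeddings())

# Recall exact information at query time.
docs = db.similarity_search("Where do reusable prompts live?", k=1)
print(docs[0].page_content)
```

Adding new information is just indexing more vectors, which is the "infinitely scalable" point in the table.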

🎯 There is still a purpose to fine-tuning: when you want to teach a new task/pattern.

For example, patterns which fine-tuning helps with:

I think LangChain and the community have an opportunity to build tools that make dataset generation for fine-tuning easier, to provide educational examples, and to offer ready-made datasets for bootstrapping production-ready applications. A rough sketch of what such a dataset might look like is below.
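
To make the dataset-generation idea concrete, here is a minimal sketch of what such tooling might emit, assuming the prompt/completion JSONL format that OpenAI's fine-tuning endpoint expected at the time; the task, example pairs, and file name are purely illustrative:

```python
import json

# Hypothetical examples for an "extract the cities" task; in practice these
# might be collected from LangChain chain runs or written by hand.
examples = [
    {"prompt": "Extract the cities: 'Flights from Toronto to Lisbon'\n\n###\n\n",
     "completion": " Toronto, Lisbon END"},
    {"prompt": "Extract the cities: 'Trains between Kyoto and Osaka'\n\n###\n\n",
     "completion": " Kyoto, Osaka END"},
]

# OpenAI's fine-tuning endpoint (at the time) expected one JSON object per line,
# with "prompt" and "completion" keys and consistent separator/stop tokens.
with open("extract_cities.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```

Ready-made files in this shape, one per task pattern, are the kind of artifact the "ready-made datasets" idea above points at.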

Proposal

Glavin001 commented 1 year ago

@daveshap: What do you think about this idea? I've been inspired by your YouTube videos recently while using LangChain. I think it would be a huge win for the community to combine our efforts toward building incredible products with LLMs!

dhruv-anand-aintech commented 1 year ago

Isn't this a matter of integrating SetFit (https://huggingface.co/blog/setfit) into LangChain?
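
For reference, the few-shot fine-tuning loop from that blog post is only a handful of lines. A rough sketch following the Hugging Face announcement example (not an actual LangChain integration; hyperparameters are the blog's illustrative values):

```python
from datasets import load_dataset
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import SetFitModel, SetFitTrainer

# Load a small labelled dataset and simulate the few-shot regime
# (8 examples per class), as in the Hugging Face announcement.
dataset = load_dataset("SetFit/SentEval-CR")
train_ds = dataset["train"].shuffle(seed=42).select(range(8 * 2))
test_ds = dataset["test"]

# Start from a pretrained sentence-transformer checkpoint.
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

trainer = SetFitTrainer(
    model=model,
    train_dataset=train_ds,
    eval_dataset=test_ds,
    loss_class=CosineSimilarityLoss,
    batch_size=16,
    num_iterations=20,  # text pairs generated per example for contrastive learning
    num_epochs=1,
)
trainer.train()
print(trainer.evaluate())
```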

rreddy-flowinc commented 1 year ago

Would love to see this integration ^