Synthetic Data - Githubissues

manisnesan / fastchai

Repository capturing deep learning & NLP experiments using fastai & PyTorch

Apache License 2.0


Open manisnesan opened 3 months ago

manisnesan commented 3 months ago

The problem: there is no data for your use case.
The solution: synthetic data to teach efficient student models.
- LLMs have started reaching parity with human data annotators.
- Implications:
  - high-quality annotation labor is now available through APIs;
  - reproducible annotation instructions can be sent as prompts;
  - synthetic data is returned almost instantaneously, with compute as the only bottleneck.
- How do you validate that synthetic data is correct?
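This is a minimal sketch of the workflow in these notes, assuming a two-class sentiment task. The annotation instructions live in one fixed prompt template so every run is reproducible. The LLM is passed in as any `str -> str` callable, and here it is a keyword-rule stand-in rather than a real API. The synthetic labels are then checked against a small human-labelled sample using accuracy and Cohen's kappa. `PROMPT_TEMPLATE`, `annotate`, `validate` and `fake_llm` are hypothetical names chosen for illustration. Spot-checking against human labels is one possible answer to the validation question, not a method taken from the linked sources.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence

# Hypothetical annotation instructions: one fixed prompt, so every run is reproducible.
PROMPT_TEMPLATE = (
    "Classify the sentiment of the text as 'positive' or 'negative'.\n"
    "Answer with a single word.\n\n"
    "Text: {text}\nLabel:"
)
LABELS = ("positive", "negative")


def annotate(texts: Sequence[str], llm: Callable[[str], str]) -> List[Optional[str]]:
    """Send the same instructions for every text; None marks an unparseable answer."""
    labels: List[Optional[str]] = []
    for text in texts:
        answer = llm(PROMPT_TEMPLATE.format(text=text)).strip().lower()
        labels.append(answer if answer in LABELS else None)
    return labels


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Chance-corrected agreement between two annotators on the same items."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    # Both annotators used one identical label everywhere: agreement is perfect.
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)


def validate(synthetic: Sequence[Optional[str]], gold: Sequence[str]) -> dict:
    """Compare LLM labels with a small human-labelled sample, skipping unparseable ones."""
    pairs = [(s, g) for s, g in zip(synthetic, gold) if s is not None]
    if not pairs:
        raise ValueError("no parseable synthetic labels to validate")
    syn = [s for s, _ in pairs]
    ref = [g for _, g in pairs]
    return {
        "parse_rate": len(pairs) / len(gold),
        "accuracy": sum(s == g for s, g in pairs) / len(pairs),
        "kappa": cohen_kappa(syn, ref),
    }


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (an assumption): a keyword rule on the text only."""
    text = prompt.split("Text:")[1]
    if "?" in text:
        return "unsure"  # simulates an answer outside the label set
    return "positive" if "good" in text or "great" in text else "negative"


if __name__ == "__main__":
    texts = ["great movie", "terrible plot", "good acting", "boring", "is it good?"]
    gold = ["positive", "negative", "positive", "positive", "positive"]
    synthetic = annotate(texts, fake_llm)
    print(validate(synthetic, gold))
    # prints {'parse_rate': 0.8, 'accuracy': 0.75, 'kappa': 0.5}
```

Labels that pass a check like this could then serve as training data for a smaller, cheaper student model. That step is not shown here. Kappa is reported alongside accuracy because it discounts the agreement two annotators would reach by chance.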

manisnesan commented 3 months ago

How to Generate and Use Synthetic Data for Finetuning https://eugeneyan.com/writing/synthetic/
