manisnesan / fastchai

Repository capturing deep learning & nlp experiments using fastai & pytorch
Apache License 2.0

Synthetic Data #72

Open manisnesan opened 3 months ago

manisnesan commented 3 months ago

https://github.com/MoritzLaurer/synthetic-data-blog/tree/main

https://huggingface.co/blog/synthetic-data-save-costs#32-compare-the-open-source-model-to-proprietary-models


Related: https://github.com/manisnesan/fastchai/issues/47#issuecomment-1968021707

manisnesan commented 3 months ago
  1. The problem: There is no data for your use-case

  2. The solution: Synthetic data to teach efficient students

    • LLMs have started to reach parity with human data annotators.
    • Implications: high-quality annotation labor is now available through APIs; reproducible annotation instructions can be sent as prompts; and synthetic data is returned almost instantaneously, with compute as the only bottleneck.
    • Open question: how do you validate that synthetic data is correct?
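
One possible way to approach the validation question is to hand-label a small gold subset and compare the LLM's labels against it. The sketch below is my own assumption and is not taken from the linked blog post: the function names `agreement` and `cohen_kappa` and the toy labels are hypothetical. It reports raw agreement plus Cohen's kappa, which corrects for agreement expected by chance.

```python
from collections import Counter


def agreement(gold, synthetic):
    """Fraction of examples where the synthetic label matches the human label."""
    if not gold or len(gold) != len(synthetic):
        raise ValueError("gold and synthetic must be non-empty and the same length")
    return sum(g == s for g, s in zip(gold, synthetic)) / len(gold)


def cohen_kappa(gold, synthetic):
    """Chance-corrected agreement between human and synthetic labels."""
    n = len(gold)
    p_observed = agreement(gold, synthetic)
    gold_counts = Counter(gold)
    syn_counts = Counter(synthetic)
    labels = set(gold_counts) | set(syn_counts)
    # Agreement expected if both annotators labeled independently
    # according to their own label frequencies.
    p_expected = sum(gold_counts[l] * syn_counts[l] for l in labels) / (n * n)
    if p_expected == 1.0:
        # Both sides used a single identical label; kappa is undefined,
        # so treat it as perfect agreement.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy data: a hand-labeled gold subset vs. LLM-generated labels.
gold_labels = ["pos", "neg", "pos", "neg"]
llm_labels = ["pos", "neg", "neg", "neg"]

print(f"agreement: {agreement(gold_labels, llm_labels):.2f}")
print(f"kappa:     {cohen_kappa(gold_labels, llm_labels):.2f}")
```

A low kappa on the gold subset would suggest revising the annotation prompt before generating more data. What counts as "good enough" depends on the task and is not something these notes settle.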
manisnesan commented 3 months ago

Related

How to Generate and Use Synthetic Data for Finetuning: https://eugeneyan.com/writing/synthetic/
