The solution: Synthetic data to teach efficient students
LLMs started reaching parity with human data annotators.
Implications:
- high-quality annotation labor is now available through APIs;
- reproducible annotation instructions can be sent as prompts;
- synthetic data is returned almost instantaneously, with compute as the only bottleneck.
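Here is a minimal sketch of sending reproducible annotation instructions as prompts. It assumes a hypothetical sentiment-labeling task and a generic `llm` callable that maps a prompt string to a completion string. `PROMPT_TEMPLATE`, `LABELS`, `annotate` and `fake_llm` are illustrative names, not code from the linked blog. `fake_llm` is an offline stub standing in for a real API call.

```python
import re
from typing import Callable, Optional

# Hypothetical annotation instructions: the same fixed template is reused for
# every text, which is what makes the labeling procedure reproducible.
PROMPT_TEMPLATE = """You are an expert annotator of financial news.
Classify the sentiment of the text as one of: positive, negative, neutral.
Answer with the label only.

Text: {text}
Label:"""

LABELS = ("positive", "negative", "neutral")


def build_prompt(text: str) -> str:
    """Fill the fixed instruction template with one text to annotate."""
    return PROMPT_TEMPLATE.format(text=text)


def parse_label(completion: str) -> Optional[str]:
    """Return the first allowed label found in the model output, or None."""
    match = re.search(r"\b(" + "|".join(LABELS) + r")\b", completion.lower())
    return match.group(1) if match else None


def annotate(texts: list[str], llm: Callable[[str], str]) -> list[Optional[str]]:
    """Send one prompt per text to the LLM callable and collect parsed labels."""
    return [parse_label(llm(build_prompt(t))) for t in texts]


def fake_llm(prompt: str) -> str:
    # Offline stub for illustration only; a real setup would query a hosted model.
    return " Positive" if "rose" in prompt else "neutral"


if __name__ == "__main__":
    print(annotate(["Profits rose 20%.", "The board met on Monday."], fake_llm))
    # → ['positive', 'neutral']
```

Keeping the instructions in one template and parsing the output against a closed label set makes each run repeatable and easy to audit. The resulting labels could then serve as training data for a smaller student model.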
https://github.com/MoritzLaurer/synthetic-data-blog/tree/main
https://huggingface.co/blog/synthetic-data-save-costs#32-compare-the-open-source-model-to-proprietary-models
Related: https://github.com/manisnesan/fastchai/issues/47#issuecomment-1968021707