shm007g opened 1 year ago
- https://openai.com/research/techniques-for-training-large-neural-networks
- https://openai.com/research/sparse-transformer
- https://openai.com/research/measuring-goodharts-law
- https://openai.com/research/webgpt
- https://openai.com/research
- [LLMs are zero-shot rankers for recommender systems]
- [Amazon, Text is all you need: learning language representations for sequential recommendation]
- A new alternative to RLHF just dropped! https://twitter.com/rasbt/status/1663883300522295296
  - [Direct Preference Optimization: Your Language Model is Secretly a Reward Model, https://arxiv.org/abs/2305.18290]
  - https://github.com/eric-mitchell/direct-preference-optimization
  - https://github.com/LAION-AI/Open-Assistant/discussions/3347
- [Distilling step-by-step: outperforming LLMs with less training data and smaller model sizes]
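For context on why DPO is an alternative to RLHF: it skips the separate reward model and PPO loop, directly optimizing the policy on preference pairs. A minimal per-example sketch of the DPO objective in plain Python (function name, argument names, and the `beta=0.1` default are illustrative, not from the linked repo):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    Each argument is a summed token log-probability of the chosen/rejected
    response under the trained policy or the frozen reference model.
    (Illustrative sketch of the objective in the DPO paper, not the repo's API.)
    """
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(x)) written stably as log(1 + exp(-x))
    return math.log1p(math.exp(-logits))
```

When policy and reference agree exactly, the margin is zero and the loss is log 2; as the policy raises the chosen response's likelihood relative to the rejected one (versus the reference), the loss falls toward zero.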
- [Instruction tuning with GPT-4, Microsoft, 2023.04]