Closed: vwxyzjn closed this 1 year ago
This PR uses a native dataset instead of OAI's generator.
Both GPU memory and utilization go down, but speed actually goes up (the global step rises faster).
https://wandb.ai/openrlbenchmark/lm_human_preference_details/reports/refactor--Vmlldzo1NDcyMDUx