vwxyzjn / lm-human-preference-details

RLHF implementation details of OAI's 2019 codebase
MIT License

refactor #22

Closed vwxyzjn closed 1 year ago

vwxyzjn commented 1 year ago

This PR uses a native dataset instead of OAI's generator.
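For context, a minimal sketch of the general difference between a generator-style sampler and a map-style dataset. This is not the code from this PR or from OAI's codebase: `sample_stream` and `WindowDataset` are hypothetical names, and the sketch assumes "native dataset" means an indexable object with `__len__` and `__getitem__`.

```python
import random
from typing import Iterator, List


def sample_stream(tokens: List[int], window: int, seed: int) -> Iterator[List[int]]:
    """Hypothetical generator-style sampler: yields random windows forever.

    It has no length and no random access, so the consumer can only call next().
    """
    rng = random.Random(seed)
    while True:
        start = rng.randrange(len(tokens) - window + 1)
        yield tokens[start:start + window]


class WindowDataset:
    """Hypothetical map-style dataset: fixed length, random access by index."""

    def __init__(self, tokens: List[int], window: int) -> None:
        self.tokens = tokens
        self.window = window

    def __len__(self) -> int:
        # Number of distinct windows that fit inside the token list.
        return len(self.tokens) - self.window + 1

    def __getitem__(self, idx: int) -> List[int]:
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        return self.tokens[idx:idx + self.window]


tokens = list(range(10))

# Generator style: pull the next sample; which sample you get depends on RNG state.
stream = sample_stream(tokens, window=4, seed=0)
first_from_stream = next(stream)

# Dataset style: ask for any sample by index; shuffling and batching live outside.
dataset = WindowDataset(tokens, window=4)
first_from_dataset = dataset[0]
```

An object shaped like `WindowDataset` can be handed to a standard data loader, which then owns shuffling and batching instead of the sampling code.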

vwxyzjn commented 1 year ago

Both GPU memory and utilization go down, but speed actually goes up (the global step rises faster).

image

vwxyzjn commented 1 year ago

image

https://wandb.ai/openrlbenchmark/lm_human_preference_details/reports/refactor--Vmlldzo1NDcyMDUx