refactor

vwxyzjn / lm-human-preference-details

RLHF implementation details of OAI's 2019 codebase

MIT License

152 stars, 7 forks

Closed vwxyzjn closed 1 year ago

vwxyzjn commented 1 year ago

This PR uses a native dataset instead of OAI's generator.
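For readers unfamiliar with the distinction, here is a rough sketch of the two styles. It is not the code from this PR: the thread does not say which dataset class is meant, so this assumes a PyTorch-style map-style dataset (an object with `__len__` and `__getitem__`, which is the interface `torch.utils.data.DataLoader` expects from map-style datasets). The names `sample_generator`, `QueryDataset`, `texts`, and `query_length` are all hypothetical.

```python
import random


def sample_generator(texts, query_length, seed=0):
    """Hypothetical generator-style source: yields truncated samples forever."""
    rng = random.Random(seed)
    while True:
        yield rng.choice(texts)[:query_length]


class QueryDataset:
    """Hypothetical map-style dataset with the __len__/__getitem__ protocol."""

    def __init__(self, texts, query_length):
        self.texts = texts
        self.query_length = query_length

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        # Truncate each sample to a fixed query length (in characters here,
        # purely for illustration; a real pipeline would work on tokens).
        return self.texts[idx][: self.query_length]


texts = ["the quick brown fox", "jumps over", "the lazy dog"]

# Generator style: the caller pulls samples one at a time with next().
first_from_generator = next(sample_generator(texts, query_length=8))

# Dataset style: samples are addressable by index and the size is known,
# so a data loader can handle batching and shuffling itself.
dataset = QueryDataset(texts, query_length=8)
```

With the dataset form, batching and shuffling can be delegated to a loader rather than hand-written around a generator.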

vwxyzjn commented 1 year ago

Both GPU memory usage and utilization go down, but speed actually goes up (the global step rises faster).

vwxyzjn commented 1 year ago