lucidrains / PaLM-rlhf-pytorch

Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM
MIT License

implement an argument to directly set ff_inner_dim #52

Open chris-ha458 opened 1 year ago

chris-ha458 commented 1 year ago

In NVIDIA's nvidia/GPT-2B-001, a very PaLM-like model is implemented.

However, instead of using an FFN multiplier like ff_mult, the ffn_hidden_size (comparable to ff_inner_dim in this codebase) is set directly to 5440.

This translates to an ff_mult of 2.65625. However, passing that value to this codebase does not work.
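For reference, the multiplier follows from dividing the target FFN hidden size by the model dimension (the numbers imply a model dim of 2048 for GPT-2B-001). A quick sketch of why a fractional multiplier breaks: even when dim * ff_mult is integer-valued, the result is a Python float, which PyTorch rejects as a tensor size.

```python
hidden_size = 2048      # model dim implied by the arithmetic in this thread
ffn_hidden_size = 5440  # set directly in NVIDIA's config

ff_mult = ffn_hidden_size / hidden_size
print(ff_mult)  # 2.65625

# hidden_size * ff_mult is integer-valued but has type float, and
# torch.empty() / nn.Linear reject a float size, raising the TypeError
# shown in this thread.
inner = hidden_size * ff_mult
print(inner, type(inner))  # 5440.0 <class 'float'>
```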

The error

TypeError: empty() received an invalid combination of arguments - got (tuple, dtype=NoneType, device=NoneType), but expected one of:
 * (tuple of ints size, *, tuple of names names, torch.memory_format memory_format, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)
 * (tuple of ints size, *, torch.memory_format memory_format, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool pin_memory, bool requires_grad)

So I implemented a way to set ff_inner_dim directly. Please take a look!
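The shape of the change can be sketched as follows (the helper name is hypothetical; the actual PR wires the argument into the model constructor): an optional ff_inner_dim that takes precedence over the ff_mult-derived default.

```python
# Hypothetical helper illustrating the proposed argument handling:
# ff_inner_dim, if given, overrides the dim * ff_mult default.
def resolve_ff_inner_dim(dim, ff_mult=4, ff_inner_dim=None):
    """Return the feedforward hidden size as an int.

    If ff_inner_dim is provided (e.g. 5440, as in nvidia/GPT-2B-001),
    it is used directly; otherwise fall back to dim * ff_mult. Casting
    to int ensures a valid size is passed to torch.empty / nn.Linear.
    """
    inner = ff_inner_dim if ff_inner_dim is not None else dim * ff_mult
    return int(inner)

print(resolve_ff_inner_dim(2048, ff_inner_dim=5440))  # 5440
print(resolve_ff_inner_dim(2048, ff_mult=4))          # 8192
```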

chris-ha458 commented 1 year ago

Also, the formatter I use changes the layout a lot, so I had to modify the code manually. Which formatter does this repo use?

GadiZimerman commented 1 year ago

@CodiumAI-Agent /review

CodiumAI-Agent commented 1 year ago

PR Analysis

How to use

Tag me in a comment '@CodiumAI-Agent' and add one of the following commands:

/review - Request a review of the latest update to the PR.
/describe - Modify the PR title and description based on the contents of the PR.
/improve - Suggest improvements to the code in the PR. These will be provided as pull request comments, ready to commit.
/ask - Pose a question about the PR.