hxdtest opened this issue 6 days ago
Hi @hxdtest, the megatron_v4.patch is necessary for veRL for two main reasons. We do not call initialize_megatron, which initializes the global args; we only build the necessary process groups by using mpu.initialize_model_parallel. Therefore, we have to delete the usage of get_args(). Case 4 is where we delete the get_args(), and overlap_param_gather is set to False by default.

For case 1, config.hidden_size should be equal to hidden_size. The False in case 3 could be removed, as the default value is already False and there seems to be no way to change its value in v0.4.
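To make the get_args() point concrete, here is a minimal sketch, not the actual veRL code, of initializing only the Megatron-Core process groups; the parallel sizes are placeholders and it assumes a torchrun-style launch:

```python
# Minimal sketch: build only the parallel process groups instead of calling
# initialize_megatron(), which would parse arguments into the global store
# that get_args() later reads.
import torch
from megatron.core import parallel_state as mpu

torch.distributed.init_process_group(backend="nccl")  # env vars set by the launcher
mpu.initialize_model_parallel(
    tensor_model_parallel_size=2,    # placeholder parallel sizes
    pipeline_model_parallel_size=1,
)

# With no global args, code that used to do
#     hidden_size = get_args().hidden_size
# must instead receive an explicit config (e.g. a Megatron-Core
# TransformerConfig), whose config.hidden_size carries the same value the
# old hidden_size argument did -- which is the case 1 answer above.
```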
Many thanks for your reply, @PeterSH6. Have you tested verl with a model size larger than 300B? For example, have you tested Llama 3 405B PPO training on verl?
For reference, the original question asked about these changes in
https://github.com/volcengine/verl/blob/main/patches/megatron_v4.patch

Case 1: what is the difference between hidden_size and config.hidden_size?
Case 3 changes

```python
return FusedLayerNormAffineFunction.apply(input, weight, self.bias, self.normalized_shape, self.eps)
```

to

```python
return FusedLayerNormAffineFunction.apply(input, weight, self.bias, self.normalized_shape, self.eps, False)
```
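A small sketch of why the explicit trailing False is redundant when the extra positional parameter already defaults to False (presumably Apex's memory_efficient flag; that mapping is an assumption, not something stated in this thread):

```python
import torch

# Stand-in for the fused autograd Function (the real one lives in Apex).
# The only point: a trailing parameter with a False default makes
# apply(..., False) behave exactly like apply(...).
class FusedLayerNormAffineFunctionSketch(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, weight, bias, normalized_shape, eps, memory_efficient=False):
        # Plain LayerNorm as a placeholder for the fused CUDA kernel.
        return torch.nn.functional.layer_norm(input, normalized_shape, weight, bias, eps)

x = torch.randn(4, 8)
w, b = torch.ones(8), torch.zeros(8)
out_default = FusedLayerNormAffineFunctionSketch.apply(x, w, b, (8,), 1e-5)
out_explicit = FusedLayerNormAffineFunctionSketch.apply(x, w, b, (8,), 1e-5, False)
assert torch.equal(out_default, out_explicit)  # passing False explicitly changes nothing
```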
Case 4:

```python
self.overlap_param_gather = overlap_param_gather
if self.overlap_param_gather:
    self.remove_pre_hook_handle = torch.nn.modules.module.register_module_forward_pre_hook(
        self._make_forward_pre_hook())
```
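Following the explanation earlier in the thread, case 4 presumably replaces a read of the global args with an ordinary constructor argument. A rough sketch of that shape of change; the class name and hook body are made up for illustration, not taken from the patch:

```python
import torch

# Before (stock Megatron-LM style, only valid after initialize_megatron):
#     from megatron import get_args
#     self.overlap_param_gather = get_args().overlap_param_gather
#
# After (patched style, matching the snippet quoted above): the flag is a
# plain argument that defaults to False, so nothing touches get_args().
class DistributedOptimizerSketch:  # hypothetical wrapper, for illustration only
    def __init__(self, overlap_param_gather: bool = False):
        self.overlap_param_gather = overlap_param_gather
        if self.overlap_param_gather:
            # Same global forward pre-hook registration as in the quoted code.
            self.remove_pre_hook_handle = torch.nn.modules.module.register_module_forward_pre_hook(
                self._make_forward_pre_hook()
            )

    def _make_forward_pre_hook(self):
        # Placeholder: in Megatron the real hook is what overlaps the
        # parameter all-gather with the forward pass.
        def hook(module, inputs):
            return None
        return hook
```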
Many thanks!