I want to convert a pretrained LLM into its Megatron-DeepSpeed (Megads) MoE counterpart and save it.
The weights of the newly created experts should be identical to the weights of the Linear layer they are created from.
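Concretely, by "identical" I mean something like the following (a minimal, framework-agnostic sketch in plain PyTorch, not the Megatron-DeepSpeed API; `init_experts_from_dense` is a hypothetical helper):

```python
import copy

import torch

def init_experts_from_dense(dense_linear: torch.nn.Linear, num_experts: int) -> torch.nn.ModuleList:
    """Create num_experts experts, each an exact copy of the pretrained dense Linear layer."""
    return torch.nn.ModuleList(copy.deepcopy(dense_linear) for _ in range(num_experts))

# Example: all experts start out with the same pretrained weight as the dense layer.
dense = torch.nn.Linear(4096, 11008, bias=False)
experts = init_experts_from_dense(dense, num_experts=8)
assert all(torch.equal(e.weight, dense.weight) for e in experts)
```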
My guess was to use `hf2megads_weight_converter.py`. However, it raises the following error:

```
ValueError: Unrecognized weight type, with subname=mlp.deepspeed_moe.gate.wg.weight
```

The code shows that it does not support MoE weights:
```python
def refactor(self):
    assert self.is_refactored == False
    new_w = None
    for pname, p in self.model.named_parameters():
        if pname in [
                f"{self.mega_emb_wnum}.word_embeddings.weight",
                f"{self.mega_lm_head_wnum}.lm_head.weight"
        ]:
            new_w = self._embedding_refactor(pname, p)
        elif pname == f"{self.mega_norm_wnum}.weight":
            new_w = self._direct_refactor(pname, p)
        else:
            mobj = self.decoder_pat.match(pname)
            layer_num = int(mobj.group(1))
            subname = mobj.group(2)
            hf_layer = layer_num - self.offset_num
            if subname in ["self_attention.query_key_value.weight"]:
                new_w = self._qkv_refactor(pname, p, hf_layer)
            elif subname in ["mlp.dense_h_to_4h.weight"]:
                new_w = self._mlphto4h_dense_refactor(pname, p, hf_layer)
            elif subname in [
                    "self_attention.dense.weight",
                    "mlp.dense_4h_to_h.weight"
            ]:
                new_w = self._attn_dense_refactor(pname, p, hf_layer, subname)
            elif subname in [
                    "mlp.dense_h_to_4h1.weight",
                    "mlp.dense_h_to_4h2.weight"
            ]:
                new_w = self._mlphto4h1_refactor()
            elif subname in [
                    "input_layernorm.weight",
                    "post_attention_layernorm.weight"
            ]:
                new_w = self._direct_refactor(pname, p, hf_layer, subname)
            else:
                raise ValueError(f"Unrecognized weight type, with subname={subname}")
```
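If I try it myself, my rough idea is to handle the gate and expert parameters before the final `else`, along these lines (a sketch only: `moe_gate_refactor` / `moe_expert_refactor` are hypothetical helpers that do not exist in the converter, and I am assuming the HF MLP exposes `gate_proj`/`up_proj`/`down_proj`, mirroring what the existing `_mlphto4h_dense_refactor` path consumes):

```python
import torch

def moe_gate_refactor(p: torch.Tensor) -> torch.Tensor:
    # The router/gate has no pretrained counterpart in the dense HF model,
    # so simply keep its fresh initialization.
    return p.detach().clone()

def moe_expert_refactor(hf_mlp, subname: str) -> torch.Tensor:
    # Every expert starts as an exact copy of the pretrained dense MLP projection.
    if "dense_h_to_4h" in subname:
        # Assumed to mirror the dense path, which fuses the HF gate_proj and up_proj.
        return torch.cat([hf_mlp.gate_proj.weight, hf_mlp.up_proj.weight], dim=0).detach().clone()
    if "dense_4h_to_h" in subname:
        return hf_mlp.down_proj.weight.detach().clone()
    raise ValueError(f"Unexpected expert parameter: {subname}")
```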
Please consider adding support for MoE model conversion. I will also try to implement it myself and let you know if I succeed.