microsoft / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Call for Conversion from Huggingface to Megads with MoE #381

Open ControllableGeneration opened 7 months ago

ControllableGeneration commented 7 months ago

I want to convert a pretrained Hugging Face LLM into a Megatron-DeepSpeed (Megads) MoE version of the same model and save the resulting checkpoint.

Each newly created expert should be initialized with weights identical to those of the dense MLP Linear layer it replaces.
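In plain PyTorch terms, the intended initialization is roughly the following (a minimal sketch only; `experts_from_dense`, `dense_mlp`, and `num_experts` are illustrative names, not existing Megads modules):

```python
import copy
import torch.nn as nn

def experts_from_dense(dense_mlp: nn.Module, num_experts: int) -> nn.ModuleList:
    # Replicate the pretrained dense MLP so that every expert starts from
    # exactly the same weights as the original Linear layers.
    return nn.ModuleList(copy.deepcopy(dense_mlp) for _ in range(num_experts))
```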

My guess was to use hf2megads_weight_converter.py. However, it raises the following error:

ValueError: Unrecognized weight type, with subname=mlp.deepspeed_moe.gate.wg.weight

and the code shows that it does not support MoE weights:

    def refactor(self):
        assert self.is_refactored == False
        new_w = None
        for pname, p in self.model.named_parameters():
            if pname in [
                    f"{self.mega_emb_wnum}.word_embeddings.weight",
                    f"{self.mega_lm_head_wnum}.lm_head.weight"
            ]:
                new_w = self._embedding_refactor(pname, p)
            elif pname == f"{self.mega_norm_wnum}.weight":
                new_w = self._direct_refactor(pname, p)
            else:
                mobj = self.decoder_pat.match(pname)
                layer_num = int(mobj.group(1))
                subname = mobj.group(2)
                hf_layer = layer_num - self.offset_num
                if subname in ["self_attention.query_key_value.weight"]:
                    new_w = self._qkv_refactor(pname, p, hf_layer)
                elif subname in ["mlp.dense_h_to_4h.weight"]:
                    new_w = self._mlphto4h_dense_refactor(pname, p, hf_layer)
                elif subname in [
                        "self_attention.dense.weight",
                        "mlp.dense_4h_to_h.weight"
                ]:
                    new_w = self._attn_dense_refactor(pname, p, hf_layer, subname)
                elif subname in [
                        "mlp.dense_h_to_4h1.weight",
                        "mlp.dense_h_to_4h2.weight"
                ]:
                    new_w = self._mlphto4h1_refactor()
                elif subname in [
                        "input_layernorm.weight",
                        "post_attention_layernorm.weight"
                ]:
                    new_w = self._direct_refactor(pname, p, hf_layer, subname)
                else:
                    raise ValueError(f"Unrecognized weight type, with subname={subname}")
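For reference, the missing branches might look something like this (a sketch only: the subnames follow the DeepSpeed MoE naming seen in the error above, and `_moe_gate_refactor` / `_moe_expert_refactor` are hypothetical helpers that would still need to be written):

```python
                # Hypothetical MoE branches (sketch): subnames follow DeepSpeed's
                # MoE module layout; the two helpers below do not exist yet.
                elif subname == "mlp.deepspeed_moe.gate.wg.weight":
                    # The router has no counterpart in a dense HF checkpoint,
                    # so its weights would have to be freshly initialized.
                    new_w = self._moe_gate_refactor(pname, p, hf_layer)
                elif "deepspeed_moe.experts.deepspeed_experts." in subname:
                    # Each expert reuses the dense mlp.dense_h_to_4h /
                    # mlp.dense_4h_to_h weights from the HF model, so all
                    # experts start out identical.
                    new_w = self._moe_expert_refactor(pname, p, hf_layer, subname)
```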

Please consider adding MoE model conversion. I will also try it myself and let you know if I succeed.