ZhiyuanChen opened 2 weeks ago
cc @ArthurZucker on MoEs
Hey! Sorry but without the config class I can't help 😓 would you mind providing the full repro?
It should be reproducible for any config class, like ESM. I'm not allowed to share anything at this stage, as it's under review.
Ah sorry about that 😢
In general, saving with `safe_serialization=False` should prevent the weight removal. You should also fill the `tie_weight_keys`, as these tensors basically share the same memory!
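A minimal sketch of both suggestions, assuming a custom model class (all names below are hypothetical stand-ins; note that recent transformers releases spell the class attribute `_tied_weights_keys`, and at save time its entries are matched against state dict keys, so regex patterns work):

```python
from transformers import PretrainedConfig, PreTrainedModel

class MyConfig(PretrainedConfig):  # hypothetical stand-in config
    model_type = "my-moe"

class MyMoEModel(PreTrainedModel):  # hypothetical stand-in model
    config_class = MyConfig
    # Checkpoint keys that intentionally alias the shared layer norm, so
    # transformers treats them as tied weights rather than stray duplicates.
    _tied_weights_keys = [r"experts\.\d+\.layer_norm\.(weight|bias)"]

    def __init__(self, config):
        super().__init__(config)
        # ... layer definition with the shared layer_norm goes here ...

model = MyMoEModel(MyConfig())
# Alternative workaround: the torch pickle format preserves shared storage,
# so nothing is deduplicated at save time.
model.save_pretrained("checkpoint", safe_serialization=False)
```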
System Info

`transformers` version: 4.44.2

Who can help?

No response
Information

Tasks

An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)

Reproduction

Expected behavior
I have the above layer definition. Since it's an MoE module, all experts share one `layer_norm`: the layer norm of the FFN lives in the Layer, not in the FFN. But when using `save_pretrained`, transformers automatically moves the weights to `ffn`, causing the load to fail.
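For context, a minimal self-contained sketch of the structure described above (all names here are hypothetical stand-ins, not the actual model under review): every expert stores a reference to the single `layer_norm` owned by the layer, so the state dict contains several keys that alias one tensor.

```python
import torch
import torch.nn as nn
from transformers import PretrainedConfig, PreTrainedModel

class DemoConfig(PretrainedConfig):  # hypothetical stand-in config
    model_type = "moe-demo"

    def __init__(self, hidden_size=16, num_experts=2, **kwargs):
        super().__init__(**kwargs)
        self.hidden_size = hidden_size
        self.num_experts = num_experts

class FFN(nn.Module):
    def __init__(self, hidden_size, layer_norm):
        super().__init__()
        self.layer_norm = layer_norm  # reference to the layer's shared norm
        self.dense = nn.Linear(hidden_size, hidden_size)

    def forward(self, x):
        return self.dense(self.layer_norm(x))

class DemoModel(PreTrainedModel):  # hypothetical stand-in model
    config_class = DemoConfig

    def __init__(self, config):
        super().__init__(config)
        # One layer norm on the layer itself, shared by every expert FFN.
        self.layer_norm = nn.LayerNorm(config.hidden_size)
        self.experts = nn.ModuleList(
            [FFN(config.hidden_size, self.layer_norm) for _ in range(config.num_experts)]
        )

    def forward(self, x):
        return torch.stack([expert(x) for expert in self.experts]).mean(0)

model = DemoModel(DemoConfig())
# "layer_norm.weight" and every "experts.N.layer_norm.weight" point at the
# same storage; safetensors cannot represent aliased tensors, so
# save_pretrained keeps only one of the aliased keys and drops the rest
# (logging a "Removed shared tensor" warning in recent versions).
model.save_pretrained("moe-demo")
```

If the surviving key is one of the `experts.*` aliases rather than the layer-level `layer_norm.*` key, reloading then fails to find the expected weights, which would match the behavior reported here.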