stefan-it / turkish-bert

Turkish BERT/DistilBERT, ELECTRA and ConvBERT models

AssertionError: ('Pointer shape torch.Size([256]) and array shape (64,) mismatched', torch.Size([256]), (64,)) #23

Open etetteh opened 3 years ago

etetteh commented 3 years ago

Getting the following error when converting my checkpoint to Hugging Face's PyTorch format. I am using the same config file I used for the pretraining.

Traceback (most recent call last):
  File "/home/enoch/dl_repos/transformers/src/transformers/models/electra/convert_electra_original_tf_checkpoint_to_pytorch.py", line 78, in <module>
    args.tf_checkpoint_path, args.config_file, args.pytorch_dump_path, args.discriminator_or_generator
  File "/home/enoch/dl_repos/transformers/src/transformers/models/electra/convert_electra_original_tf_checkpoint_to_pytorch.py", line 43, in convert_tf_checkpoint_to_pytorch
    model, config, tf_checkpoint_path, discriminator_or_generator=discriminator_or_generator
  File "/home/enoch/dl_repos/transformers/src/transformers/models/electra/modeling_electra.py", line 140, in load_tf_weights_in_electra
    ), f"Pointer shape {pointer.shape} and array shape {array.shape} mismatched"
AssertionError: ('Pointer shape torch.Size([256]) and array shape (64,) mismatched', torch.Size([256]), (64,))

Also, converting the other checkpoints does not start at all, except for the 1M training step checkpoint, which is also failing here.
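As a side note (not from the thread itself): the two shapes in the assertion differ by exactly a factor of four, which is consistent with a generator sized at a 0.25 fraction of the discriminator. A minimal sketch of that arithmetic, with the 256/0.25 values taken from this error message:

```python
# Hypothetical illustration of why the shapes mismatch: the config used for
# conversion describes the discriminator, while the checkpoint variables being
# loaded belong to a smaller generator.
discriminator_hidden_size = 256   # hidden_size in the config.json used for conversion
generator_fraction = 0.25         # generator size as a fraction of the discriminator

# The generator checkpoint variables have this (smaller) hidden size, so the
# PyTorch "pointer" built from the config expects (256,) but the TF array is (64,).
generator_hidden_size = int(discriminator_hidden_size * generator_fraction)
print(generator_hidden_size)  # -> 64
```

If the printed value matches the array shape in your error, the config passed to the conversion script likely describes the wrong (larger) model.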

hyunssong commented 2 years ago

Were you able to solve this issue?

etetteh commented 2 years ago

No, I wasn't able to resolve it.

stefan-it commented 2 years ago

Could you please give more details, e.g. your config.json and the exact command used for converting (discriminator or generator)?

There's one known problem with small generator models (needs a config change).

hyunssong commented 2 years ago

I think it could be a problem with the config. I was experiencing the same problem, but it was because I was using the small model's config while converting the base model. Changed the config and it works now.

zeno17 commented 2 years ago

@stefan-it could you tell me what the known problem with the small generator entails? Does it have to do with the setting of `self.generator_hidden_size = 0.25  # frac of discrim hidden size for gen`?

stefan-it commented 2 years ago

It was related to this configuration change (which is needed to convert the generator correctly):

https://github.com/google-research/electra/issues/94#issuecomment-689633064
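To make the idea concrete (a hypothetical sketch, not the exact config from the linked comment): when converting the generator, the config.json must describe the generator's actual, scaled-down dimensions rather than the discriminator's. Assuming a 0.25 generator fraction and a discriminator with `hidden_size=256`, the generator config would look roughly like:

```json
{
  "hidden_size": 64,
  "intermediate_size": 256,
  "num_attention_heads": 1,
  "embedding_size": 128
}
```

The exact values depend on your pretraining config; the key point is that the size-related fields must match the generator's checkpoint variables, otherwise the conversion script's shape assertion fails as shown above.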