Hi, I trained a model on my own task and subtask, but when running the model I get error 1 below, which says that config.json is missing. Looking at the model directory, I can confirm the file was never generated during training. Is there a way to obtain the required config.json?
Error 1
(codet5) D:\CodeT5\sh>python test-results.py
Traceback (most recent call last):
File "test-results.py", line 6, in <module>
model = T5ForConditionalGeneration.from_pretrained(model_name_or_path)
File "C:\Users\USER\anaconda3\envs\codet5\lib\site-packages\transformers\modeling_utils.py", line 1067, in from_pretrained
config, model_kwargs = cls.config_class.from_pretrained(
File "C:\Users\USER\anaconda3\envs\codet5\lib\site-packages\transformers\configuration_utils.py", line 427, in from_pretrained
config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "C:\Users\USER\anaconda3\envs\codet5\lib\site-packages\transformers\configuration_utils.py", line 484, in get_config_dict
resolved_config_file = cached_path(
File "C:\Users\USER\anaconda3\envs\codet5\lib\site-packages\transformers\file_utils.py", line 1289, in cached_path
raise ValueError(f"unable to parse {url_or_filename} as a URL or as a local path")
ValueError: unable to parse D:\CodeT5\sh\saved_models\qcbot\codet5_small_all_lr5_bs32_src256_trg128_pat2_e1\checkpoint-best-ppl\config.json as a URL or as a local path
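If the training script saved only the model weights, a config.json compatible with the checkpoint can be reconstructed. The simplest route is to reuse the config of the base model the checkpoint was fine-tuned from, e.g. T5Config.from_pretrained('Salesforce/codet5-small').save_pretrained(checkpoint_dir). As an offline fallback, the key architecture fields can also be read off the shapes in the size-mismatch log further down (d_model 512, 8 heads, 6 layers, d_ff 2048, vocab 32100 — the codet5-small geometry). A minimal sketch under those assumptions; fields not derivable from the log (dropout, layer-norm epsilon, special token ids) are omitted here and fall back to T5 defaults, so the official codet5-small config is the safer source:

```python
import json
from pathlib import Path


def write_minimal_t5_config(checkpoint_dir):
    """Write a minimal config.json so from_pretrained can load the checkpoint.

    The architecture numbers are read off the shape-mismatch log below:
    d_model=512, num_heads=8 (relative_attention_bias is [32, 8]),
    num_layers=6 (blocks 0-5 only), d_ff=2048, vocab_size=32100.
    """
    config = {
        "architectures": ["T5ForConditionalGeneration"],
        "model_type": "t5",
        "vocab_size": 32100,
        "d_model": 512,
        "d_kv": 64,  # d_model / num_heads
        "d_ff": 2048,
        "num_layers": 6,
        "num_decoder_layers": 6,
        "num_heads": 8,
        "relative_attention_num_buckets": 32,
    }
    path = Path(checkpoint_dir) / "config.json"
    path.write_text(json.dumps(config, indent=2))
    return path
```

With this file in place next to pytorch_model.bin, from_pretrained should at least find a config; whether generation quality is right still depends on the remaining fields matching the original codet5-small config.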
Error 2
(codet5) D:\CodeT5\sh>python test-results.py
Some weights of T5ForConditionalGeneration were not initialized from the model checkpoint at D:/CodeT5/sh/saved_models/qcbot/codet5_small_all_lr5_bs32_src256_trg128_pat2_e1/checkpoint-best-ppl and are newly initialized: ['encoder.block.10.layer.0.SelfAttention.k.weight', 'decoder.block.11.layer.1.EncDecAttention.v.weight', 'encoder.block.8.layer.0.SelfAttention.q.weight', 'encoder.block.7.layer.1.DenseReluDense.wo.weight', 'decoder.block.7.layer.0.SelfAttention.k.weight', 'decoder.block.10.layer.0.SelfAttention.q.weight', 'decoder.block.6.layer.1.EncDecAttention.q.weight', 'decoder.block.7.layer.0.SelfAttention.o.weight', 'decoder.block.7.layer.1.EncDecAttention.o.weight', 'decoder.block.8.layer.0.SelfAttention.o.weight', 'decoder.block.9.layer.2.DenseReluDense.wi.weight', 'decoder.block.10.layer.2.layer_norm.weight', 'encoder.block.6.layer.0.SelfAttention.v.weight', 'decoder.block.11.layer.0.SelfAttention.o.weight', 'encoder.block.6.layer.1.layer_norm.weight', 'decoder.block.6.layer.0.SelfAttention.q.weight', 'encoder.block.7.layer.0.layer_norm.weight', 'decoder.block.8.layer.1.EncDecAttention.v.weight', 'decoder.block.6.layer.1.EncDecAttention.o.weight', 'encoder.block.6.layer.1.DenseReluDense.wi.weight', 'decoder.block.6.layer.2.layer_norm.weight', 'decoder.block.10.layer.1.layer_norm.weight', 'decoder.block.7.layer.1.layer_norm.weight', 'decoder.block.8.layer.0.SelfAttention.q.weight', 'encoder.block.10.layer.1.layer_norm.weight', 'encoder.block.7.layer.1.DenseReluDense.wi.weight', 'decoder.block.8.layer.1.EncDecAttention.k.weight', 'decoder.block.8.layer.1.EncDecAttention.q.weight', 'decoder.block.7.layer.1.EncDecAttention.q.weight', 'encoder.block.6.layer.0.SelfAttention.q.weight', 'decoder.block.10.layer.0.SelfAttention.k.weight', 'encoder.block.7.layer.1.layer_norm.weight', 'encoder.block.6.layer.0.layer_norm.weight', 'encoder.block.8.layer.1.DenseReluDense.wo.weight', 'decoder.block.10.layer.2.DenseReluDense.wi.weight', 
'encoder.block.11.layer.0.SelfAttention.q.weight', 'decoder.block.6.layer.1.layer_norm.weight', 'decoder.block.6.layer.2.DenseReluDense.wi.weight', 'decoder.block.11.layer.2.DenseReluDense.wi.weight', 'encoder.block.11.layer.0.layer_norm.weight', 'decoder.block.6.layer.0.SelfAttention.v.weight', 'encoder.block.8.layer.0.SelfAttention.k.weight', 'decoder.block.9.layer.2.DenseReluDense.wo.weight', 'decoder.block.9.layer.0.SelfAttention.v.weight', 'decoder.block.7.layer.0.SelfAttention.v.weight', 'decoder.block.6.layer.2.DenseReluDense.wo.weight', 'decoder.block.8.layer.2.DenseReluDense.wi.weight', 'decoder.block.9.layer.1.EncDecAttention.o.weight', 'encoder.block.10.layer.0.SelfAttention.q.weight', 'encoder.block.8.layer.0.SelfAttention.o.weight', 'encoder.block.9.layer.1.layer_norm.weight', 'decoder.block.8.layer.1.EncDecAttention.o.weight', 'encoder.block.10.layer.0.layer_norm.weight', 'decoder.block.7.layer.1.EncDecAttention.k.weight', 'decoder.block.8.layer.2.layer_norm.weight', 'encoder.block.8.layer.1.DenseReluDense.wi.weight', 'decoder.block.9.layer.0.SelfAttention.o.weight', 'decoder.block.11.layer.0.SelfAttention.q.weight', 'decoder.block.6.layer.1.EncDecAttention.k.weight', 'decoder.block.7.layer.2.DenseReluDense.wo.weight', 'decoder.block.10.layer.1.EncDecAttention.v.weight', 'decoder.block.7.layer.2.DenseReluDense.wi.weight', 'decoder.block.9.layer.2.layer_norm.weight', 'encoder.block.11.layer.0.SelfAttention.o.weight', 'encoder.block.6.layer.1.DenseReluDense.wo.weight', 'encoder.block.9.layer.1.DenseReluDense.wo.weight', 'decoder.block.6.layer.1.EncDecAttention.v.weight', 'decoder.block.11.layer.0.SelfAttention.k.weight', 'encoder.block.11.layer.1.DenseReluDense.wo.weight', 'decoder.block.9.layer.1.EncDecAttention.k.weight', 'decoder.block.11.layer.1.EncDecAttention.k.weight', 'encoder.block.10.layer.1.DenseReluDense.wo.weight', 'decoder.block.8.layer.0.SelfAttention.v.weight', 'decoder.block.7.layer.0.layer_norm.weight', 
'decoder.block.6.layer.0.SelfAttention.k.weight', 'decoder.block.7.layer.1.EncDecAttention.v.weight', 'encoder.block.7.layer.0.SelfAttention.v.weight', 'decoder.block.9.layer.1.layer_norm.weight', 'decoder.block.6.layer.0.layer_norm.weight', 'encoder.block.11.layer.0.SelfAttention.v.weight', 'encoder.block.11.layer.1.layer_norm.weight', 'encoder.block.6.layer.0.SelfAttention.o.weight', 'decoder.block.10.layer.0.SelfAttention.v.weight', 'encoder.block.9.layer.0.SelfAttention.v.weight', 'decoder.block.8.layer.2.DenseReluDense.wo.weight', 'decoder.block.10.layer.2.DenseReluDense.wo.weight', 'encoder.block.11.layer.0.SelfAttention.k.weight', 'decoder.block.10.layer.1.EncDecAttention.k.weight', 'decoder.block.11.layer.0.SelfAttention.v.weight', 'decoder.block.7.layer.0.SelfAttention.q.weight', 'decoder.block.8.layer.0.layer_norm.weight', 'encoder.block.8.layer.1.layer_norm.weight', 'decoder.block.8.layer.0.SelfAttention.k.weight', 'decoder.block.9.layer.1.EncDecAttention.q.weight', 'encoder.block.7.layer.0.SelfAttention.q.weight', 'encoder.block.9.layer.0.SelfAttention.o.weight', 'decoder.block.10.layer.1.EncDecAttention.q.weight', 'encoder.block.10.layer.1.DenseReluDense.wi.weight', 'decoder.block.9.layer.0.SelfAttention.k.weight', 'encoder.block.10.layer.0.SelfAttention.v.weight', 'encoder.block.9.layer.0.layer_norm.weight', 'encoder.block.9.layer.1.DenseReluDense.wi.weight', 'decoder.block.9.layer.1.EncDecAttention.v.weight', 'decoder.block.11.layer.0.layer_norm.weight', 'decoder.block.11.layer.1.EncDecAttention.q.weight', 'encoder.block.9.layer.0.SelfAttention.q.weight', 'encoder.block.8.layer.0.layer_norm.weight', 'decoder.block.10.layer.1.EncDecAttention.o.weight', 'decoder.block.7.layer.2.layer_norm.weight', 'decoder.block.11.layer.1.layer_norm.weight', 'encoder.block.6.layer.0.SelfAttention.k.weight', 'encoder.block.7.layer.0.SelfAttention.k.weight', 'encoder.block.10.layer.0.SelfAttention.o.weight', 'decoder.block.9.layer.0.SelfAttention.q.weight', 
'decoder.block.11.layer.2.layer_norm.weight', 'decoder.block.10.layer.0.SelfAttention.o.weight', 'decoder.block.11.layer.2.DenseReluDense.wo.weight', 'decoder.block.8.layer.1.layer_norm.weight', 'decoder.block.10.layer.0.layer_norm.weight', 'encoder.block.9.layer.0.SelfAttention.k.weight', 'encoder.block.8.layer.0.SelfAttention.v.weight', 'decoder.block.11.layer.1.EncDecAttention.o.weight', 'decoder.block.9.layer.0.layer_norm.weight', 'encoder.block.11.layer.1.DenseReluDense.wi.weight', 'encoder.block.7.layer.0.SelfAttention.o.weight', 'decoder.block.6.layer.0.SelfAttention.o.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
File "test-results.py", line 6, in <module>
model = T5ForConditionalGeneration.from_pretrained(model_name_or_path)
File "C:\Users\USER\anaconda3\envs\codet5\lib\site-packages\transformers\modeling_utils.py", line 1213, in from_pretrained
model, missing_keys, unexpected_keys, error_msgs = cls._load_state_dict_into_model(
File "C:\Users\USER\anaconda3\envs\codet5\lib\site-packages\transformers\modeling_utils.py", line 1354, in _load_state_dict_into_model
raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for T5ForConditionalGeneration:
size mismatch for shared.weight: copying a param with shape torch.Size([32100, 512]) from checkpoint, the shape in current model is torch.Size([32100, 768]).
size mismatch for encoder.embed_tokens.weight: copying a param with shape torch.Size([32100, 512]) from checkpoint, the shape in current model is torch.Size([32100, 768]).
size mismatch for encoder.block.0.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.0.layer.0.SelfAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.0.layer.0.SelfAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.0.layer.0.SelfAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight: copying a param with shape torch.Size([32, 8]) from checkpoint, the shape in current model is torch.Size([32, 12]).
size mismatch for encoder.block.0.layer.0.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for encoder.block.0.layer.1.DenseReluDense.wi.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for encoder.block.0.layer.1.DenseReluDense.wo.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for encoder.block.0.layer.1.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for encoder.block.1.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.1.layer.0.SelfAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.1.layer.0.SelfAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.1.layer.0.SelfAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.1.layer.0.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for encoder.block.1.layer.1.DenseReluDense.wi.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for encoder.block.1.layer.1.DenseReluDense.wo.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for encoder.block.1.layer.1.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for encoder.block.2.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.2.layer.0.SelfAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.2.layer.0.SelfAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.2.layer.0.SelfAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.2.layer.0.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for encoder.block.2.layer.1.DenseReluDense.wi.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for encoder.block.2.layer.1.DenseReluDense.wo.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for encoder.block.2.layer.1.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for encoder.block.3.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.3.layer.0.SelfAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.3.layer.0.SelfAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.3.layer.0.SelfAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.3.layer.0.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for encoder.block.3.layer.1.DenseReluDense.wi.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for encoder.block.3.layer.1.DenseReluDense.wo.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for encoder.block.3.layer.1.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for encoder.block.4.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.4.layer.0.SelfAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.4.layer.0.SelfAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.4.layer.0.SelfAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.4.layer.0.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for encoder.block.4.layer.1.DenseReluDense.wi.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for encoder.block.4.layer.1.DenseReluDense.wo.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for encoder.block.4.layer.1.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for encoder.block.5.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.5.layer.0.SelfAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.5.layer.0.SelfAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.5.layer.0.SelfAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.5.layer.0.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for encoder.block.5.layer.1.DenseReluDense.wi.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for encoder.block.5.layer.1.DenseReluDense.wo.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for encoder.block.5.layer.1.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for encoder.final_layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.embed_tokens.weight: copying a param with shape torch.Size([32100, 512]) from checkpoint, the shape in current model is torch.Size([32100, 768]).
size mismatch for decoder.block.0.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.0.layer.0.SelfAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.0.layer.0.SelfAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.0.layer.0.SelfAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight: copying a param with shape torch.Size([32, 8]) from checkpoint, the shape in current model is torch.Size([32, 12]).
size mismatch for decoder.block.0.layer.0.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.0.layer.1.EncDecAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.0.layer.1.EncDecAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.0.layer.1.EncDecAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.0.layer.1.EncDecAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.0.layer.1.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.0.layer.2.DenseReluDense.wi.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for decoder.block.0.layer.2.DenseReluDense.wo.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for decoder.block.0.layer.2.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.1.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.1.layer.0.SelfAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.1.layer.0.SelfAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.1.layer.0.SelfAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.1.layer.0.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.1.layer.1.EncDecAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.1.layer.1.EncDecAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.1.layer.1.EncDecAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.1.layer.1.EncDecAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.1.layer.1.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.1.layer.2.DenseReluDense.wi.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for decoder.block.1.layer.2.DenseReluDense.wo.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for decoder.block.1.layer.2.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.2.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.2.layer.0.SelfAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.2.layer.0.SelfAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.2.layer.0.SelfAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.2.layer.0.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.2.layer.1.EncDecAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.2.layer.1.EncDecAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.2.layer.1.EncDecAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.2.layer.1.EncDecAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.2.layer.1.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.2.layer.2.DenseReluDense.wi.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for decoder.block.2.layer.2.DenseReluDense.wo.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for decoder.block.2.layer.2.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.3.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.3.layer.0.SelfAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.3.layer.0.SelfAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.3.layer.0.SelfAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.3.layer.0.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.3.layer.1.EncDecAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.3.layer.1.EncDecAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.3.layer.1.EncDecAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.3.layer.1.EncDecAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.3.layer.1.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.3.layer.2.DenseReluDense.wi.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for decoder.block.3.layer.2.DenseReluDense.wo.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for decoder.block.3.layer.2.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.4.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.4.layer.0.SelfAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.4.layer.0.SelfAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.4.layer.0.SelfAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.4.layer.0.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.4.layer.1.EncDecAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.4.layer.1.EncDecAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.4.layer.1.EncDecAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.4.layer.1.EncDecAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.4.layer.1.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.4.layer.2.DenseReluDense.wi.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for decoder.block.4.layer.2.DenseReluDense.wo.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for decoder.block.4.layer.2.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.5.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.5.layer.0.SelfAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.5.layer.0.SelfAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.5.layer.0.SelfAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.5.layer.0.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.5.layer.1.EncDecAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.5.layer.1.EncDecAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.5.layer.1.EncDecAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.5.layer.1.EncDecAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.5.layer.1.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.5.layer.2.DenseReluDense.wi.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for decoder.block.5.layer.2.DenseReluDense.wo.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for decoder.block.5.layer.2.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.final_layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for lm_head.weight: copying a param with shape torch.Size([32100, 512]) from checkpoint, the shape in current model is torch.Size([32100, 768]).
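The shape pattern in error 2 itself identifies the problem: every checkpoint tensor has the codet5-small geometry (d_model 512, 8 heads, feed-forward 2048, 6 layers — which is also why blocks 6–11 are reported as newly initialized), while the model being built has the codet5-base geometry (d_model 768, 12 heads, feed-forward 3072, 12 layers). That is consistent with loading a checkpoint fine-tuned from codet5-small using the codet5_base config.json. A small sketch of the check, assuming the checkpoint's state dict is available (the function only inspects the embedding shape):

```python
def infer_codet5_variant(shared_weight_shape):
    """Map the shape of state_dict['shared.weight'] to the matching base model.

    shared_weight_shape would typically come from e.g.
    torch.load('pytorch_model.bin')['shared.weight'].shape.
    """
    vocab_size, d_model = shared_weight_shape
    variants = {512: "Salesforce/codet5-small", 768: "Salesforce/codet5-base"}
    if d_model not in variants:
        raise ValueError(f"unexpected d_model: {d_model}")
    return variants[d_model]
```

For the checkpoint in this thread, shared.weight is [32100, 512], so the config to pair with it is the codet5-small one, not codet5_base.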
Code that results in the above error
from transformers import RobertaTokenizer, T5ForConditionalGeneration
# Path to the fine-tuned checkpoint folder created by training.
model_name_or_path = 'D:/CodeT5/sh/saved_models/qcbot/codet5_small_all_lr5_bs32_src256_trg128_pat2_e1/checkpoint-best-ppl'
# CodeT5 uses a RoBERTa-style tokenizer; load it from the base model on the Hub.
tokenizer = RobertaTokenizer.from_pretrained('Salesforce/codet5-small')
model = T5ForConditionalGeneration.from_pretrained(model_name_or_path)
text = "Sums two sequence of items."
input_ids = tokenizer(text, return_tensors="pt").input_ids
generated_ids = model.generate(input_ids)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
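Before calling from_pretrained on a local path, it can help to verify that the checkpoint folder actually contains everything the loader expects; error 1 above is exactly the case where config.json is absent. A hedged sketch (the filenames are the ones this era of transformers looks for in a local directory, with weights saved as pytorch_model.bin):

```python
from pathlib import Path

# Files from_pretrained expects in a local checkpoint directory
# (for transformers versions that save weights as pytorch_model.bin).
REQUIRED_FILES = ("config.json", "pytorch_model.bin")


def missing_checkpoint_files(checkpoint_dir):
    """Return the required files that are absent from checkpoint_dir."""
    directory = Path(checkpoint_dir)
    return [name for name in REQUIRED_FILES if not (directory / name).is_file()]
```

For the checkpoint-best-ppl folder in this thread, the function would report config.json as missing, matching error 1.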
@rjra2611 were you able to resolve the above error?
We trained again on the same data provided by Salesforce to generate a checkpoint as a .bin file, and got the same error when providing the path to that file.
I also tried to fetch the config.json from https://storage.googleapis.com/sfr-codet5-data-research/pretrained_models/codet5_base/config.json, but I believe it is not valid for my fine-tuned model: using that config produced error 2 above.