Hi, I trained a model on my own task and subtask, but when running the model I get error 1 below, which says that config.json is missing. Looking at the model directory, I can confirm the file was never generated during training. Is there a way to obtain the required config.json?
Error 1
(codet5) D:\CodeT5\sh>python test-results.py
Traceback (most recent call last):
File "test-results.py", line 6, in <module>
model = T5ForConditionalGeneration.from_pretrained(model_name_or_path)
File "C:\Users\USER\anaconda3\envs\codet5\lib\site-packages\transformers\modeling_utils.py", line 1067, in from_pretrained
config, model_kwargs = cls.config_class.from_pretrained(
File "C:\Users\USER\anaconda3\envs\codet5\lib\site-packages\transformers\configuration_utils.py", line 427, in from_pretrained
config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "C:\Users\USER\anaconda3\envs\codet5\lib\site-packages\transformers\configuration_utils.py", line 484, in get_config_dict
resolved_config_file = cached_path(
File "C:\Users\USER\anaconda3\envs\codet5\lib\site-packages\transformers\file_utils.py", line 1289, in cached_path
raise ValueError(f"unable to parse {url_or_filename} as a URL or as a local path")
ValueError: unable to parse D:\CodeT5\sh\saved_models\qcbot\codet5_small_all_lr5_bs32_src256_trg128_pat2_e1\checkpoint-best-ppl\config.json as a URL or as a local path
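If the training script saved only the model weights, a config.json compatible with the checkpoint can be reconstructed. The simplest route is to reuse the config of the base model the checkpoint was fine-tuned from, e.g. T5Config.from_pretrained('Salesforce/codet5-small').save_pretrained(checkpoint_dir). As an offline fallback, the key architecture fields can also be read off the shapes in the size-mismatch log further down (d_model 512, 8 heads, 6 layers, d_ff 2048, vocab 32100 — the codet5-small geometry). A minimal sketch under those assumptions; fields not derivable from the log (dropout, layer-norm epsilon, special token ids) are omitted here and fall back to T5 defaults, so the official codet5-small config is the safer source:

```python
import json
from pathlib import Path


def write_minimal_t5_config(checkpoint_dir):
    """Write a minimal config.json so from_pretrained can load the checkpoint.

    The architecture numbers are read off the shape-mismatch log below:
    d_model=512, num_heads=8 (relative_attention_bias is [32, 8]),
    num_layers=6 (blocks 0-5 only), d_ff=2048, vocab_size=32100.
    """
    config = {
        "architectures": ["T5ForConditionalGeneration"],
        "model_type": "t5",
        "vocab_size": 32100,
        "d_model": 512,
        "d_kv": 64,  # d_model / num_heads
        "d_ff": 2048,
        "num_layers": 6,
        "num_decoder_layers": 6,
        "num_heads": 8,
        "relative_attention_num_buckets": 32,
    }
    path = Path(checkpoint_dir) / "config.json"
    path.write_text(json.dumps(config, indent=2))
    return path
```

With this file in place next to pytorch_model.bin, from_pretrained should at least find a config; whether generation quality is right still depends on the remaining fields matching the original codet5-small config.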
Error 2
(codet5) D:\CodeT5\sh>python test-results.py
Some weights of T5ForConditionalGeneration were not initialized from the model checkpoint at D:/CodeT5/sh/saved_models/qcbot/codet5_small_all_lr5_bs32_src256_trg128_pat2_e1/checkpoint-best-ppl and are newly initialized: ['encoder.block.10.layer.0.SelfAttention.k.weight', 'decoder.block.11.layer.1.EncDecAttention.v.weight', 'encoder.block.8.layer.0.SelfAttention.q.weight', 'encoder.block.7.layer.1.DenseReluDense.wo.weight', 'decoder.block.7.layer.0.SelfAttention.k.weight', 'decoder.block.10.layer.0.SelfAttention.q.weight', 'decoder.block.6.layer.1.EncDecAttention.q.weight', 'decoder.block.7.layer.0.SelfAttention.o.weight', 'decoder.block.7.layer.1.EncDecAttention.o.weight', 'decoder.block.8.layer.0.SelfAttention.o.weight', 'decoder.block.9.layer.2.DenseReluDense.wi.weight', 'decoder.block.10.layer.2.layer_norm.weight', 'encoder.block.6.layer.0.SelfAttention.v.weight', 'decoder.block.11.layer.0.SelfAttention.o.weight', 'encoder.block.6.layer.1.layer_norm.weight', 'decoder.block.6.layer.0.SelfAttention.q.weight', 'encoder.block.7.layer.0.layer_norm.weight', 'decoder.block.8.layer.1.EncDecAttention.v.weight', 'decoder.block.6.layer.1.EncDecAttention.o.weight', 'encoder.block.6.layer.1.DenseReluDense.wi.weight', 'decoder.block.6.layer.2.layer_norm.weight', 'decoder.block.10.layer.1.layer_norm.weight', 'decoder.block.7.layer.1.layer_norm.weight', 'decoder.block.8.layer.0.SelfAttention.q.weight', 'encoder.block.10.layer.1.layer_norm.weight', 'encoder.block.7.layer.1.DenseReluDense.wi.weight', 'decoder.block.8.layer.1.EncDecAttention.k.weight', 'decoder.block.8.layer.1.EncDecAttention.q.weight', 'decoder.block.7.layer.1.EncDecAttention.q.weight', 'encoder.block.6.layer.0.SelfAttention.q.weight', 'decoder.block.10.layer.0.SelfAttention.k.weight', 'encoder.block.7.layer.1.layer_norm.weight', 'encoder.block.6.layer.0.layer_norm.weight', 'encoder.block.8.layer.1.DenseReluDense.wo.weight', 'decoder.block.10.layer.2.DenseReluDense.wi.weight', 
'encoder.block.11.layer.0.SelfAttention.q.weight', 'decoder.block.6.layer.1.layer_norm.weight', 'decoder.block.6.layer.2.DenseReluDense.wi.weight', 'decoder.block.11.layer.2.DenseReluDense.wi.weight', 'encoder.block.11.layer.0.layer_norm.weight', 'decoder.block.6.layer.0.SelfAttention.v.weight', 'encoder.block.8.layer.0.SelfAttention.k.weight', 'decoder.block.9.layer.2.DenseReluDense.wo.weight', 'decoder.block.9.layer.0.SelfAttention.v.weight', 'decoder.block.7.layer.0.SelfAttention.v.weight', 'decoder.block.6.layer.2.DenseReluDense.wo.weight', 'decoder.block.8.layer.2.DenseReluDense.wi.weight', 'decoder.block.9.layer.1.EncDecAttention.o.weight', 'encoder.block.10.layer.0.SelfAttention.q.weight', 'encoder.block.8.layer.0.SelfAttention.o.weight', 'encoder.block.9.layer.1.layer_norm.weight', 'decoder.block.8.layer.1.EncDecAttention.o.weight', 'encoder.block.10.layer.0.layer_norm.weight', 'decoder.block.7.layer.1.EncDecAttention.k.weight', 'decoder.block.8.layer.2.layer_norm.weight', 'encoder.block.8.layer.1.DenseReluDense.wi.weight', 'decoder.block.9.layer.0.SelfAttention.o.weight', 'decoder.block.11.layer.0.SelfAttention.q.weight', 'decoder.block.6.layer.1.EncDecAttention.k.weight', 'decoder.block.7.layer.2.DenseReluDense.wo.weight', 'decoder.block.10.layer.1.EncDecAttention.v.weight', 'decoder.block.7.layer.2.DenseReluDense.wi.weight', 'decoder.block.9.layer.2.layer_norm.weight', 'encoder.block.11.layer.0.SelfAttention.o.weight', 'encoder.block.6.layer.1.DenseReluDense.wo.weight', 'encoder.block.9.layer.1.DenseReluDense.wo.weight', 'decoder.block.6.layer.1.EncDecAttention.v.weight', 'decoder.block.11.layer.0.SelfAttention.k.weight', 'encoder.block.11.layer.1.DenseReluDense.wo.weight', 'decoder.block.9.layer.1.EncDecAttention.k.weight', 'decoder.block.11.layer.1.EncDecAttention.k.weight', 'encoder.block.10.layer.1.DenseReluDense.wo.weight', 'decoder.block.8.layer.0.SelfAttention.v.weight', 'decoder.block.7.layer.0.layer_norm.weight', 
'decoder.block.6.layer.0.SelfAttention.k.weight', 'decoder.block.7.layer.1.EncDecAttention.v.weight', 'encoder.block.7.layer.0.SelfAttention.v.weight', 'decoder.block.9.layer.1.layer_norm.weight', 'decoder.block.6.layer.0.layer_norm.weight', 'encoder.block.11.layer.0.SelfAttention.v.weight', 'encoder.block.11.layer.1.layer_norm.weight', 'encoder.block.6.layer.0.SelfAttention.o.weight', 'decoder.block.10.layer.0.SelfAttention.v.weight', 'encoder.block.9.layer.0.SelfAttention.v.weight', 'decoder.block.8.layer.2.DenseReluDense.wo.weight', 'decoder.block.10.layer.2.DenseReluDense.wo.weight', 'encoder.block.11.layer.0.SelfAttention.k.weight', 'decoder.block.10.layer.1.EncDecAttention.k.weight', 'decoder.block.11.layer.0.SelfAttention.v.weight', 'decoder.block.7.layer.0.SelfAttention.q.weight', 'decoder.block.8.layer.0.layer_norm.weight', 'encoder.block.8.layer.1.layer_norm.weight', 'decoder.block.8.layer.0.SelfAttention.k.weight', 'decoder.block.9.layer.1.EncDecAttention.q.weight', 'encoder.block.7.layer.0.SelfAttention.q.weight', 'encoder.block.9.layer.0.SelfAttention.o.weight', 'decoder.block.10.layer.1.EncDecAttention.q.weight', 'encoder.block.10.layer.1.DenseReluDense.wi.weight', 'decoder.block.9.layer.0.SelfAttention.k.weight', 'encoder.block.10.layer.0.SelfAttention.v.weight', 'encoder.block.9.layer.0.layer_norm.weight', 'encoder.block.9.layer.1.DenseReluDense.wi.weight', 'decoder.block.9.layer.1.EncDecAttention.v.weight', 'decoder.block.11.layer.0.layer_norm.weight', 'decoder.block.11.layer.1.EncDecAttention.q.weight', 'encoder.block.9.layer.0.SelfAttention.q.weight', 'encoder.block.8.layer.0.layer_norm.weight', 'decoder.block.10.layer.1.EncDecAttention.o.weight', 'decoder.block.7.layer.2.layer_norm.weight', 'decoder.block.11.layer.1.layer_norm.weight', 'encoder.block.6.layer.0.SelfAttention.k.weight', 'encoder.block.7.layer.0.SelfAttention.k.weight', 'encoder.block.10.layer.0.SelfAttention.o.weight', 'decoder.block.9.layer.0.SelfAttention.q.weight', 
'decoder.block.11.layer.2.layer_norm.weight', 'decoder.block.10.layer.0.SelfAttention.o.weight', 'decoder.block.11.layer.2.DenseReluDense.wo.weight', 'decoder.block.8.layer.1.layer_norm.weight', 'decoder.block.10.layer.0.layer_norm.weight', 'encoder.block.9.layer.0.SelfAttention.k.weight', 'encoder.block.8.layer.0.SelfAttention.v.weight', 'decoder.block.11.layer.1.EncDecAttention.o.weight', 'decoder.block.9.layer.0.layer_norm.weight', 'encoder.block.11.layer.1.DenseReluDense.wi.weight', 'encoder.block.7.layer.0.SelfAttention.o.weight', 'decoder.block.6.layer.0.SelfAttention.o.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
File "test-results.py", line 6, in <module>
model = T5ForConditionalGeneration.from_pretrained(model_name_or_path)
File "C:\Users\USER\anaconda3\envs\codet5\lib\site-packages\transformers\modeling_utils.py", line 1213, in from_pretrained
model, missing_keys, unexpected_keys, error_msgs = cls._load_state_dict_into_model(
File "C:\Users\USER\anaconda3\envs\codet5\lib\site-packages\transformers\modeling_utils.py", line 1354, in _load_state_dict_into_model
raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for T5ForConditionalGeneration:
size mismatch for shared.weight: copying a param with shape torch.Size([32100, 512]) from checkpoint, the shape in current model is torch.Size([32100, 768]).
size mismatch for encoder.embed_tokens.weight: copying a param with shape torch.Size([32100, 512]) from checkpoint, the shape in current model is torch.Size([32100, 768]).
size mismatch for encoder.block.0.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.0.layer.0.SelfAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.0.layer.0.SelfAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.0.layer.0.SelfAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight: copying a param with shape torch.Size([32, 8]) from checkpoint, the shape in current model is torch.Size([32, 12]).
size mismatch for encoder.block.0.layer.0.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for encoder.block.0.layer.1.DenseReluDense.wi.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for encoder.block.0.layer.1.DenseReluDense.wo.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for encoder.block.0.layer.1.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for encoder.block.1.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.1.layer.0.SelfAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.1.layer.0.SelfAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.1.layer.0.SelfAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.1.layer.0.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for encoder.block.1.layer.1.DenseReluDense.wi.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for encoder.block.1.layer.1.DenseReluDense.wo.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for encoder.block.1.layer.1.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for encoder.block.2.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.2.layer.0.SelfAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.2.layer.0.SelfAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.2.layer.0.SelfAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.2.layer.0.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for encoder.block.2.layer.1.DenseReluDense.wi.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for encoder.block.2.layer.1.DenseReluDense.wo.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for encoder.block.2.layer.1.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for encoder.block.3.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.3.layer.0.SelfAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.3.layer.0.SelfAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.3.layer.0.SelfAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.3.layer.0.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for encoder.block.3.layer.1.DenseReluDense.wi.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for encoder.block.3.layer.1.DenseReluDense.wo.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for encoder.block.3.layer.1.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for encoder.block.4.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.4.layer.0.SelfAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.4.layer.0.SelfAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.4.layer.0.SelfAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.4.layer.0.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for encoder.block.4.layer.1.DenseReluDense.wi.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for encoder.block.4.layer.1.DenseReluDense.wo.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for encoder.block.4.layer.1.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for encoder.block.5.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.5.layer.0.SelfAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.5.layer.0.SelfAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.5.layer.0.SelfAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for encoder.block.5.layer.0.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for encoder.block.5.layer.1.DenseReluDense.wi.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for encoder.block.5.layer.1.DenseReluDense.wo.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for encoder.block.5.layer.1.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for encoder.final_layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.embed_tokens.weight: copying a param with shape torch.Size([32100, 512]) from checkpoint, the shape in current model is torch.Size([32100, 768]).
size mismatch for decoder.block.0.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.0.layer.0.SelfAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.0.layer.0.SelfAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.0.layer.0.SelfAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight: copying a param with shape torch.Size([32, 8]) from checkpoint, the shape in current model is torch.Size([32, 12]).
size mismatch for decoder.block.0.layer.0.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.0.layer.1.EncDecAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.0.layer.1.EncDecAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.0.layer.1.EncDecAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.0.layer.1.EncDecAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.0.layer.1.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.0.layer.2.DenseReluDense.wi.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for decoder.block.0.layer.2.DenseReluDense.wo.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for decoder.block.0.layer.2.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.1.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.1.layer.0.SelfAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.1.layer.0.SelfAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.1.layer.0.SelfAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.1.layer.0.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.1.layer.1.EncDecAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.1.layer.1.EncDecAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.1.layer.1.EncDecAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.1.layer.1.EncDecAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.1.layer.1.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.1.layer.2.DenseReluDense.wi.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for decoder.block.1.layer.2.DenseReluDense.wo.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for decoder.block.1.layer.2.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.2.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.2.layer.0.SelfAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.2.layer.0.SelfAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.2.layer.0.SelfAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.2.layer.0.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.2.layer.1.EncDecAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.2.layer.1.EncDecAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.2.layer.1.EncDecAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.2.layer.1.EncDecAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.2.layer.1.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.2.layer.2.DenseReluDense.wi.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for decoder.block.2.layer.2.DenseReluDense.wo.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for decoder.block.2.layer.2.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.3.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.3.layer.0.SelfAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.3.layer.0.SelfAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.3.layer.0.SelfAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.3.layer.0.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.3.layer.1.EncDecAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.3.layer.1.EncDecAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.3.layer.1.EncDecAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.3.layer.1.EncDecAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.3.layer.1.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.3.layer.2.DenseReluDense.wi.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for decoder.block.3.layer.2.DenseReluDense.wo.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for decoder.block.3.layer.2.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.4.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.4.layer.0.SelfAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.4.layer.0.SelfAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.4.layer.0.SelfAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.4.layer.0.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.4.layer.1.EncDecAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.4.layer.1.EncDecAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.4.layer.1.EncDecAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.4.layer.1.EncDecAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.4.layer.1.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.4.layer.2.DenseReluDense.wi.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for decoder.block.4.layer.2.DenseReluDense.wo.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for decoder.block.4.layer.2.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.5.layer.0.SelfAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.5.layer.0.SelfAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.5.layer.0.SelfAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.5.layer.0.SelfAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.5.layer.0.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.5.layer.1.EncDecAttention.q.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.5.layer.1.EncDecAttention.k.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.5.layer.1.EncDecAttention.v.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.5.layer.1.EncDecAttention.o.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([768, 768]).
size mismatch for decoder.block.5.layer.1.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.block.5.layer.2.DenseReluDense.wi.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
size mismatch for decoder.block.5.layer.2.DenseReluDense.wo.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
size mismatch for decoder.block.5.layer.2.layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for decoder.final_layer_norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for lm_head.weight: copying a param with shape torch.Size([32100, 512]) from checkpoint, the shape in current model is torch.Size([32100, 768]).
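The shape pattern in error 2 itself identifies the problem: every checkpoint tensor has the codet5-small geometry (d_model 512, 8 heads, feed-forward 2048, 6 layers — which is also why blocks 6–11 are reported as newly initialized), while the model being built has the codet5-base geometry (d_model 768, 12 heads, feed-forward 3072, 12 layers). That is consistent with loading a checkpoint fine-tuned from codet5-small using the codet5_base config.json. A small sketch of the check, assuming the checkpoint's state dict is available (the function only inspects the embedding shape):

```python
def infer_codet5_variant(shared_weight_shape):
    """Map the shape of state_dict['shared.weight'] to the matching base model.

    shared_weight_shape would typically come from e.g.
    torch.load('pytorch_model.bin')['shared.weight'].shape.
    """
    vocab_size, d_model = shared_weight_shape
    variants = {512: "Salesforce/codet5-small", 768: "Salesforce/codet5-base"}
    if d_model not in variants:
        raise ValueError(f"unexpected d_model: {d_model}")
    return variants[d_model]
```

For the checkpoint in this thread, shared.weight is [32100, 512], so the config to pair with it is the codet5-small one, not codet5_base.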
Code that results in the above error
from transformers import RobertaTokenizer, T5ForConditionalGeneration
# Path to the fine-tuned checkpoint folder created by training.
model_name_or_path = 'D:/CodeT5/sh/saved_models/qcbot/codet5_small_all_lr5_bs32_src256_trg128_pat2_e1/checkpoint-best-ppl'
# CodeT5 uses a RoBERTa-style tokenizer; load it from the base model on the Hub.
tokenizer = RobertaTokenizer.from_pretrained('Salesforce/codet5-small')
model = T5ForConditionalGeneration.from_pretrained(model_name_or_path)
text = "Sums two sequence of items."
input_ids = tokenizer(text, return_tensors="pt").input_ids
generated_ids = model.generate(input_ids)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
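Before calling from_pretrained on a local path, it can help to verify that the checkpoint folder actually contains everything the loader expects; error 1 above is exactly the case where config.json is absent. A hedged sketch (the filenames are the ones this era of transformers looks for in a local directory, with weights saved as pytorch_model.bin):

```python
from pathlib import Path

# Files from_pretrained expects in a local checkpoint directory
# (for transformers versions that save weights as pytorch_model.bin).
REQUIRED_FILES = ("config.json", "pytorch_model.bin")


def missing_checkpoint_files(checkpoint_dir):
    """Return the required files that are absent from checkpoint_dir."""
    directory = Path(checkpoint_dir)
    return [name for name in REQUIRED_FILES if not (directory / name).is_file()]
```

For the checkpoint-best-ppl folder in this thread, the function would report config.json as missing, matching error 1.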
@rjra2611 were you able to resolve the above error?
We trained again on the same data provided by Salesforce to generate a checkpoint as a .bin file, and got the same error when providing the path to that file.
I also tried to fetch the config.json from https://storage.googleapis.com/sfr-codet5-data-research/pretrained_models/codet5_base/config.json, but I believe it is not valid for my fine-tuned model: using that config produced error 2 above.