salesforce / CodeT5

Home of CodeT5: Open Code LLMs for Code Understanding and Generation
https://arxiv.org/abs/2305.07922
BSD 3-Clause "New" or "Revised" License

Error when loading embedding pre-trained model #143

Open pdhung3012 opened 10 months ago

pdhung3012 commented 10 months ago

Hello. I tried this simple code snippet to get the embedding from a pre-trained CodeT5+ model:

```python
from transformers import AutoModel, AutoTokenizer

checkpoint = "/home/hungphd/media/git/codet5p-110m-embedding"
device = "cuda"  # for GPU usage or "cpu" for CPU usage

tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModel.from_pretrained(checkpoint, trust_remote_code=True).to(device)

inputs = tokenizer.encode("def print_hello_world():\tprint('Hello World!')", return_tensors="pt").to(device)
embedding = model(inputs)[0]
print(f'Dimension of the embedding: {embedding.size()[0]}, with norm={embedding.norm().item()}')
```

However, I got this error:

```
Traceback (most recent call last):
  File "/home/hungphd/media/git/CodeT5/CodeT5+/code_retrieval/examplePretrainedModel.py", line 8, in <module>
    model = AutoModel.from_pretrained(checkpoint, trust_remote_code=True).to(device)
  File "/home/hungphd/anaconda3/envs/py38v2/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 396, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/home/hungphd/anaconda3/envs/py38v2/lib/python3.8/site-packages/transformers/models/auto/configuration_auto.py", line 529, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/home/hungphd/anaconda3/envs/py38v2/lib/python3.8/site-packages/transformers/models/auto/configuration_auto.py", line 278, in __getitem__
    raise KeyError(key)
KeyError: 'codet5p_embedding'
```

I downloaded the pre-trained model from the Hugging Face Hub. The folder of the pre-trained model looks like this:

https://drive.google.com/file/d/1CdLv5GyNFeIPPufLcUS-5TT53W_4fes4/view?usp=drive_link

Did I do it correctly?

pdhung3012 commented 10 months ago

Hello. After checking the error, I found that the cause is that `codet5p_embedding` is not a key defined in this file: https://github.com/huggingface/transformers/blob/main/src/transformers/models/auto/configuration_auto.py

I checked the latest version on GitHub, and that key is not defined there either. Is that key specific to your machine (or did you update the configuration_auto.py file compared to the general version)?
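For context, here is a minimal sketch of why the `KeyError` occurs: older `transformers` releases resolve `config_dict["model_type"]` against a static registry, so an unregistered type like `codet5p_embedding` raises `KeyError` unless `trust_remote_code=True` can fetch the custom config class from the Hub. The mapping below is an illustrative, hypothetical subset, not the real `CONFIG_MAPPING`:

```python
# Illustrative subset of the registry; the real CONFIG_MAPPING in
# transformers is much larger and maps model_type strings to config classes.
CONFIG_MAPPING = {"t5": "T5Config", "codet5": "T5Config"}

def resolve_config_class(model_type: str) -> str:
    # Mimics configuration_auto.py: unknown model types raise KeyError.
    try:
        return CONFIG_MAPPING[model_type]
    except KeyError:
        raise KeyError(
            f"{model_type!r} is not a registered model type; "
            "upgrade transformers or load with trust_remote_code=True"
        )

print(resolve_config_class("t5"))            # → T5Config
# resolve_config_class("codet5p_embedding")  # raises KeyError, as in the issue
```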

yuewang-cuhk commented 10 months ago

Hello, are you able to run the example script here? I've double-checked it and can run it successfully.

I suspect the error above occurs because you load the model from a local folder instead of from the remote Hugging Face Hub. To load a local model checkpoint, you'll also have to download the modeling class and config files. Below is example code to load the local model:

```python
from modeling_codet5p_embedding import CodeT5pEmbeddingModel

model = CodeT5pEmbeddingModel.from_pretrained(checkpoint)
```
pdhung3012 commented 10 months ago

Thank you for your help. Yes, at first I ran the script on my local Ubuntu 22.04 server (with the code pointing at the remote Hugging Face Hub checkpoint "Salesforce/codet5p-110m-embedding"), but it returned the same error. I then downloaded the model to my local machine and loaded it from there, but the error remained. Let me try again.

pdhung3012 commented 10 months ago

I tried to load it remotely from the Hugging Face Hub. It shows an error like this:

```
OSError: Can't load 'Salesforce/codet5p-110m-embedding'. Make sure that:
```

pdhung3012 commented 10 months ago

I checked my transformers version. I was using the old version 4.11.3, while the current version is 4.32.1. I reinstalled transformers with the newer version and it worked. Thanks for your help.
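Since an outdated `transformers` was the root cause, a quick sanity check on the installed version can save debugging time. The sketch below compares version strings; the minimum version used in the example (`4.21.0`) is an assumption for illustration, not a value confirmed in this thread:

```python
def parse_version(v: str) -> tuple:
    # "4.11.3" -> (4, 11, 3); ignores pre-release suffixes for simplicity
    return tuple(int(part) for part in v.split(".")[:3])

def meets_minimum(installed: str, minimum: str) -> bool:
    # Tuple comparison gives correct ordering: (4, 11, 3) < (4, 21, 0)
    return parse_version(installed) >= parse_version(minimum)

print(meets_minimum("4.11.3", "4.21.0"))  # → False (the version that failed)
print(meets_minimum("4.32.1", "4.21.0"))  # → True  (the version that worked)
```

In practice you would compare `transformers.__version__` against your required minimum, and run `pip install --upgrade transformers` if the check fails.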