Closed pocca2048 closed 2 years ago
Hi @pocca2048, thanks for the suggestions! We have uploaded CodeT5 to Hugging Face so that you can load our model and tokenizer using:
from transformers import RobertaTokenizer, T5ForConditionalGeneration
tokenizer = RobertaTokenizer.from_pretrained('Salesforce/codet5-small')
model = T5ForConditionalGeneration.from_pretrained('Salesforce/codet5-small')
Hi, thanks for sharing your great work!
Following the link from the Hugging Face transformers documentation, I think it would be better to save the tokenizer with
tokenizer.save
rather than
tokenizer.save_model
. That is, at https://github.com/salesforce/CodeT5/blob/466b8607fd08bc4bd8847cc6590c801a9c21db23/tokenizer/train_tokenizer.py#L18, change the save call to
tokenizer.save("tokenizer.json")
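For context, here is a minimal, self-contained sketch of the difference between the two calls (the tiny inline corpus and vocab size are placeholders, not the repo's actual training setup):

```python
from pathlib import Path
from tokenizers import ByteLevelBPETokenizer

# Tiny stand-in corpus so the example runs end to end; the real
# training data in the repo is of course much larger.
Path("corpus.txt").write_text("def add(a, b):\n    return a + b\n")

tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["corpus.txt"],
    vocab_size=1000,
    special_tokens=["<pad>", "<s>", "</s>", "<unk>", "<mask>"],
)

# tokenizer.save() writes one tokenizer.json containing the full
# pipeline (pre-tokenizer, decoder, special tokens); by contrast,
# tokenizer.save_model() only writes vocab.json and merges.txt,
# dropping that configuration.
tokenizer.save("tokenizer.json")
```

The single-file tokenizer.json is the format that transformers' fast-tokenizer loaders consume directly.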
Then you can use
transformers.PreTrainedTokenizerFast
rather than
tokenizers.Tokenizer
at https://github.com/salesforce/CodeT5/blob/208acbd759fd8014374387b272647ef7ab4b85e3/tokenizer/apply_tokenizer.py#L3-L6, like this:
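A minimal sketch of that replacement (it builds a throwaway tokenizer.json first so the example is self-contained; in the repo that file would come from train_tokenizer.py):

```python
from tokenizers import ByteLevelBPETokenizer
from transformers import PreTrainedTokenizerFast

# Build a throwaway tokenizer.json so this snippet runs on its own;
# the training line here is illustrative, not the repo's setup.
bpe = ByteLevelBPETokenizer()
bpe.train_from_iterator(["def add(a, b): return a + b"], vocab_size=300)
bpe.save("tokenizer.json")

# PreTrainedTokenizerFast loads the single-file format directly and
# then behaves like any other transformers tokenizer.
tokenizer = PreTrainedTokenizerFast(tokenizer_file="tokenizer.json")
ids = tokenizer("def add(a, b): return a + b")["input_ids"]
print(ids)
```

The advantage is that downstream code gets the standard transformers tokenizer interface (padding, truncation, batch encoding) instead of the lower-level tokenizers.Tokenizer API.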