salesforce / CodeT5

Home of CodeT5: Open Code LLMs for Code Understanding and Generation
https://arxiv.org/abs/2305.07922
BSD 3-Clause "New" or "Revised" License

Inference for java code summarization #10

Closed lyriccoder closed 2 years ago

lyriccoder commented 2 years ago

Is it possible to run code summarization on raw Java code?

I can't find an example of inference for code summarization. Could you please provide one? For example, I would expect code like the following:

# WHICH_MODELTO_USE is a placeholder: which model class should be imported here?
from transformers import RobertaTokenizer, WHICH_MODELTO_USE

tokenizer = RobertaTokenizer.from_pretrained('Salesforce/codet5-base')
model = WHICH_MODELTO_USE.from_pretrained('Salesforce/codet5-base')

java_code = 'int i = 0; ++i; int b = runSomeFunction(i); extract(b);'
# model.predict is the hypothetical call I am looking for
code_summarization = model.predict(java_code)
print(code_summarization)

The expected result would be something like: 'Extracts and returns max value'

Is such a prediction possible? My problem is that I don't understand how the code is converted into the vector that is used to predict the summary, without running any training procedure first.

Could you please provide an example?

mosh98 commented 2 years ago

Hi, I modified a small package to work with CodeT5. You can try it out here: https://github.com/mosh98/simpleT5

If you want to look at the prediction example, you can also find the snippet here: https://github.com/mosh98/simpleT5/blob/40bd043ab9d83122db2c55385f469c00b23f2aff/simplet5/simplet5.py#L410

Hope it helps

lyriccoder commented 2 years ago

Thank you for your work. Is there a pre-trained model for Java code summarization? Unfortunately, my network doesn't allow downloading anything from within Python (from_pretrained fails because of the network settings).

Could you please share a fine-tuned model for Java code summarization (on Google Drive or somewhere else)? I tried to run the fine-tuning myself, but it reported that even 30 GB of video memory is not enough (70 GB is required).

mosh98 commented 2 years ago

I'll try to fine-tune it over the weekend and get back to ya ;)

yuewang-cuhk commented 2 years ago

Hi there, please refer to our newly released multilingual CodeT5-base model (codet5-base-multi-sum), fine-tuned for code summarization, which also achieves SOTA performance on Java.
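
For readers looking for the inference snippet requested at the top of this thread, here is a minimal sketch, assuming the checkpoint is published on the Hugging Face Hub as Salesforce/codet5-base-multi-sum and loads with RobertaTokenizer and T5ForConditionalGeneration. The helper names load_codet5 and summarize, and the max_length default of 20, are illustrative choices and not part of the CodeT5 repository. The model_path argument may also be a local directory containing the model files, which is one possible workaround when from_pretrained cannot reach the network.

```python
import os


def load_codet5(model_path="Salesforce/codet5-base-multi-sum"):
    """Load tokenizer and model from a Hub ID or a local directory (hypothetical helper)."""
    # Imported lazily so that summarize() below can be used with any
    # tokenizer/model pair, even where transformers is not installed.
    from transformers import RobertaTokenizer, T5ForConditionalGeneration

    # If model_path is an existing directory, forbid any network access.
    offline = os.path.isdir(model_path)
    tokenizer = RobertaTokenizer.from_pretrained(model_path, local_files_only=offline)
    model = T5ForConditionalGeneration.from_pretrained(model_path, local_files_only=offline)
    model.eval()  # inference only, no training
    return tokenizer, model


def summarize(code, tokenizer, model, max_length=20):
    """Generate a natural-language summary for a code string (hypothetical helper)."""
    # Code text -> token IDs; the model's encoder maps these IDs to vectors internally.
    input_ids = tokenizer(code, return_tensors="pt").input_ids
    # The decoder generates the summary as a sequence of token IDs.
    generated_ids = model.generate(input_ids, max_length=max_length)
    # Token IDs -> text, dropping special tokens such as padding and end-of-sequence.
    return tokenizer.decode(generated_ids[0], skip_special_tokens=True)
```

To try it, call tokenizer, model = load_codet5() and then print(summarize(java_code, tokenizer, model)) with the Java string from the original question. For the offline case, one option is to call save_pretrained on the tokenizer and the model on a machine that does have network access, copy the resulting directory over, and pass its path to load_codet5. No fine-tuning is needed for inference, so the memory requirement should be far below what was reported for training.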