salesforce / CodeTF

CodeTF: One-stop Transformer Library for State-of-the-art Code LLM
Apache License 2.0

`defect` and `refine` tasks for `codet5p-16b` #3

Open cyw3 opened 1 year ago

cyw3 commented 1 year ago

Hi, nice project!

Do you have checkpoints and evaluation results for the defect and refine tasks with codet5p-16b?

I tested the codet5 defect and refine task examples in CodeTF and found that the results are not very good.

bdqnghi commented 1 year ago

Hi, thanks for the question. May I know what you mean by the "effect"? Note that we haven't concluded the project and haven't officially released it yet, so there are still things to improve.

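cyw3 commented 1 year ago

from codetf.models import load_model_pipeline

# C snippet that prints and reads through an uninitialized pointer (the defect under test).
code_snippets = """
#include <stdio.h>
int main()
{
    int *pointer_var;
    printf("Address of the given pointer variable: %d", pointer_var);
    printf("Value of pointer variable is : %d", * pointer_var);
    return 0;
}
"""
# Load the CodeT5 defect-detection checkpoint in evaluation mode.
defect_model = load_model_pipeline(model_name="codet5", task="defect",
            model_type="base", is_eval=True,
            load_in_8bit=True, weight_sharding=False)
# Classify the snippet; predict() takes a list of code strings.
defects = defect_model.predict([code_snippets])
print(defects)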

code_snippets is a C/C++ demo that prints and dereferences an uninitialized pointer. However, Salesforce/codet5-base-codexglue-defect does not treat it as a defect and outputs ['false'].


bdqnghi commented 1 year ago

I see, thanks for this. The CodeT5 defect model is fine-tuned on this data from Microsoft CodeXGLUE: https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/Defect-detection. This dataset is mostly used for academic research, so applying a model trained on it to real-world code can produce unsatisfactory results in many cases.

You can find more on the results in the CodeT5 paper: https://aclanthology.org/2021.emnlp-main.685.pdf.
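
To put a number on the "effect" instead of judging from a single snippet, one option is to score the model on a small set of labeled examples. The following is only a minimal sketch: `evaluate`, `to_bool`, `always_false`, and the four toy examples are hypothetical names and data introduced for illustration, and it assumes the predictor returns one string label such as 'true' or 'false' per input, as in the ['false'] output reported above.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical labeled example: (source code, True if it contains a defect).
Example = Tuple[str, bool]


def to_bool(label: str) -> bool:
    """Map a string label such as 'true' or 'false' to a boolean."""
    return str(label).strip().lower() in {"true", "1"}


def evaluate(predict: Callable[[List[str]], List[str]],
             examples: List[Example]) -> Dict[str, float]:
    """Score a defect classifier on labeled examples."""
    codes = [code for code, _ in examples]
    gold = [label for _, label in examples]
    pred = [to_bool(p) for p in predict(codes)]

    tp = sum(1 for p, g in zip(pred, gold) if p and g)
    fp = sum(1 for p, g in zip(pred, gold) if p and not g)
    fn = sum(1 for p, g in zip(pred, gold) if not p and g)
    correct = sum(1 for p, g in zip(pred, gold) if p == g)

    # Ratios with an empty denominator are reported as 0.0.
    return {
        "accuracy": correct / len(gold) if gold else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def always_false(codes: List[str]) -> List[str]:
    """Stand-in predictor that never reports a defect (assumption, not CodeTF)."""
    return ["false"] * len(codes)


# Toy data for illustration only; not taken from CodeXGLUE.
examples: List[Example] = [
    ("int *p; return *p;", True),
    ("char b[4]; b[8] = 0;", True),
    ("int x = 1; return x;", False),
    ("int a[2] = {0, 0}; return a[1];", False),
]

metrics = evaluate(always_false, examples)
print(metrics)
```

If the assumption about the output format holds, `always_false` could be swapped for `defect_model.predict` from the snippet earlier in this thread. Reporting recall alongside accuracy matters here because a model that never flags a defect can still reach a reasonable accuracy on a balanced set.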

cyw3 commented 1 year ago

Thanks.

Can you recommend any other datasets for the defect and refine tasks?