salesforce / CodeT5

Home of CodeT5: Open Code LLMs for Code Understanding and Generation
https://arxiv.org/abs/2305.07922
BSD 3-Clause "New" or "Revised" License

About AI coding assistant demo #11

Closed by skye95git 2 years ago

skye95git commented 2 years ago

Hi, the newly added AI coding assistant demo is cool! I have a few questions about it:

  1. Did you turn the CodeT5 model into a VS Code plugin? How did you do that?

  2. In the code generation demo, editing the comment generates the corresponding code snippet. Aren't the inputs to the code generation model both the natural language description and the class environment?

    [Screenshot]

The data in the Concode dataset has the following format:

{
    "code": "int function ( double [ ] arg0 , double [ ] arg1 ) { int loc0 = arg0 . length - arg1 . length ; outer : for ( int loc1 = 0 ; loc1 <= loc0 ; loc1 ++ ) { for ( int loc2 = 0 ; loc2 < arg1 . length ; loc2 ++ ) { if ( ne ( arg0 [ loc1 + loc2 ] , arg1 [ loc2 ] ) ) { continue outer ; } } return ( loc1 ) ; } return ( - 1 ) ; }",
    "nl": "searches for the first subsequence of a that matches sub elementwise . elements of sub are considered to match elements of a if they pass the #eq test . concode_field_sep double max_ratio concode_elem_sep double min_ratio concode_elem_sep boolean off concode_field_sep boolean isElemMatch concode_elem_sep int compare concode_elem_sep boolean isSubset concode_elem_sep boolean ne concode_elem_sep boolean lt concode_elem_sep boolean gte concode_elem_sep void set_rel_diff concode_elem_sep boolean eq concode_elem_sep boolean lte concode_elem_sep boolean gt"
}

Is it possible to generate an accurate code snippet by typing only comments, without the class environment? Doesn't the loss of context information affect the quality of the generated code?
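For readers unfamiliar with the format, the `nl` field in the Concode example above packs the description and the class environment into one string, delimited by `concode_field_sep` and `concode_elem_sep`. The sketch below splits it back apart. It assumes the first group after the description lists member variables and the second lists methods, which is how the example reads; the names `ConcodeInput` and `parse_concode_nl` are hypothetical helpers, not part of any CodeT5 or Concode tooling.

```python
from dataclasses import dataclass
from typing import List, Tuple

FIELD_SEP = "concode_field_sep"  # separates description / variables / methods
ELEM_SEP = "concode_elem_sep"    # separates individual members within a group


@dataclass
class ConcodeInput:
    description: str
    fields: List[Tuple[str, str]]   # (type, name) pairs, assumed member variables
    methods: List[Tuple[str, str]]  # (return type, name) pairs, assumed methods


def _parse_members(segment: str) -> List[Tuple[str, str]]:
    """Split one group into (type, name) pairs; the last token is taken as the name."""
    members = []
    for elem in segment.split(ELEM_SEP):
        parts = elem.split()
        if len(parts) >= 2:
            members.append((" ".join(parts[:-1]), parts[-1]))
    return members


def parse_concode_nl(nl: str) -> ConcodeInput:
    """Separate the natural language description from the class environment."""
    segments = nl.split(FIELD_SEP)
    description = segments[0].strip()
    fields = _parse_members(segments[1]) if len(segments) > 1 else []
    methods = _parse_members(segments[2]) if len(segments) > 2 else []
    return ConcodeInput(description, fields, methods)
```

Calling `parse_concode_nl(example["nl"]).description` would give only the description text, which is the part a comment-only prompt would supply.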

yuewang-cuhk commented 2 years ago

Hey there,

Yes, I made a CodeT5 demo as a VS Code plugin. We first run the model on GCP and expose it as a web service, which is then integrated into the VS Code extension. Please refer to this online tutorial for more details.

For the code generation demo, the source input is only the natural language description. We fine-tuned another CodeT5 model for this on our customized Apex dataset, which is not the Concode data and has no env variables.
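The model-as-web-service setup described above could look roughly like the following. This is a minimal sketch using only the Python standard library, not the actual GCP deployment, whose details are not given in this thread. The JSON field names `prompt` and `completion`, the function names, and `stub_generate` (a placeholder where a real service would call the fine-tuned model) are all assumptions for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Tuple


def stub_generate(prompt: str) -> str:
    """Hypothetical stand-in for the model call; a real service would run CodeT5 here."""
    return f"// generated for: {prompt}"


def handle_generate(raw_body: bytes, generate: Callable[[str], str]) -> Tuple[int, dict]:
    """Turn a raw JSON request body into (HTTP status, JSON-serializable response)."""
    try:
        prompt = json.loads(raw_body)["prompt"]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected a JSON object with a 'prompt' field"}
    if not isinstance(prompt, str):
        return 400, {"error": "'prompt' must be a string"}
    return 200, {"completion": generate(prompt)}


def make_handler(generate: Callable[[str], str]):
    """Build a request handler class bound to the given generation function."""

    class GenerateHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            status, payload = handle_generate(self.rfile.read(length), generate)
            body = json.dumps(payload).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return GenerateHandler


def serve(generate: Callable[[str], str], host: str = "127.0.0.1", port: int = 8000) -> None:
    """Run the service until interrupted."""
    HTTPServer((host, port), make_handler(generate)).serve_forever()
```

Running `serve(stub_generate)` starts the service locally. An editor extension would then POST the text of the edited comment as `{"prompt": ...}` and insert the returned `completion` into the document.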

skye95git commented 2 years ago

Thanks for your reply! Is your customized Apex dataset public? If not, could you share an example of the data format?

skye95git commented 2 years ago

@yuewang-cuhk Hi, the output of the model appears to be token-level, with every token separated by a space:

void function ( ) { TSTNode loc0 = root ; while ( loc0 != null ) { if ( loc0 . data . equals ( data ) ) return ; loc0 = loc0 . left ; } }

How do you get the fully readable output shown in the AI coding assistant demo? Is that also achieved with your customized Apex dataset?

[Screenshot]
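How the demo produces readable output is not stated in this thread. One possible workaround for space-separated output like the sample above is to post-process it with a few spacing rules. The sketch below is a heuristic for Java-like code only, not the authors' method: `detokenize` and the keyword list are hypothetical, and the rules will also rewrite text inside string literals.

```python
import re

# Keywords that conventionally keep a space before "(" (illustrative, not exhaustive).
KEYWORDS = {"if", "for", "while", "switch", "catch", "return", "synchronized"}


def _join_call(match: "re.Match") -> str:
    """Join `name (` into `name(` unless `name` is a control-flow keyword."""
    word = match.group(1)
    return f"{word} (" if word in KEYWORDS else f"{word}("


def detokenize(code: str) -> str:
    """Collapse token-level spacing in Java-like code into conventional spacing."""
    text = re.sub(r"\s*\.\s*", ".", code)          # member access: a . b -> a.b
    text = re.sub(r"\s+([;,)\]])", r"\1", text)    # no space before ; , ) ]
    text = re.sub(r"([(\[])\s+", r"\1", text)      # no space after ( [
    text = re.sub(r"(\w)\s+\[", r"\1[", text)      # indexing and array types: a [ -> a[
    text = re.sub(r"\b(\w+)\s+\(", _join_call, text)  # calls: f ( -> f(
    return text
```

Applied to the sample output above, this yields `void function() { TSTNode loc0 = root; while (loc0 != null) { if (loc0.data.equals(data)) return; loc0 = loc0.left; } }`. Line breaks and indentation would still need a separate formatter.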