Closed skye95git closed 2 years ago
Hi @skye95git, I could not reproduce your issue. You can check whether the model is a `T5ForConditionalGeneration` object, or print the outputs directly. Also, make sure you have installed a suitable version of `transformers` (>= 4.6.1).
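A minimal sketch of the version check suggested above, using only the standard library. The helper names are hypothetical. Versions are compared as integer tuples because a plain string comparison would rank "4.10.0" below "4.6.1".

```python
from importlib import metadata

MIN_VERSION = "4.6.1"  # minimum version suggested in the reply above


def version_tuple(version):
    """Turn '4.10.0' into (4, 10, 0); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_VERSION):
    """Compare versions numerically rather than as strings."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_transformers_version():
    """Return the installed transformers version, or None if it is missing."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed")
    else:
        print(found, "ok" if meets_minimum(found) else "older than " + MIN_VERSION)
```

To check the model type as well, `isinstance(model, T5ForConditionalGeneration)` can be run in the same session once the model is loaded.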
Thanks for your reply! After I updated `transformers`, it worked.
I want to pre-train a model using my own data. Do you plan to share the pre-training code?
Hi, I want to try out CodeT5's code generation. How do I use the model to generate code after fine-tuning?
We currently do not have a plan to release the pre-training code, which should not be difficult to implement based on the paper. We are also happy to take questions regarding its implementation.
You can refer to the following function for this: https://github.com/salesforce/CodeT5/blob/100c7e55c1e120600a2e114e828a157d265a2411/run_gen.py#L84
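For a single input, generation with a fine-tuned checkpoint might look like the sketch below. It is not the code from `run_gen.py`, and several things are assumptions: that the checkpoint is a plain PyTorch state dict, that the base model is the `Salesforce/codet5-base` hub entry, that the length limits 320 and 150 match the `src320_trg150` run name, and that a beam size of 10 is wanted. The function names and the checkpoint path are hypothetical.

```python
def load_finetuned(checkpoint_path, base_name="Salesforce/codet5-base"):
    """Load the base model, then overwrite its weights with a fine-tuned state dict.

    Assumes checkpoint_path points to a file saved with torch.save(model.state_dict()).
    """
    import torch
    from transformers import RobertaTokenizer, T5ForConditionalGeneration

    tokenizer = RobertaTokenizer.from_pretrained(base_name)
    model = T5ForConditionalGeneration.from_pretrained(base_name)
    model.load_state_dict(torch.load(checkpoint_path, map_location="cpu"))
    model.eval()
    return model, tokenizer


def generate_code(model, tokenizer, nl, max_source_length=320,
                  max_target_length=150, beam_size=10):
    """Encode one natural-language input and decode the best beam."""
    encoded = tokenizer(nl, return_tensors="pt", truncation=True,
                        max_length=max_source_length)
    output_ids = model.generate(
        input_ids=encoded["input_ids"],
        attention_mask=encoded["attention_mask"],
        num_beams=beam_size,
        max_length=max_target_length,
        early_stopping=True,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Usage (the path is a placeholder, not a real file in the repository):
# model, tokenizer = load_finetuned("path/to/pytorch_model.bin")
# print(generate_code(model, tokenizer, "your natural-language description"))
```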
Thanks for your reply! When I fine-tune the model, I get an error:
There are many similar absolute paths in the repository, for example in `models.py`, `calc_code_bleu.py`, `dataflow_match.py`, and `syntax_match.py`.
It would be nice if the README mentioned that these paths need to be changed.
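One common way to avoid hard-coded absolute paths is to resolve resources relative to the module that uses them. The sketch below is a general pattern, not a patch for the files listed above; `resource_path` and the file names in the usage comment are hypothetical.

```python
from pathlib import Path


def resource_path(anchor_file, *relative_parts):
    """Build a path relative to the directory that contains anchor_file."""
    return Path(anchor_file).resolve().parent.joinpath(*relative_parts)


# Inside a module, pass __file__ so the path follows the checkout location:
# LIBRARY = resource_path(__file__, "some_dir", "some_resource.bin")
```

With this pattern the repository can be cloned anywhere without editing the paths by hand.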
Hi, I have finished fine-tuning. The result in `sh/results` is:
Are the results in `sh/results` evaluated on Concode's test set or dev set? If they are evaluated on the dev set, how do I evaluate on the test set?
I read the source code in `run_gen.py` and found that the result in `sh/results` is evaluated on Concode's test set.
I want to see the predictions. Are the generated results for the test set stored in `sh/saved_models/concode/codet5_base_all_lr10_bs32_src320_trg150_pat3_e30/prediction`?
What do `test_*.gold`, `test_*.output`, and `test_*.src` in that folder stand for? Is the input data stored in `test_*.src` and the output data in `test_*.output`?
Hi, the paper says "we additionally collect two datasets of C/CSharp from BigQuery". How do you parse the C and C# code downloaded from BigQuery to extract functions? I also want to parse source code I've acquired and retrain the model. Could you share the parsed code?
Hi @skye95git, yes, your understanding is correct. `test_*.src` is the source input, `test_*.output` is the model output, and `test_*.gold` is the ground-truth target output.
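To inspect the three files side by side, something like the following sketch could be used. It assumes each file holds one example per line and that the lines are aligned across files, which the thread does not confirm. The function names and the `prefix` argument are hypothetical; `prefix` stands for whatever the `test_*` part expands to in your run.

```python
from pathlib import Path


def load_predictions(prediction_dir, prefix):
    """Return a list of (source, gold, output) triples, one per line."""
    base = Path(prediction_dir)
    columns = []
    for suffix in ("src", "gold", "output"):
        text = (base / f"{prefix}.{suffix}").read_text(encoding="utf-8")
        columns.append(text.splitlines())
    if len({len(column) for column in columns}) != 1:
        raise ValueError("the three files have different numbers of lines")
    return list(zip(*columns))


def exact_match(rows):
    """Fraction of rows where the model output equals the gold target."""
    if not rows:
        return 0.0
    hits = sum(1 for _, gold, output in rows if gold.strip() == output.strip())
    return hits / len(rows)
```

Exact match is only a quick sanity check here; it is not a substitute for the scores reported in `sh/results`.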
We parse it using tree-sitter, similar to the CodeSearchNet dataset. We will release this additional data (C/C#) soon.
That's cool. Besides the additional data (C/C#) you will release, I also want to parse the C and C# source code that we obtained ourselves. Could you share the parsing code for C and C#?
There is a fork of the awesome function_parser library from GitHub's CodeSearchNet Challenge repo. Currently, it supports 6 languages: Python, Java, Go, PHP, Ruby, and JavaScript, but not C or C#. I tried to use tree-sitter in a similar way to the CodeSearchNet dataset to parse C and C#. Unfortunately, the results aren't satisfactory.
I would be grateful if you could share the C and C# parsing code. I plan to fork function_parser and update it, which could help more people.
Hi, what is the difference between `concode_field_sep` and `concode_elem_sep` in the NL field of the Concode dataset? Which one marks variables, and which one marks functions?
The description of CONCODE in CodeXGLUE is: "nl combines natural language description and class environment. Elements in class environment are separated by special tokens like con_elem_sep and con_func_sep."
If `concode_elem_sep` corresponds to `con_elem_sep`, it marks a variable. But the content of `dev.json` seems different.
Line 269 of `dev.json`:
{
"code": "int function ( double [ ] arg0 , double [ ] arg1 ) { int loc0 = arg0 . length - arg1 . length ; outer : for ( int loc1 = 0 ; loc1 <= loc0 ; loc1 ++ ) { for ( int loc2 = 0 ; loc2 < arg1 . length ; loc2 ++ ) { if ( ne ( arg0 [ loc1 + loc2 ] , arg1 [ loc2 ] ) ) { continue outer ; } } return ( loc1 ) ; } return ( - 1 ) ; }",
"nl": "searches for the first subsequence of a that matches sub elementwise . elements of sub are considered to match elements of a if they pass the #eq test . concode_field_sep double max_ratio concode_elem_sep double min_ratio concode_elem_sep boolean off concode_field_sep boolean isElemMatch concode_elem_sep int compare concode_elem_sep boolean isSubset concode_elem_sep boolean ne concode_elem_sep boolean lt concode_elem_sep boolean gte concode_elem_sep void set_rel_diff concode_elem_sep boolean eq concode_elem_sep boolean lte concode_elem_sep boolean gt"
}
The env in `nl` is:
concode_field_sep double max_ratio
concode_elem_sep double min_ratio
concode_elem_sep boolean off
concode_field_sep boolean isElemMatch
concode_elem_sep int compare
concode_elem_sep boolean isSubset
concode_elem_sep boolean ne
concode_elem_sep boolean lt
concode_elem_sep boolean gte
concode_elem_sep void set_rel_diff
concode_elem_sep boolean eq
concode_elem_sep boolean lte
concode_elem_sep boolean gt
The `code` is:
int function(double[] arg0, double[] arg1) {
    int loc0 = arg0.length - arg1.length;
    outer: for (int loc1 = 0; loc1 <= loc0; loc1++) {
        for (int loc2 = 0; loc2 < arg1.length; loc2++) {
            if (ne(arg0[loc1 + loc2], arg1[loc2])) {
                continue outer;
            }
        }
        return (loc1);
    }
    return (-1);
}
`ne()` in the `code` field is a function, not a variable, but it is preceded by `concode_elem_sep` in the env. So I'm a little confused.
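Splitting the sample mechanically may help here. In the record above, `concode_field_sep` appears to divide the `nl` string into groups, while `concode_elem_sep` separates entries inside a group. Reading the first group as member variables and the second as methods is only an inference from this one example (names such as `ne`, `eq`, and `lt` sit in the second group); it is not confirmed in this thread. The function name is hypothetical.

```python
FIELD_SEP = "concode_field_sep"
ELEM_SEP = "concode_elem_sep"


def split_nl(nl):
    """Split an nl string into its description and its groups of elements."""
    description, *raw_groups = [part.strip() for part in nl.split(FIELD_SEP)]
    groups = [
        [element.strip() for element in group.split(ELEM_SEP) if element.strip()]
        for group in raw_groups
    ]
    return description, groups


if __name__ == "__main__":
    sample = (
        "searches for the first subsequence . "
        "concode_field_sep double max_ratio concode_elem_sep boolean off "
        "concode_field_sep boolean ne concode_elem_sep boolean eq"
    )
    description, groups = split_nl(sample)
    print(description)
    for index, group in enumerate(groups):
        print(index, group)
```

Under that reading, `concode_elem_sep` does not mark a variable by itself; it is the group that an entry falls into that would tell variables and methods apart.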
Hi, I am trying to implement the pre-training code. I have a couple of questions about the pre-training data:
The paper says you use CodeSearchNet as pre-training data. Do I need to preprocess CodeSearchNet before pre-training? If so, how should it be preprocessed?
The statistics in Table 1 of the CodeT5 paper differ somewhat from those in CodeSearchNet. In CodeT5 Table 1:
In CodeSearchNet Table 1:
CodeT5 uses less pre-training data than the original CodeSearchNet data. Did you clean the data before pre-training?
The dataset used to fine-tune the code generation task is Concode, which contains only Java. So can CodeT5 only generate Java code, or can it generate code in all eight languages used for pre-training? If so, does that mean CodeT5 can generate code directly without fine-tuning?
Hi, I want to run CodeT5-base on the code generation task. I ran the command:
python run_exp.py --model_tag codet5_base --task concode --sub_task none
There is an error:
'tuple' object has no attribute 'loss'
I tried changing
outputs = model(input_ids=source_ids, attention_mask=source_mask, labels=target_ids, decoder_attention_mask=target_mask)
to
outputs, _ = model(input_ids=source_ids, attention_mask=source_mask, labels=target_ids, decoder_attention_mask=target_mask)
and got another error:
too many values to unpack (expected 2)
What should I do?
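For context, both errors are consistent with the model returning a plain tuple with more than two elements, which older `transformers` releases do by default. When `labels` are passed, the loss is the first element of that tuple. Upgrading `transformers` is the fix that worked in this thread. The helper below is only a sketch of a version-tolerant alternative, and `extract_loss` is a hypothetical name.

```python
def extract_loss(outputs):
    """Return the loss from either an output object or a legacy tuple.

    Assumes labels were passed to the model, so a tuple output starts with the loss.
    """
    if hasattr(outputs, "loss"):
        return outputs.loss
    return outputs[0]


# Usage inside a training loop:
# outputs = model(input_ids=source_ids, attention_mask=source_mask,
#                 labels=target_ids, decoder_attention_mask=target_mask)
# loss = extract_loss(outputs)
```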