Closed Debdeep1998 closed 1 year ago
Hi there, we cannot guarantee that the generated code is compilable or well formatted, as we use the code files directly for pretraining, without normalization or refactoring. You might consider adding a post-processing step to reformat the code generated by our models.
Hi, thanks. Can you point us to the post-processing steps we would need to adopt?
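For what it's worth, here is a minimal sketch of what such a post-processing step might look like for Python output. It is not something the CodeT5 authors describe; the function name `normalize_generated_code` and the `tab_size` parameter are hypothetical, and the cleanup rules (unify line endings, expand tabs, drop shared indentation, trim trailing spaces, cap blank-line runs) are assumptions about what "inconsistent formatting" usually means. It cannot recover indentation that the model never produced.

```python
import textwrap


def normalize_generated_code(raw: str, tab_size: int = 4) -> str:
    """Apply conservative whitespace cleanup to a generated snippet.

    Hypothetical helper for illustration; not part of CodeT5.
    """
    # Unify Windows and old-Mac line endings so line splitting is predictable.
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Replace tabs with spaces so indentation is measured consistently.
    text = text.expandtabs(tab_size)
    # Remove indentation shared by every non-blank line.
    text = textwrap.dedent(text)

    cleaned = []
    blank_run = 0
    for line in text.split("\n"):
        line = line.rstrip()  # drop trailing whitespace
        if line:
            blank_run = 0
            cleaned.append(line)
        else:
            blank_run += 1
            # Keep at most two consecutive blank lines.
            if blank_run <= 2:
                cleaned.append(line)
    # End the snippet with exactly one newline.
    return "\n".join(cleaned).strip("\n") + "\n"


if __name__ == "__main__":
    raw = "\tdef f(x):\r\n\t\treturn x + 1   \r\n\r\n\r\n\r\n\r\n\tprint(f(1))\r\n"
    print(normalize_generated_code(raw))
```

Once a snippet parses, a formatter such as Black could be run on it for consistent style, but formatters generally reject code that is not already syntactically valid, so a cleanup pass like this would have to come first.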
I've fine-tuned CodeT5 large on a small Python dataset (~1700 data points). The results are more or less correct, but the code is not always compilable (due to inconsistent spacing and newline characters). Any idea how to fix this? Also, how does CodeBLEU work if the code generated by the model isn't compilable? The model might generate non-compilable code during the initial phases of training, right?
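Since CodeBLEU scores a generation by comparing it against a reference rather than by running it, one option is to track syntactic validity as a separate number during training. The sketch below assumes the generations are Python and uses only the standard library `ast` module; the names `parses_as_python` and `syntax_valid_rate` are hypothetical, and this is not how CodeBLEU itself is computed.

```python
import ast
from typing import Iterable


def parses_as_python(code: str) -> bool:
    """Return True if the snippet is syntactically valid Python."""
    try:
        ast.parse(code)
    except (SyntaxError, ValueError):
        # IndentationError is a subclass of SyntaxError, so bad
        # indentation is caught here as well.
        return False
    return True


def syntax_valid_rate(snippets: Iterable[str]) -> float:
    """Fraction of snippets that parse; 0.0 for an empty input."""
    results = [parses_as_python(s) for s in snippets]
    return sum(results) / len(results) if results else 0.0


if __name__ == "__main__":
    generations = [
        "x = 1\n",               # valid
        "def f(:\n",             # invalid: malformed signature
        "if True:\nprint(1)\n",  # invalid: missing indentation
    ]
    print(f"syntax-valid rate: {syntax_valid_rate(generations):.2f}")
```

Logging this rate per evaluation step alongside CodeBLEU would show whether early checkpoints are producing unparsable output and whether that improves as training goes on.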