Closed BakingBrains closed 2 years ago
Hi, the data reading functions can certainly be customized to your needs. If you want to fine-tune on the Concode code generation task, you can employ the existing read_concode_examples. Or, if you want to reverse the CodeSearchNet summarization data into a text-to-code generation task, you can modify read_summarize_examples.
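As a rough illustration of the second option, here is a minimal sketch (not the repo's actual read_summarize_examples) of a reader that swaps the roles of the docstring and the code, assuming CodeSearchNet-style jsonl files with docstring_tokens and code_tokens fields; the Example dataclass and function name are hypothetical:

```python
import json
from dataclasses import dataclass

@dataclass
class Example:
    idx: int
    source: str  # natural-language docstring (the model input)
    target: str  # code to generate (the model output)

def read_text_to_code_examples(filename, data_num=-1):
    """Read CodeSearchNet-style jsonl, but swap docstring and code so the
    task becomes text-to-code generation instead of summarization."""
    examples = []
    with open(filename, encoding="utf-8") as f:
        for idx, line in enumerate(f):
            js = json.loads(line.strip())
            # source is now the docstring, target is the code
            source = " ".join(js["docstring_tokens"])
            target = " ".join(js["code_tokens"]).replace("\n", " ")
            examples.append(Example(idx=idx, source=source, target=target))
            # optionally stop after data_num examples (-1 means read all)
            if 0 <= data_num <= idx + 1:
                break
    return examples
```

The only real change relative to a summarization reader is which field feeds source and which feeds target; the rest of the fine-tuning pipeline can stay as-is.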
For the maximum source/target lengths, these are usually determined by the tokenized lengths of your (source, target) pairs and, in some cases, by GPU memory limits. You can tune these hyper-parameters as well.
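One practical way to pick these values is to look at a length percentile over your tokenized data and set the maximum just above it. The helper below is a hypothetical sketch using a whitespace tokenizer as a stand-in; with a real model you would pass your subword tokenizer instead:

```python
def length_percentile(texts, tokenize, pct=95):
    """Return the pct-th percentile (nearest-rank) of tokenized lengths."""
    lengths = sorted(len(tokenize(t)) for t in texts)
    k = max(0, int(round(pct / 100 * len(lengths))) - 1)
    return lengths[k]

# stand-in whitespace tokenizer; in practice use your model's tokenizer,
# e.g. tokenizer.tokenize(text) from a HuggingFace transformers tokenizer
tokenize = str.split

sources = ["sum two numbers", "return the maximum of a list of integers"]
p95 = length_percentile(sources, tokenize, pct=95)
```

If, say, 95% of your targets tokenize to under 128 tokens, args.max_target_length = 128 is a reasonable default; raise it only if truncation is hurting your longer examples and GPU memory allows.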
For the code generation task, should I use the data reading method used for Concode, or the data reading method used for code summarization (here replacing the source with the docstring_tokens and the target with code_tokens)? Any suggestions?
Also, do I need to change args.max_source_length = 256 and args.max_target_length = 128 for the code generation task?