shailja-thakur / CodeGen-Fine-Tuning

Apache License 2.0

Questions about dataset #6

Open pierowu opened 8 months ago

pierowu commented 8 months ago

According to the paper, there are two datasets for fine-tuning, one from GitHub and one from books. However, it seems that only the GitHub corpus has been released under your Hugging Face account. Could you share the other one?

I have another question: I've noticed that your Hugging Face account also has a dataset called CodeGen_RE_data. Could you explain where this dataset comes from?

Thanks!

martinwz commented 1 month ago

Could you please share the URL where the GitHub corpus can be found? I also need it to build my model. Thank you.