salesforce / CodeGen

CodeGen is a family of open-source models for program synthesis. Trained on TPU-v4. Competitive with OpenAI Codex.
Apache License 2.0
4.95k stars, 381 forks

BigQuery dataset #35

Open ValeKnappich opened 2 years ago

ValeKnappich commented 2 years ago

Hi, first of all, great work!

Is there any chance you could provide more details on the BigQuery dataset / subset? Perhaps a list of the repositories used? It would be great to have in order to avoid data leakage in experiments.

Cheers

islammesabah commented 1 year ago

Hi, could you please provide this information, or at least the time frame over which the dataset was collected?