Given a dataset of programs P, CodeScholar will use standard program analysis techniques and graph-based induction to reduce P to a set S of idiomatic, reusable code snippets called CodeScholar APIs. Each snippet in S represents a code concept such as “search”, “sort”, or “join”. Finally, CodeScholar will refactor these snippets into Python functions that take an arbitrary number of parameters and return a Python object.
Instead of breaking programs down into smaller, frequent fragments, CodeScholar will tackle the problem by "growing" idiomatic code fragments: it will start from single-node programs (one statement each) and repeatedly apply greedy graph composition and pruning to mine idiomatic code patterns.
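The grow-and-prune idea can be sketched under strong simplifying assumptions: each program is reduced to a set of normalized statement strings instead of a graph, "composition" means adding one co-occurring statement to a pattern, and a pattern's support is the number of programs containing all of its statements. This is greedy, support-pruned itemset growth used as a stand-in for graph composition; `grow_idioms`, `min_support`, and `max_size` are hypothetical names, and this is not CodeScholar's implementation.

```python
from collections import Counter
from typing import FrozenSet, List, Set

# Simplification: a program is a set of normalized statements, not a graph.
Program = FrozenSet[str]


def grow_idioms(programs: List[Program], min_support: int, max_size: int) -> Set[FrozenSet[str]]:
    """Greedily grow patterns from single statements, pruning infrequent ones."""
    # Seeds: single-statement patterns that occur in enough programs.
    counts = Counter(stmt for p in programs for stmt in p)
    seeds = [s for s, c in counts.items() if c >= min_support]
    idioms: Set[FrozenSet[str]] = set()
    for seed in seeds:
        pattern = frozenset([seed])
        while len(pattern) < max_size:
            # Programs that contain every statement of the current pattern.
            covering = [p for p in programs if pattern <= p]
            # Support of each one-statement extension among those programs.
            ext_counts = Counter(s for p in covering for s in p - pattern)
            if not ext_counts:
                break
            # Greedy choice: highest support, ties broken by statement text.
            best, best_support = max(ext_counts.items(), key=lambda kv: (kv[1], kv[0]))
            if best_support < min_support:
                break  # prune: no extension remains frequent
            pattern = pattern | {best}
        idioms.add(pattern)
    return idioms


if __name__ == "__main__":
    programs = [
        frozenset({"f = open(path)", "data = f.read()", "f.close()"}),
        frozenset({"f = open(path)", "data = f.read()", "f.close()", "print(data)"}),
        frozenset({"f = open(path)", "data = f.read()", "print(data)"}),
        frozenset({"x = sorted(x)", "print(x)"}),
    ]
    for idiom in grow_idioms(programs, min_support=2, max_size=3):
        print(sorted(idiom))
```

Because identical patterns reached from different seeds collapse in the result set, the example yields two idioms: the open/read/close pattern and the open/read/print pattern. The greedy step keeps only one extension per seed, trading completeness for speed.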
Here is brief pseudocode for the algorithm: