manishshettym / codescholar

codescholar: growing program graphs idiomatically for API usage examples

CodeScholar Concept Farming: Mining Algorithm #4

Closed manishshettym closed 2 years ago

manishshettym commented 2 years ago

Given a dataset of programs P, CodeScholar will use standard program analysis techniques and graph-based induction to reduce P to a set S of idiomatic, reusable code snippets called CodeScholar APIs. Each snippet in S represents a code concept such as “search”, “sort”, or “join”. Finally, CodeScholar will refactor each snippet into a Python function that takes an arbitrary number of parameters and returns a Python object.

Instead of breaking programs down into smaller, frequent fragments, CodeScholar will tackle the problem by "growing" idiomatic code fragments. It should start from single-node programs (one statement each) and greedily compose and prune graphs to farm idiomatic code patterns.
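
To make the grow-and-prune idea concrete, here is a minimal sketch. It is not the CodeScholar implementation. It assumes a heavy simplification: each program is a flat sequence of top-level statements instead of a program graph, and a pattern only grows by appending the statement that follows it. The names `statements`, `grow_idioms`, `min_support`, `beam_width`, and `max_size` are hypothetical and chosen for illustration only.

```python
import ast
from collections import Counter


def statements(source: str) -> list[str]:
    """Split a program into normalized top-level statements (the single-node seeds)."""
    return [ast.unparse(stmt) for stmt in ast.parse(source).body]


def grow_idioms(programs, min_support=2, beam_width=5, max_size=4):
    """Greedily grow statement patterns; return those that cannot grow further.

    Assumption: a "pattern" is a contiguous run of statements, and its support
    is the number of programs that contain it.
    """
    # Seed with single statements, counted once per program.
    seeds = Counter(stmt for prog in programs for stmt in set(prog))
    frontier = [(s,) for s, c in seeds.most_common(beam_width) if c >= min_support]
    idioms = []
    while frontier:
        next_counts = Counter()
        for pattern in frontier:
            n = len(pattern)
            extensions = Counter()
            if n < max_size:
                for prog in programs:
                    # Compose: append the statement that follows each match.
                    found = {
                        pattern + (prog[i + n],)
                        for i in range(len(prog) - n)
                        if tuple(prog[i:i + n]) == pattern
                    }
                    extensions.update(found)
            # Prune: drop extensions that are too rare.
            kept = {p: c for p, c in extensions.items() if c >= min_support}
            if kept:
                next_counts.update(kept)
            else:
                idioms.append(pattern)  # no frequent extension, so stop growing
        # Greedy step: keep only the most frequent candidates.
        frontier = [p for p, _ in next_counts.most_common(beam_width)]
    return idioms


sources = [
    "f = open(path)\ndata = f.read()\nf.close()\nprint(data)",
    "f = open(path)\ndata = f.read()\nf.close()\nx = 1",
    "y = 2\nf = open(path)\ndata = f.read()\nf.close()",
]
idioms = grow_idioms([statements(src) for src in sources])
print(max(idioms, key=len))
```

Because this sketch only grows patterns to the right, suffixes of a longer idiom can also appear in the result. A graph-based version would compose along data-flow and control-flow edges and would need a real subgraph-matching step to count support.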

Here is brief pseudocode for the algorithm: [image not recovered]