CodeScholar will represent each code snippet as a graph that captures the structure and semantics of the code. Ideally, this graph [1] should encode data flow, control flow, and/or other semantic information.
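As a rough illustration of the simplest candidate representation, the sketch below turns a Python snippet into an AST graph using only the standard-library ast module. The function names ast_to_graph and to_networkx and the "label" node attribute are hypothetical choices made for this example, not part of CodeScholar; data-flow or control-flow edges would need additional analysis that is not shown here.

```python
import ast


def ast_to_graph(source):
    """Parse Python source and return (nodes, edges) describing its AST.

    nodes maps an integer id to the AST node type name (e.g. "Call").
    edges is a list of (parent_id, child_id) pairs.
    """
    tree = ast.parse(source)
    nodes = {}
    edges = []

    def visit(node, parent_id=None):
        node_id = len(nodes)  # ids are assigned in pre-order
        nodes[node_id] = type(node).__name__
        if parent_id is not None:
            edges.append((parent_id, node_id))
        for child in ast.iter_child_nodes(node):
            visit(child, node_id)

    visit(tree)
    return nodes, edges


def to_networkx(nodes, edges):
    """Load the graph into a NetworkX DiGraph (assumes networkx is installed)."""
    import networkx as nx  # third-party; imported lazily so the rest runs without it

    graph = nx.DiGraph()
    for node_id, label in nodes.items():
        graph.add_node(node_id, label=label)
    graph.add_edges_from(edges)
    return graph


# Example: a pandas-style idiom; pandas itself is not needed to parse it.
nodes, edges = ast_to_graph("result = df.groupby('a').sum()")
```

Because the AST is a tree, the edge list always has one fewer entry than the node list. A CFG or DFG would add edges on top of (or instead of) these parent-child links.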
Each program p will then be passed through a GNN to learn program embeddings that are aware of subgraph semantics. The GNN training approach and loss function will be adapted from NeuroMatch [2].
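For orientation, here is a minimal sketch of the order-embedding objective described in NeuroMatch [2]: a query graph's embedding should be element-wise no larger than the embedding of any target graph that contains it. The sketch works on plain Python lists so it runs without a tensor library. The function names and the default margin of 1.0 are assumptions for illustration, and an actual adaptation would compute this over batched GNN outputs.

```python
def order_violation(z_query, z_target):
    """Squared norm of max(0, z_query - z_target), taken element-wise.

    Zero when z_query <= z_target in every dimension, i.e. when the
    embeddings are consistent with "query is a subgraph of target".
    """
    return sum(max(0.0, q - t) ** 2 for q, t in zip(z_query, z_target))


def order_embedding_loss(z_query, z_target, is_subgraph, margin=1.0):
    """Max-margin loss for one (query, target) pair.

    Positive pairs are penalized by their violation; negative pairs are
    penalized only if their violation is smaller than the margin.
    The margin value is an assumed hyperparameter.
    """
    violation = order_violation(z_query, z_target)
    if is_subgraph:
        return violation
    return max(0.0, margin - violation)


# A positive pair that already satisfies the order constraint.
positive_loss = order_embedding_loss([1.0, 2.0], [2.0, 3.0], is_subgraph=True)

# A negative pair that wrongly satisfies the constraint and is penalized.
negative_loss = order_embedding_loss([1.0, 2.0], [2.0, 3.0], is_subgraph=False)
```

At inference time, the same violation score can be thresholded to decide whether one program graph is predicted to be a subgraph of another.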
Milestones:
[Week 1] Finalize program representation -- AST vs CST vs CFG vs DFG
[Week 1] Translate program representation to NetworkX format
[Week 2] Implement NeuroMatch modules
[Week 3] Train and evaluate on the pandas-idiom dataset
Experiments:
<GNN Type> + NeuroMatch
<node features> + GNN + NeuroMatch
References:
[1] https://arxiv.org/abs/2208.07461
[2] https://arxiv.org/abs/2007.03092