CodeScholar's Subgraph Representation - Githubissues

manishshettym / codescholar

codescholar: growing program graphs idiomatically for API usage examples


CodeScholar's Subgraph Representation #11

Closed manishshettym closed 1 year ago

manishshettym commented 2 years ago

CodeScholar will represent each code snippet as a graph data structure that captures the structure and semantics of the method. Ideally this should be some form of program graph [1] that encodes data flow, control flow, and/or other semantic information.
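To make the idea concrete, here is a minimal sketch of one possible representation: AST edges plus naive def-use (data-flow) edges, built with Python's standard `ast` module. The function name `build_program_graph`, the edge kinds `"ast"` and `"dataflow"`, and the example snippet are all hypothetical, and the data-flow pass only handles straight-line top-level statements (it ignores branches, loops, and scopes). It is an illustration of the design space, not the representation the project settled on.

```python
import ast


def build_program_graph(source):
    """Return (nodes, edges) for a Python snippet.

    nodes: dict mapping node id -> AST node type name
    edges: list of (src id, dst id, kind), kind is "ast" or "dataflow"
    """
    tree = ast.parse(source)
    nodes = {}
    edges = []
    name_ids = {}  # id(ast.Name object) -> graph node id

    def visit(n):
        nid = len(nodes)
        nodes[nid] = type(n).__name__
        if isinstance(n, ast.Name):
            name_ids[id(n)] = nid
        for child in ast.iter_child_nodes(n):
            if isinstance(child, ast.expr_context):
                continue  # skip Load/Store markers, they add no structure
            edges.append((nid, visit(child), "ast"))
        return nid

    visit(tree)

    # Naive data flow (assumption: straight-line code only). For each
    # top-level statement, link earlier definitions to the reads in this
    # statement, then record the names this statement defines.
    last_def = {}
    for stmt in tree.body:
        names = [n for n in ast.walk(stmt) if isinstance(n, ast.Name)]
        for n in names:
            if isinstance(n.ctx, ast.Load) and n.id in last_def:
                edges.append((last_def[n.id], name_ids[id(n)], "dataflow"))
        for n in names:
            if isinstance(n.ctx, ast.Store):
                last_def[n.id] = name_ids[id(n)]
    return nodes, edges


# The snippet is only parsed, never executed.
snippet = "df = load()\nout = df.dropna()\n"
nodes, edges = build_program_graph(snippet)
dataflow_edges = [e for e in edges if e[2] == "dataflow"]
```

Here `dataflow_edges` holds a single edge from the definition of `df` in the first statement to its use in the second. A CFG- or DFG-based representation would need a proper control-flow analysis in place of the straight-line pass.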

Then, each program p will be passed through a GNN to learn program embeddings that are aware of subgraph semantics. The GNN training approach and loss function will be adapted from NeuroMatch [2].
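As a rough sketch of the loss, assuming the order-embedding max-margin objective described in [2]: a query graph is treated as a subgraph of a target when its embedding is element-wise less than or equal to the target's, and the penalty is the squared norm of the violation. The function names, the plain-list embeddings, and the margin value below are assumptions for illustration; a real implementation would operate on GNN output tensors.

```python
def violation(z_query, z_target):
    """Order-embedding penalty: squared L2 norm of max(0, z_query - z_target).

    Zero exactly when z_query <= z_target in every dimension.
    """
    return sum(max(0.0, q - t) ** 2 for q, t in zip(z_query, z_target))


def max_margin_loss(pairs, margin=0.5):
    """Sum the loss over (z_query, z_target, is_subgraph) triples.

    Positive pairs are penalized by their violation; negative pairs are
    penalized only if their violation is smaller than the margin.
    The default margin is an arbitrary placeholder, not a tuned value.
    """
    total = 0.0
    for z_q, z_t, is_subgraph in pairs:
        e = violation(z_q, z_t)
        total += e if is_subgraph else max(0.0, margin - e)
    return total


# Toy embeddings: the first pair respects the order constraint,
# the second violates it in the first dimension.
pairs = [
    ([1.0, 2.0], [2.0, 3.0], True),
    ([2.0, 0.0], [1.0, 1.0], False),
]
loss = max_margin_loss(pairs)
```

With these toy values the positive pair has zero violation and the negative pair's violation exceeds the margin, so `loss` is zero.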

Milestones:

[Week 1] Finalize program representation -- AST vs CST vs CFG vs DFG
[Week 1] Translate the program representation to NetworkX format
[Week 2] Implement NeuroMatch modules
[Week 3] Train and evaluate on the pandas-idiom dataset
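For the NetworkX milestone, a minimal sketch might look like the following, assuming the AST is the chosen representation (that choice is still open above). The function name `ast_to_networkx` and the attribute names `label` and `kind` are hypothetical; the code requires the third-party `networkx` package.

```python
import ast

import networkx as nx


def ast_to_networkx(source):
    """Convert a Python snippet's AST into a directed NetworkX graph.

    Each graph node stores the AST node type under the "label" attribute.
    """
    tree = ast.parse(source)
    graph = nx.DiGraph()

    def visit(node):
        nid = graph.number_of_nodes()  # next unused integer id
        graph.add_node(nid, label=type(node).__name__)
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.expr_context):
                continue  # skip Load/Store markers
            graph.add_edge(nid, visit(child), kind="ast")
        return nid

    visit(tree)
    return graph


g = ast_to_networkx("x = 1\ny = x + 2\n")
labels = [g.nodes[n]["label"] for n in sorted(g.nodes)]
```

Storing the node type as an attribute keeps the graph usable both for subgraph matching and as a source of node features for the GNN experiments.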

Experiments:

<GNN Type> + NeuroMatch
<node features> + GNN + NeuroMatch

References:

[1] https://arxiv.org/abs/2208.07461
[2] https://arxiv.org/abs/2007.03092