manishshettym / codescholar

codescholar: growing programs graphs idiomatically for API usage examples

CodeScholar's Subgraph Representation #11

Closed manishshettym closed 1 year ago

manishshettym commented 2 years ago

CodeScholar will represent code snippets as a graph data structure that captures the structure and semantics of the code. Ideally, this should be some form of program graph [1] that encodes data flow, control flow, and/or other semantic information.

Then, each program p will be passed through a GNN to learn program embeddings that are aware of subgraph semantics. The GNN training approach and loss function will be adapted from NeuroMatch [2].
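As a rough illustration of the kind of objective NeuroMatch [2] uses, here is a minimal pure-Python sketch of an order-embedding violation and a max-margin loss over (query, target) embedding pairs. It reflects my reading of [2], not code from this repo: the function names `order_violation` and `max_margin_loss`, the `margin` default, and the list-of-floats embeddings are all assumptions. A real implementation would operate on GNN output tensors with autograd.

```python
from typing import Iterable, Sequence, Tuple

Embedding = Sequence[float]


def order_violation(z_query: Embedding, z_target: Embedding) -> float:
    """Squared amount by which z_query exceeds z_target, summed over dimensions.

    Zero means the query embedding sits "below" the target in every
    dimension, which is read as "query is a subgraph of target".
    """
    return sum(max(0.0, q - t) ** 2 for q, t in zip(z_query, z_target))


def max_margin_loss(
    pairs: Iterable[Tuple[Embedding, Embedding, bool]],
    margin: float = 1.0,  # hypothetical default, not taken from the issue
) -> float:
    """Sum the loss over (z_query, z_target, is_subgraph) triples.

    Positive pairs are penalized by their violation; negative pairs are
    penalized only when their violation falls below the margin.
    """
    total = 0.0
    for z_query, z_target, is_subgraph in pairs:
        violation = order_violation(z_query, z_target)
        total += violation if is_subgraph else max(0.0, margin - violation)
    return total
```

With this formulation, a positive pair that already satisfies the order constraint contributes nothing, while a negative pair that satisfies it contributes the full margin.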

Milestones:

  1. [Week 1] Finalize the program representation: AST vs. CST vs. CFG vs. DFG
  2. [Week 1] Translate the program representation to NetworkX format
  3. [Week 2] Implement NeuroMatch modules
  4. [Week 3] Train and evaluate on the pandas-idiom dataset
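To make milestone 2 concrete, here is a minimal sketch of translating one candidate representation, a Python AST, into a NetworkX graph. The representation is not yet finalized in this issue, so the AST is only an example; the function name `ast_to_networkx` and the `label` node attribute are hypothetical. The sketch uses only the standard `ast` module and `networkx.DiGraph`.

```python
import ast

import networkx as nx


def ast_to_networkx(source: str) -> nx.DiGraph:
    """Build a directed tree with one node per AST node occurrence.

    Node ids are assigned in visit order, so AST objects that CPython
    may share between parents (such as expression contexts) still get
    separate graph nodes.
    """
    graph = nx.DiGraph()

    def visit(node: ast.AST, parent_id=None) -> None:
        node_id = graph.number_of_nodes()
        # Store the AST node type as a simple categorical feature.
        graph.add_node(node_id, label=type(node).__name__)
        if parent_id is not None:
            graph.add_edge(parent_id, node_id)
        for child in ast.iter_child_nodes(node):
            visit(child, node_id)

    visit(ast.parse(source))
    return graph


if __name__ == "__main__":
    g = ast_to_networkx("total = df.groupby('a').sum()")
    print(g.number_of_nodes(), "nodes,", g.number_of_edges(), "edges")
    print(sorted({data["label"] for _, data in g.nodes(data=True)}))
```

Control-flow or data-flow edges, if chosen, could be added to the same graph with an edge attribute that distinguishes them from the AST edges.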

Experiments:

  1. <GNN Type> + NeuroMatch
  2. <node features> + GNN + NeuroMatch

References:

[1] https://arxiv.org/abs/2208.07461
[2] https://arxiv.org/abs/2007.03092