"Neural Networks for Modeling Source Code Edits" by GoogleAI, ICLR 2019
Programming languages are emerging as a challenging and interesting domain for machine learning. A core task, which has received significant attention in recent years, is building generative models of source code. However, to our knowledge, previous generative models have always been framed in terms of generating static snapshots of code. In this work, we instead treat source code as a dynamic object and tackle the problem of modeling the edits that software developers make to source code files. This requires extracting intent from previous edits and leveraging it to generate subsequent edits. We develop several neural networks and use synthetic data to test their ability to learn challenging edit patterns that require strong generalization. We then collect and train our models on a large-scale dataset of Google source code, consisting of millions of fine-grained edits from thousands of Python developers. From the modeling perspective, our main conclusion is that a new composition of attentional and pointer network components provides the best overall performance and scalability. From the application perspective, our results provide preliminary evidence of the feasibility of developing tools that learn to predict future edits.
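To make the "composition of attentional and pointer network components" mentioned in the abstract more concrete, here is a minimal sketch of how a pointer-style attention head can predict *where* the next edit lands, given a summary of previous edits. It is not the paper's exact model; the names (`EditPositionPointer`, `d_model`) and shapes are illustrative assumptions.

```python
# Sketch: a pointer network that scores every token position of the current
# snapshot against a summary of previous edits and returns a distribution
# over candidate edit positions. Illustrative only, not the paper's model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EditPositionPointer(nn.Module):
    def __init__(self, d_model: int = 64):
        super().__init__()
        self.query_proj = nn.Linear(d_model, d_model)  # projects the edit-context summary
        self.key_proj = nn.Linear(d_model, d_model)    # projects each token of the snapshot

    def forward(self, tokens: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # tokens:  (batch, seq_len, d_model)  embeddings of the current code snapshot
        # context: (batch, d_model)           summary vector of the previous edits
        q = self.query_proj(context).unsqueeze(1)      # (batch, 1, d_model)
        k = self.key_proj(tokens)                      # (batch, seq_len, d_model)
        scores = (q * k).sum(-1) / k.size(-1) ** 0.5   # (batch, seq_len) scaled dot products
        return F.softmax(scores, dim=-1)               # probability of editing each position

# usage: probs = EditPositionPointer()(torch.randn(2, 10, 64), torch.randn(2, 64))
```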
"Graph Matching Networks for Learning the Similarity of Graph Structured Objects" by DeepMind/Google at ICML 2019
This paper addresses the challenging problem of retrieval and matching of graph structured objects, and makes two key contributions. First, we demonstrate how Graph Neural Networks (GNN), which have emerged as an effective model for various supervised prediction problems defined on structured data, can be trained to produce embeddings of graphs in vector spaces that enable efficient similarity reasoning. Second, we propose a novel Graph Matching Network model that, given a pair of graphs as input, computes a similarity score between them by jointly reasoning on the pair through a new cross-graph attention-based matching mechanism. We demonstrate the effectiveness of our models on different domains including the challenging problem of control-flow-graph based function similarity search that plays an important role in the detection of vulnerabilities in software systems. The experimental analysis demonstrates that our models are not only able to exploit structure in the context of similarity learning but they can also outperform domain-specific baseline systems that have been carefully hand-engineered for these problems.
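A rough sketch of the cross-graph attention idea described in the abstract: each node of one graph attends over all nodes of the other graph, and the attention-weighted difference feeds back into message passing. The pooling readout and shapes below are my assumptions, not the paper's exact equations.

```python
# Sketch: cross-graph attention between the node embeddings of two graphs,
# plus a simple pooled similarity readout. Illustrative assumptions throughout.
import torch
import torch.nn.functional as F

def cross_graph_attention(h1: torch.Tensor, h2: torch.Tensor):
    # h1: (n1, d) node embeddings of graph 1; h2: (n2, d) node embeddings of graph 2
    scores = h1 @ h2.T                    # (n1, n2) pairwise node similarities
    a12 = F.softmax(scores, dim=1)        # graph-1 nodes attending over graph-2 nodes
    a21 = F.softmax(scores.T, dim=1)      # graph-2 nodes attending over graph-1 nodes
    mu1 = h1 - a12 @ h2                   # how each graph-1 node differs from its soft match
    mu2 = h2 - a21 @ h1
    return mu1, mu2                       # matching vectors to feed into the next propagation step

def graph_similarity(h1: torch.Tensor, h2: torch.Tensor) -> torch.Tensor:
    # Pool node embeddings into graph vectors and compare them (illustrative readout).
    g1, g2 = h1.sum(0), h2.sum(0)
    return F.cosine_similarity(g1, g2, dim=0)
```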
Another paper on modeling source code edits from ICLR:
"Learning to Represent Edits", an MSR paper from ICLR'19:
We introduce the problem of learning distributed representations of edits. By combining a "neural editor" with an "edit encoder", our models learn to represent the salient information of an edit and can be used to apply edits to new inputs. We experiment on natural language and source code edit data. Our evaluation yields promising results that suggest that our neural network models learn to capture the structure and semantics of edits. We hope that this interesting task and data source will inspire other researchers to work further on this problem.
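A minimal sketch of the "edit encoder" + "neural editor" decomposition from the abstract: the encoder compresses a (before, after) pair into an edit vector, and the editor conditions on that vector to apply a similar edit to a new input. The GRU architecture and dimensions are illustrative assumptions, not the paper's exact model.

```python
# Sketch: an edit encoder that summarises (before, after) into an edit vector,
# and a neural editor that conditions on that vector. Illustrative only.
import torch
import torch.nn as nn

class EditEncoder(nn.Module):
    def __init__(self, d: int = 64):
        super().__init__()
        self.rnn = nn.GRU(2 * d, d, batch_first=True)

    def forward(self, before: torch.Tensor, after: torch.Tensor) -> torch.Tensor:
        # before, after: (batch, seq_len, d) aligned token embeddings of the two versions
        _, h = self.rnn(torch.cat([before, after], dim=-1))
        return h[-1]                                   # (batch, d) edit representation

class NeuralEditor(nn.Module):
    def __init__(self, d: int = 64):
        super().__init__()
        self.rnn = nn.GRU(2 * d, d, batch_first=True)
        self.out = nn.Linear(d, d)

    def forward(self, new_input: torch.Tensor, edit_vec: torch.Tensor) -> torch.Tensor:
        # Condition every step on the edit vector by concatenating it to the input tokens.
        cond = edit_vec.unsqueeze(1).expand(-1, new_input.size(1), -1)
        out, _ = self.rnn(torch.cat([new_input, cond], dim=-1))
        return self.out(out)                           # representation of the edited sequence
```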
two of them are identical (on source code edits)
Thank you @fyzbt - good catch!
Of course it was not supposed to be that way - there are two recent papers on learning source code edits, both presented at ICLR, that were meant to be listed! Updated https://github.com/src-d/reading-club/issues/53#issuecomment-494377071 to include the MSR one.
Calling for a vote in the community Slack, #reading-club.
The poll is up. And https://github.com/src-d/reading-club/issues/61 is for planning further sessions.
Fixed by #62 - the winners are the two source code edits papers.
Next paper candidates
Let's propose papers to study next! All papers mentioned in the comments of this issue will be listed in the next vote.
Last session runner-up(s)