src-d / reading-club

Paper reading club at source{d}
Creative Commons Attribution Share Alike 4.0 International

Next paper candidates: 12 Oct #7

Closed: m09 closed this issue 5 years ago

m09 commented 5 years ago

Next paper candidates

Let's propose papers to study next! All papers mentioned in the comments of this issue will be listed in the next vote.

m09 commented 5 years ago

Last week runner-up: Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces

With the rise of machine learning, there is a great deal of interest in treating programs as data to be fed to learning algorithms. However, programs do not start off in a form that is immediately amenable to most off-the-shelf learning techniques. Instead, it is necessary to transform the program to a suitable representation before a learning technique can be applied. In this paper, we use abstractions of traces obtained from symbolic execution of a program as a representation for learning word embeddings. We trained a variety of word embeddings under hundreds of parameterizations, and evaluated each learned embedding on a suite of different tasks. In our evaluation, we obtain 93% top-1 accuracy on a benchmark consisting of over 19,000 API-usage analogies extracted from the Linux kernel. In addition, we show that embeddings learned from (mainly) semantic abstractions provide nearly triple the accuracy of those learned from (mainly) syntactic abstractions.
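For context on how such analogy benchmarks are scored: the standard protocol is vector arithmetic over the learned embeddings, taking the nearest neighbor of b - a + c. Below is a minimal sketch with a toy, hand-written vocabulary; the paper's vectors are learned from abstracted symbolic traces, and the kernel function names here are purely illustrative.

```python
import numpy as np

# Toy embedding table; in the paper these vectors would be learned from
# abstracted symbolic traces, not hand-written as here.
emb = {
    "kmalloc":      np.array([1.0, 0.0, 1.0]),
    "kfree":        np.array([1.0, 0.0, -1.0]),
    "mutex_lock":   np.array([0.0, 1.0, 1.0]),
    "mutex_unlock": np.array([0.0, 1.0, -1.0]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def analogy(a, b, c):
    """Solve a : b :: c : ? by nearest neighbor to b - a + c,
    excluding the three query words (standard top-1 protocol)."""
    target = emb[b] - emb[a] + emb[c]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

# "kmalloc is to kfree as mutex_lock is to ...?"
print(analogy("kmalloc", "kfree", "mutex_lock"))  # -> mutex_unlock
```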

shang-vikas commented 5 years ago

How about World Models? https://arxiv.org/abs/1803.10122

patterns-and-symbols commented 5 years ago

The survey on Graph Networks: https://arxiv.org/abs/1806.01261

bzz commented 5 years ago

Last week runner-up: Code Vectors: Understanding Programs Through Embedded Abstracted Symbolic Traces

With the rise of machine learning, there is a great deal of interest in treating programs as data to be fed to learning algorithms. However...

Fun fact: last week in Madrid, at the https://icsme2018.github.io/ afterparty sponsored by source{d}, we met a person from the company of one of this paper's authors.

m09 commented 5 years ago

Neural Networks for Modeling Source Code Edits

Programming languages are emerging as a challenging and interesting domain for machine learning. A core task, which has received significant attention in recent years, is building generative models of source code. However, to our knowledge, previous generative models have always been framed in terms of generating static snapshots of code. In this work, we instead treat source code as a dynamic object and tackle the problem of modeling the edits that software developers make to source code files. This requires extracting intent from previous edits and leveraging it to generate subsequent edits. We develop several neural networks and use synthetic data to test their ability to learn challenging edit patterns that require strong generalization. We then collect and train our models on a large-scale dataset consisting of millions of fine-grained edits from thousands of Python developers.

TL;DR: Neural networks for source code that model changes being made to the code-base rather than static snapshots of code.
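To make "modeling edits" concrete: a fine-grained edit history can be serialized as a sequence of token-level operations that a model learns to continue. A minimal sketch of one such explicit representation follows; the `Edit` type and its fields are hypothetical, not the paper's actual encoding.

```python
from dataclasses import dataclass

# A hypothetical encoding of fine-grained edits as token-level operations;
# the paper's actual representations differ in detail.
@dataclass
class Edit:
    pos: int         # index into the current token sequence
    op: str          # "insert" or "delete"
    token: str = ""  # payload for inserts

def apply_edits(tokens, edits):
    """Replay an edit sequence against a snapshot of the code."""
    tokens = list(tokens)
    for e in edits:
        if e.op == "insert":
            tokens.insert(e.pos, e.token)
        elif e.op == "delete":
            del tokens[e.pos]
    return tokens

before = ["def", "f", "(", ")", ":"]
edits = [Edit(3, "insert", "x")]       # add a parameter
print(apply_edits(before, edits))      # ['def', 'f', '(', 'x', ')', ':']
```

A model in this setting would be trained to predict the next `Edit` given the edit history, which is where the intent extraction mentioned in the abstract comes in.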

m09 commented 5 years ago

Novel positional encodings to enable tree-structured transformers

With interest in program synthesis and similarly flavored problems rapidly increasing, neural models optimized for tree-domain problems are of great value. In the sequence domain, transformers can learn relationships across arbitrary pairs of positions with less bias than recurrent models. Under the intuition that a similar property would be beneficial in the tree domain, we propose a method to extend transformers to tree-structured inputs and/or outputs. Our approach abstracts the transformer's default sinusoidal positional encodings, allowing us to substitute in a novel custom positional encoding scheme that represents node positions within a tree. We evaluated our model in tree-to-tree program translation settings, achieving superior performance to other models from the literature on several tasks.

TL;DR: We develop novel positional encodings for tree-structured data, enabling transformers to be applied to tree structured problems.
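The core trick is replacing sequence positions with encodings of a node's root-to-node path. A simplified sketch of one such scheme, stacking one-hot child indices up to a fixed depth; the paper's actual encodings add learnable parameters on top of something like this.

```python
import numpy as np

def tree_positional_encoding(path, max_depth=8, max_children=4):
    """Encode a node's position as the stacked one-hot child indices
    along its root-to-node path, zero-padded to a fixed depth.
    `path` is e.g. [0, 2] for root -> child 0 -> child 2."""
    enc = np.zeros((max_depth, max_children))
    for depth, child_index in enumerate(path[:max_depth]):
        enc[depth, child_index] = 1.0
    return enc.flatten()   # shape: (max_depth * max_children,)

root = tree_positional_encoding([])        # all zeros: the root
leaf = tree_positional_encoding([0, 2, 1]) # a depth-3 node
print(leaf.shape)  # (32,)
```

Unlike sinusoidal indices, two such encodings agree on a prefix exactly when the nodes share ancestors, which is the tree analogue of relative position.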

m09 commented 5 years ago

Structured Neural Summarization

Summarization of long sequences into a concise statement is a core problem in natural language processing, requiring non-trivial understanding of the input. Based on the promising results of graph neural networks on highly structured data, we develop a framework to extend existing sequence encoders with a graph component that can reason about long-distance relationships in weakly structured data such as text. In an extensive evaluation, we show that the resulting hybrid sequence-graph models outperform both pure sequence models as well as pure graph models on a range of summarization tasks.

TL;DR: One simple trick to improve sequence models: Compose them with a graph model.
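The composition the abstract describes can be pictured as: run a standard sequence encoder, then refine the resulting token states with message passing over extra long-distance edges. A schematic numpy sketch, with random states standing in for a real encoder and a made-up coreference edge:

```python
import numpy as np

def gnn_step(node_states, edges, W):
    """One round of message passing: each node adds its neighbors'
    states (transformed by W) into its own state."""
    new_states = node_states.copy()
    for src, dst in edges:
        new_states[dst] += node_states[src] @ W
    return np.tanh(new_states)

# Token states as produced by some sequence encoder (random stand-ins here).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))          # 5 tokens, hidden size 8
W = rng.normal(size=(8, 8)) * 0.1

# Next-token edges plus one long-distance edge (e.g. coreference 0 -> 4).
edges = [(i, i + 1) for i in range(4)] + [(0, 4)]
hybrid = gnn_step(tokens, edges, W)       # graph-refined token states
print(hybrid.shape)  # (5, 8)
```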

m09 commented 5 years ago

Generative Code Modeling with Graphs

Generative models for source code are an interesting structured prediction problem, requiring reasoning about both hard syntactic and semantic constraints as well as about natural, likely programs. We present a novel model for this problem that uses a graph to represent the intermediate state of the generated output. Our model generates code by interleaving grammar-driven expansion steps with graph augmentation and neural message passing steps. An experimental evaluation shows that our new model can generate semantically meaningful expressions, outperforming a range of strong baselines.

TL;DR: Generates code by interleaving grammar-driven expansion steps with graph augmentation and neural message passing steps.
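A toy picture of the interleaving: nonterminals on the frontier are expanded production by production. In this sketch the production choice is random, where the paper's model scores each choice by message passing over a graph of the partial program; the grammar here is a made-up example.

```python
import random

# A made-up toy grammar for arithmetic expressions.
GRAMMAR = {
    "Expr": [["Expr", "+", "Expr"], ["Var"], ["Lit"]],
    "Var":  [["x"], ["y"]],
    "Lit":  [["0"], ["1"]],
}

def generate(symbol="Expr", depth=0, max_depth=4):
    """Grammar-driven expansion with a random production choice;
    the paper's model would pick the production neurally instead."""
    if symbol not in GRAMMAR:
        return [symbol]                    # terminal symbol
    productions = GRAMMAR[symbol]
    if depth >= max_depth:                 # crude depth cap to terminate
        productions = [p for p in productions if "Expr" not in p]
    rhs = random.choice(productions)
    tokens = []
    for s in rhs:
        tokens += generate(s, depth + 1, max_depth)
    return tokens

random.seed(0)
print(" ".join(generate()))   # a random well-formed expression, e.g. "x + 1"
```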

m09 commented 5 years ago

Learning to Represent Edits

We introduce the problem of learning distributed representations of edits. By combining a "neural editor" with an "edit encoder", our models learn to represent the salient information of an edit and can be used to apply edits to new inputs. We experiment on natural language and source code edit data. Our evaluation yields promising results that suggest that our neural network models learn to capture the structure and semantics of edits. We hope that this interesting task and data source will inspire other researchers to work further on this problem.
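The two-part interface is the interesting bit: an edit encoder maps a before/after pair to an edit representation, and a neural editor applies that representation to new inputs. A toy stand-in follows, using a token replacement map instead of learned vectors, just to make the interface concrete; both components are neural networks in the paper.

```python
def edit_encoder(before, after):
    """Summarize what changed. Toy version: positionwise token
    replacements; the paper learns a dense vector instead."""
    return {b: a for b, a in zip(before, after) if b != a}

def neural_editor(tokens, edit_repr):
    """Apply the encoded edit to a new input."""
    return [edit_repr.get(t, t) for t in tokens]

# Learn "rename foo to bar" from one example pair...
delta = edit_encoder(["foo", "(", "a", ")"], ["bar", "(", "a", ")"])

# ...and transfer it to a new input.
print(neural_editor(["x", "=", "foo", "(", "b", ")"], delta))
# ['x', '=', 'bar', '(', 'b', ')']
```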

m09 commented 5 years ago

Sorry for the spam, I just had a look at the papers submitted to ICLR'19 :)

kuba-- commented 5 years ago

"Yet Another Efficient Unification Algorithm" by Alin Suciu (https://arxiv.org/abs/cs/0603080v1).

The unification algorithm is at the core of the logic programming paradigm, the first unification algorithm having been developed by Robinson. More efficient algorithms were developed later by Martelli and Montanari. Here, yet another efficient unification algorithm is presented, centered on a specific data structure called the Unification Table.
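For anyone who wants a baseline to compare against, here is a compact Robinson-style unifier in Python. The paper's Unification Table algorithm is more efficient; this sketch only shows what unification computes, with Prolog-style capitalized variables and compound terms as tuples.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()   # Prolog-style: X, Y, ...

def walk(t, subst):
    """Follow variable bindings to their current value."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    """Occurs check: does variable v appear inside term t?"""
    t = walk(t, subst)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, subst) for a in t[1:])

def unify(t1, t2, subst=None):
    """Return the most general unifier of t1 and t2, or None.
    Compound terms are tuples like ('f', 'X', ('g', 'a'))."""
    subst = dict(subst or {})
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if t1 == t2:
        return subst
    if is_var(t1):
        if occurs(t1, t2, subst):
            return None
        subst[t1] = t2
        return subst
    if is_var(t2):
        return unify(t2, t1, subst)
    if isinstance(t1, tuple) and isinstance(t2, tuple) \
            and t1[0] == t2[0] and len(t1) == len(t2):
        for a, b in zip(t1[1:], t2[1:]):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None

# Unify f(X, g(Y)) with f(a, g(X)).
print(unify(('f', 'X', ('g', 'Y')), ('f', 'a', ('g', 'X'))))
# {'X': 'a', 'Y': 'a'}
```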

kuba-- commented 5 years ago

Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (hence the name); it is essentially a cuckoo hash table storing each key's fingerprint.

Whitepaper: "Cuckoo Filter: Better Than Bloom" by Bin Fan, Dave Andersen and Michael Kaminsky (https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf).
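A minimal Python sketch of the paper's partial-key cuckoo hashing: each item stores only a short fingerprint, and its alternate bucket index is the current index XOR the fingerprint's hash, so relocation never needs the original key. Bucket count is kept a power of two so the XOR trick is an involution; fingerprint width, bucket size, and hashing are all simplified here.

```python
import random

class CuckooFilter:
    def __init__(self, num_buckets=64, bucket_size=4, max_kicks=500):
        # num_buckets must be a power of two for the XOR trick below.
        self.buckets = [[] for _ in range(num_buckets)]
        self.n = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks

    def _fingerprint(self, item):
        return (hash(item) & 0xFF) or 1           # 8-bit, nonzero

    def _indices(self, item):
        fp = self._fingerprint(item)
        i1 = hash(item) % self.n
        i2 = (i1 ^ hash(fp)) % self.n             # alternate bucket from fp only
        return fp, i1, i2

    def insert(self, item):
        fp, i1, i2 = self._indices(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets full: evict a random fingerprint and relocate it.
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):
            j = random.randrange(self.bucket_size)
            fp, self.buckets[i][j] = self.buckets[i][j], fp
            i = (i ^ hash(fp)) % self.n           # evicted fp's other bucket
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False                              # filter considered full

    def contains(self, item):
        fp, i1, i2 = self._indices(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, item):
        fp, i1, i2 = self._indices(item)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False

f = CuckooFilter()
f.insert("hello")
print(f.contains("hello"))   # True
f.delete("hello")
print(f.contains("hello"))   # False
```

Deletion is exactly what a Bloom filter cannot do, since removing a fingerprint only affects the item's own two buckets.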

bzz commented 5 years ago

The poll is on!

bzz commented 5 years ago

Chosen in #13