sagr4019 / ResearchProject

A research project in neuronal networks
5 stars 2 forks source link

AST encoding concept #127

Open DanielEggelmann opened 5 years ago

DanielEggelmann commented 5 years ago

Develop a concept to encode AST for the use in neural networks.

DanielEggelmann commented 5 years ago

Wiki page: https://github.com/sagr4019/ResearchProject/wiki/AST-vector-encoding

jan-christiansen commented 5 years ago

Please use an approach or a combination of approaches that we have read about.

DanielEggelmann commented 5 years ago

Approach: Tokens & Word2Vec Another approach would be to use tokens in combination with word2vec. Every node of the AST would be taken as token and converted to an integer serving as index for the embedding layer. To flatten the tree it would be traversed preorder. Each type of token should be assigned an integer. Variables can dealt with in two ways, a preset set of variables could be used, with a hard coded index for each. Another implementation could be reserving a certain range of indices for variables and then assigning the indices dynmically to variables in the order they appear. These indices are then fed to an embedding layer which converts the indices to vectors, which can be processed by the neural network. Embedding layers are already part of Keras, therefore no additional software components are needed.