Closed: marnovo closed this issue 5 years ago
@fineguy update the issue with the developments since the last sprint started, please.
As a very basic approach, we did the following:

1. Use only internal nodes (neither the root nor a leaf).
2. For each node, calculate the average embedding of roles and tokens for its parent and, separately, for its children. Concatenate these two vectors.
3. Train an MLP classifier with one hidden layer on these representations to predict node roles.
4. With Swivel and GloVe embeddings, we were able to overfit on the scikit-learn repository with almost perfect accuracy. Now we need to run our testing pipeline to see the actual score.
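For readers who want the gist without opening the repository, here is a minimal sketch of the feature construction described above. It is not the code from role2vec: the `Node` class, the `emb` lookup (one dict mapping both role names and tokens to vectors), the zero vector for missing embeddings, and the choice to join a node's roles into a single label are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple

import numpy as np


@dataclass
class Node:
    """Hypothetical stand-in for a UAST node: a token plus a list of roles."""
    token: str
    roles: List[str]
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def mean_embedding(nodes: List[Node], emb: Dict[str, np.ndarray], dim: int) -> np.ndarray:
    """Average the embeddings of all tokens and roles found on the given nodes."""
    vecs = [emb[key] for n in nodes for key in (n.token, *n.roles) if key in emb]
    # Assumption: fall back to a zero vector when nothing has an embedding.
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)


def internal_nodes(root: Node) -> Iterator[Node]:
    """Yield nodes that have both a parent and at least one child."""
    stack = [root]
    while stack:
        node = stack.pop()
        stack.extend(node.children)
        if node.parent is not None and node.children:
            yield node


def build_dataset(root: Node, emb: Dict[str, np.ndarray], dim: int) -> Tuple[np.ndarray, np.ndarray]:
    """One row per internal node: [parent average, children average]."""
    X, y = [], []
    for node in internal_nodes(root):
        parent_vec = mean_embedding([node.parent], emb, dim)
        children_vec = mean_embedding(node.children, emb, dim)
        X.append(np.concatenate([parent_vec, children_vec]))
        # Assumption: treat the sorted role set as a single class label.
        y.append("|".join(sorted(node.roles)))
    return np.array(X), np.array(y)


def train_role_classifier(X: np.ndarray, y: np.ndarray, hidden: int = 64):
    """Fit an MLP with a single hidden layer (the hidden size is a guess)."""
    # Imported here so the feature code can be used without scikit-learn.
    from sklearn.neural_network import MLPClassifier

    clf = MLPClassifier(hidden_layer_sizes=(hidden,), max_iter=500, random_state=0)
    return clf.fit(X, y)
```

Note that a node's own token and roles never enter its feature vector, so the classifier has to infer the role from context alone. Calling `train_role_classifier(X, y).score(X, y)` on the training data only measures how well the model can overfit, which is the check mentioned in step 4; a held-out split is needed for the actual score.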
Code is available here: https://github.com/fineguy/role2vec/tree/master/embeddings
@fineguy, can you provide a gist with the intermediate results?
Tim: the whole pipeline is ready. We ran it on old UASTs, and many of their nodes turned out to have no roles.
Blocked by https://github.com/src-d/backlog/issues/1037. Tim will try to find a workaround.
This is outdated. @zurk has worked on the role id prediction problem in https://github.com/src-d/ml-backlog/issues/17
EPIC #858
Story: "As a machine learning engineer, I want to know the quality of different embeddings so that I can compare how different approaches fare on my problem."