src-d / ml-backlog

Issues belonging to source{d}'s Machine Learning team which cannot be related to a specific repository.

UAST node role vec: Compare different embeddings #36

Closed: marnovo closed this issue 5 years ago

marnovo commented 7 years ago

EPIC #858

Story: "As a machine learning engineer, I want to know the quality of different embeddings so that I can compare how different approaches fare on my problem."

marnovo commented 7 years ago

@fineguy, please update the issue with the developments since the last sprint started.

fineguy commented 7 years ago

As a very basic approach, we used the following:
1) Use only internal nodes (neither the root nor leaves).
2) For each node, compute the average embedding of the roles and tokens of its parent and, separately, of its children, then concatenate these two vectors.
3) Train an MLP classifier with one hidden layer on these representations to predict node roles.
4) With Swivel and GloVe embeddings we were able to overfit on the scikit-learn repository with almost perfect accuracy; now we need to run our testing pipeline to see the actual score.
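The four steps above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the code from the role2vec repository: the `Node` class, `build_dataset` and `train_role_classifier` are hypothetical names; embeddings are assumed to be a plain dict from role or token string to a NumPy vector (as could be loaded from Swivel or GloVe output); the children's roles and tokens are assumed to be pooled into a single average; and each distinct combination of roles is assumed to be one class label.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

import numpy as np


@dataclass
class Node:
    """Hypothetical minimal UAST node: a token, its roles and its children."""
    token: str
    roles: List[str]
    children: List["Node"] = field(default_factory=list)


def mean_embedding(words: Sequence[str], emb: Dict[str, np.ndarray], dim: int) -> np.ndarray:
    """Average the vectors of the words that have an embedding; zeros if none do."""
    vecs = [emb[w] for w in words if w in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)


def node_words(node: Node) -> List[str]:
    # A node contributes both its roles and its token.
    return list(node.roles) + [node.token]


def build_dataset(root: Node, emb: Dict[str, np.ndarray], dim: int) -> Tuple[np.ndarray, List[str]]:
    """Features and labels for every internal node (neither the root nor a leaf)."""
    features, labels = [], []
    stack = [(root, None)]  # pairs of (node, parent)
    while stack:
        node, parent = stack.pop()
        for child in node.children:
            stack.append((child, node))
        if parent is None or not node.children:
            continue  # step 1: skip the root and the leaves
        # Step 2: parent average and children average, concatenated.
        parent_vec = mean_embedding(node_words(parent), emb, dim)
        child_words = [w for c in node.children for w in node_words(c)]
        child_vec = mean_embedding(child_words, emb, dim)
        features.append(np.concatenate([parent_vec, child_vec]))
        # Assumption: each distinct combination of roles is treated as one class.
        labels.append("|".join(sorted(node.roles)))
    return np.array(features).reshape(len(features), 2 * dim), labels


def train_role_classifier(X: np.ndarray, y: List[str], hidden_units: int = 64, seed: int = 0):
    """Step 3: an MLP with a single hidden layer (scikit-learn's MLPClassifier)."""
    # Imported here so that feature extraction works without scikit-learn installed.
    from sklearn.neural_network import MLPClassifier

    clf = MLPClassifier(hidden_layer_sizes=(hidden_units,), max_iter=500, random_state=seed)
    return clf.fit(X, y)
```

For example, `X, y = build_dataset(root, emb, dim)` followed by `train_role_classifier(X, y).score(X, y)` gives the training accuracy, which corresponds to the overfitting check in step 4; an actual score would require evaluating on held-out data.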

fineguy commented 7 years ago

Code is available here: https://github.com/fineguy/role2vec/tree/master/embeddings

EgorBu commented 7 years ago

@fineguy, can you provide a gist with the intermediate results?

vmarkovtsev commented 7 years ago

Tim: the whole pipeline is ready. We ran it on the old UASTs, and many of their nodes turned out to have no roles.

vmarkovtsev commented 7 years ago

Blocked by https://github.com/src-d/backlog/issues/1037. Tim will try to come up with a workaround.

vmarkovtsev commented 5 years ago

This is outdated. @zurk has worked on the role id prediction problem in https://github.com/src-d/ml-backlog/issues/17