Closed by marnovo 5 years ago
Needs some bug fixing on the ML side. Ongoing.
@fineguy, in case you want to train any additional models, please update the issue here.
@fineguy update the issue with the developments since the last sprint started, please.
I've trained embeddings using Swivel (/storage/timofei/role2vec/swivel) and GloVe (/storage/timofei/role2vec/glove). Node2Vec training is a lot slower - it scales poorly, so I never saw it finish training.
Code is available here: https://github.com/fineguy/role2vec/tree/master/embeddings
@fineguy, can you share a gist with the code to reproduce your embeddings experiment?
Path to project folder: /storage/timofei/embeddings. All UASTs were randomly split into train, test and valid sets in a 60%:20%:20% ratio.
1) uasts_train.txt, uasts_test.txt, uasts_valid.txt -- text files with paths to UASTs for the train, test and valid sets.
2) prox_train.txt, prox_test.txt, prox_valid.txt -- directories with proximity matrices extracted for the train, test and valid sets respectively.
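For anyone reproducing the setup, the 60%:20%:20% random split described above could look roughly like the sketch below. The function name `split_paths`, the fixed seed and the example file names are assumptions for illustration, not the code actually used in the project.

```python
import random


def split_paths(paths, percents=(60, 20, 20), seed=0):
    """Shuffle paths and cut them into train/test/valid lists.

    `percents` are integer percentages; the valid set takes whatever
    remains after the train and test cuts, so no path is dropped.
    """
    shuffled = list(paths)
    # A seeded generator keeps the split reproducible (seed value is an assumption).
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * percents[0] // 100
    n_test = len(shuffled) * percents[1] // 100
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    valid = shuffled[n_train + n_test:]
    return train, test, valid


# Hypothetical file names, only to show the call.
paths = ["uast_%d.pb" % i for i in range(10)]
train, test, valid = split_paths(paths)
```

Each returned list can then be written one path per line to produce files in the style of uasts_train.txt, uasts_test.txt and uasts_valid.txt.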
Proximity matrices have depth 1 (i.e. regular co-occurrence matrices).
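To illustrate what a depth-1 proximity matrix might contain, here is a minimal sketch that counts label pairs for directly connected nodes in a tree. It assumes that "depth 1" means parent-child pairs only; the `Node` class is a hypothetical stand-in for a UAST node, and `cooccurrences` is not the extraction code from the repository.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical simplified node; real UAST nodes carry more fields.
    label: str
    children: List["Node"] = field(default_factory=list)


def cooccurrences(root):
    """Count label pairs over every parent-child edge, in both directions."""
    counts = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            # Record the pair symmetrically so the resulting matrix is symmetric.
            counts[(node.label, child.label)] += 1
            counts[(child.label, node.label)] += 1
            stack.append(child)
    return counts


tree = Node("File", [Node("Function", [Node("Identifier"), Node("Identifier")])])
counts = cooccurrences(tree)
```

Pairs that are further apart in the tree (here File and Identifier) get no count, which is what distinguishes depth 1 from larger depths under this assumption.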
Node2Vec is hard to scale, work in progress.
Blocked because there is no dataset: https://github.com/src-d/backlog/issues/1040
This is outdated. @fineguy is no longer an intern at source{d}. We used the knowledge from this project in the next experiments.
EPIC: https://github.com/src-d/backlog/issues/858
Story: "As a data scientist or developer, I want the best tradeoff of trained models that can suggest node roles for a given UAST."