src-d / ml

sourced.ml is a library and a set of command line tools for building and applying machine learning models on top of Universal Abstract Syntax Trees.

[WIP] NN tokenizer #405

Closed: glimow closed this pull request 5 years ago

glimow commented 5 years ago

Integrates @warenlg's neural network tokenizer into modelforge and src-d's TokenParser.

glimow commented 5 years ago

1- OK.
2- Yes, it's on its way.
3- I have seen that. The tests failed when I used tempfile, and I wanted to investigate the problem, so I ran them with the hardcoded path. Will change that.
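Item 3 mentions switching the tests from a hardcoded path back to tempfile. A minimal sketch of that pattern, assuming a hypothetical `save_vocabulary` helper in place of the real code under test: `tempfile.TemporaryDirectory` gives each run a fresh directory that is removed afterwards, and writing to a new path inside it avoids reopening a still-open `NamedTemporaryFile`, which fails on Windows and is one possible (unconfirmed) source of tempfile-related test failures.

```python
import os
import tempfile
import unittest


def save_vocabulary(tokens, path):
    """Hypothetical stand-in for the code under test: writes one token per line."""
    with open(path, "w", encoding="utf-8") as fout:
        fout.write("\n".join(tokens))


class SaveVocabularyTests(unittest.TestCase):
    def test_roundtrip_in_temporary_directory(self):
        tokens = ["get", "user", "name"]
        # The directory and everything in it are deleted when the block exits,
        # so the test does not rely on a hardcoded path existing on the machine.
        with tempfile.TemporaryDirectory() as tmpdir:
            path = os.path.join(tmpdir, "vocab.txt")
            save_vocabulary(tokens, path)
            # save_vocabulary already closed the file, so reopening it is safe on every OS.
            with open(path, encoding="utf-8") as fin:
                self.assertEqual(fin.read().split("\n"), tokens)
```

The test runs under `python -m unittest` or pytest as usual and leaves nothing behind on disk.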

warenlg commented 5 years ago

Awesome, @glimow!

Agreed with Konst. I also think it would be nice to squash similar commits (for example, "fix checks") so that we keep a meaningful commit history.

glimow commented 5 years ago

You are right. Commits squashed, and the points highlighted in @zurk's comment are addressed as well. The PR is almost finished; the last question is where to get the model weights from when using the tokenizer in TokenParser.

zurk commented 5 years ago

last question is where to get the model weights from

I think they should be included in the modelforge model. You should load the model and take them from it. Am I missing something?

Also, the tests are not passing. PTAL: https://travis-ci.com/src-d/ml/jobs/193718842#L1680-L1743
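The suggestion to ship the weights inside the modelforge model can be illustrated with a minimal sketch. `TokenizerModel` and `NNTokenizer` are hypothetical stand-ins, not modelforge's or TokenParser's actual API, and `pickle` is used only to keep the example self-contained; it does not reflect modelforge's on-disk format. The point is that the weights are serialized together with the model, so the consumer takes them from the loaded model instead of fetching them from a separate location.

```python
import io
import pickle
from dataclasses import dataclass, field
from typing import BinaryIO, Dict, List


@dataclass
class TokenizerModel:
    """Hypothetical stand-in for a modelforge model that carries its own weights."""

    weights: Dict[str, List[float]] = field(default_factory=dict)

    def save(self, stream: BinaryIO) -> None:
        # The weights are written into the same artifact as the rest of the model.
        pickle.dump({"weights": self.weights}, stream)

    @classmethod
    def load(cls, stream: BinaryIO) -> "TokenizerModel":
        tree = pickle.load(stream)
        return cls(weights=tree["weights"])


class NNTokenizer:
    """Hypothetical consumer (the role TokenParser would play): reads weights from the model."""

    def __init__(self, model: TokenizerModel):
        self.weights = model.weights


# Usage: round-trip through an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
TokenizerModel(weights={"dense/kernel": [0.1, 0.2]}).save(buffer)
buffer.seek(0)
tokenizer = NNTokenizer(TokenizerModel.load(buffer))
```

Keeping the weights in the same artifact means a given model version always carries matching weights. If you adapt this sketch, only unpickle files you trust, since unpickling can execute arbitrary code.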

glimow commented 5 years ago

Yes, the tests fail because of a tf (TensorFlow) misconfiguration. They worked before, so I have to find the commit where things broke. I will ask @irinakhismatullina to help me upload the weights once the tests pass.

zurk commented 5 years ago

:+1: git bisect is your best friend in such a case.
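`git bisect run` can automate that search: it runs a command at each step and reads its exit code, where 0 marks the commit good, 125 skips it, and other codes from 1 to 127 mark it bad. A minimal Python sketch of such a check script, assuming the suite runs under pytest (swap in the project's real test command); the file name `bisect_check.py` is hypothetical.

```python
"""Check script for `git bisect run` (hypothetical name: bisect_check.py)."""
import subprocess
import sys

GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`
PYTEST_NO_TESTS_COLLECTED = 5  # pytest's exit code when it collected no tests


def classify(returncode: int) -> int:
    """Map a test-runner exit code to a `git bisect run` verdict."""
    if returncode == 0:
        return GOOD
    if returncode == PYTEST_NO_TESTS_COLLECTED:
        # Nothing to judge at this commit (e.g. the tests did not exist yet).
        return SKIP
    # Failing tests and collection errors (such as a broken import) both count as bad.
    return BAD


def main(cmd=(sys.executable, "-m", "pytest", "-q")) -> int:
    """Run the test command at the current checkout and return the bisect verdict."""
    completed = subprocess.run(list(cmd))
    return classify(completed.returncode)
```

After `git bisect start <bad> <good>`, run `git bisect run python -c "import sys, bisect_check; sys.exit(bisect_check.main())"` from the repository root, keeping `bisect_check.py` untracked so it survives each checkout.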

glimow commented 5 years ago

NOTE: Tests pass, but Travis is laggy!