Open bzz opened 6 years ago
Gigantic effort @bzz!
All right, I think this (https://github.com/src-d/ml/blob/master/sourced/ml/cmd/repos2bow.py) can be helpful. We use it to convert repositories to BOW models. It is this complex because we also use it in the Apollo project. But the general pipeline idea for creating a BOW model can be found in the initial code of repo2bow: https://github.com/zurk/ml/blob/d7a093de39e90db9a9c74515d6b2029240de7b96/sourced/ml/cmd_entries/repos2bow.py
I am not sure how deep your knowledge of the new sourced-ml is, @bzz. If you want, we can have a call and I can explain the main aspects to you.
This is an excellent chance to improve our documentation btw.
Yeah, good idea. We have something here: https://docs.sourced.tech/sourced-ml, but it tells you how to use it and says nothing about developing.
I think I can add more docstrings to our codebase. @bzz, if you can, please let me know about everything that is confusing or hard to understand in sourced-ml; I will add docstrings there first. I am asking because it is hard to spot the most problematic places from the inside :)
@bzz The core part here is extracting the BOW. You can use the revamped function from Vecino now: https://github.com/src-d/vecino/blob/master/vecino/repo2bow.py
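For readers unfamiliar with the term, here is a minimal sketch of what "extracting the BOW" means in general. It is not the Vecino or sourced-ml implementation: `split_identifier` and `files_to_bow` are hypothetical names, and the regex tokenizer stands in for proper parsing of the source code.

```python
import re
from collections import Counter

# Assumption: identifiers are approximated with a regex instead of a real parser.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Splits camelCase, acronyms (HTTPServer -> HTTP, Server) and digit runs.
PART_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_identifier(name):
    """Split a snake_case or camelCase identifier into lowercase parts."""
    parts = []
    for chunk in name.split("_"):
        parts.extend(PART_RE.findall(chunk))
    return [p.lower() for p in parts]


def files_to_bow(sources):
    """Build a bag of words (token -> count) from an iterable of source texts."""
    bag = Counter()
    for text in sources:
        for identifier in TOKEN_RE.findall(text):
            bag.update(split_identifier(identifier))
    return bag


if __name__ == "__main__":
    bow = files_to_bow(["def read_file(path): return openFile(path)"])
    print(bow.most_common(3))
```

The resulting counter is the per-repository "document" that a topic model would consume; the real pipeline additionally handles language detection, parsing, and weighting, which this sketch leaves out.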
Yes, that is exactly the missing component that I had to resurrect from git history 🚀
Is it OK to use vecino as a dependency here?
@bzz It is completely fine to copy-paste for now - we will add this to sourced-ml once we have time.
This is a one-Sunday-afternoon attempt to make tmsc great again.
It's WIP, as the usage of the BOW model from modelforge should be removed as per the discussion in https://github.com/src-d/models/issues/11
Early feedback is warmly appreciated though; it will help make this ready to merge at some point.
The current version is able to run and produce results.