Closed EgorBu closed 5 years ago
Pair programming issue https://github.com/src-d/feature-idea/issues/144
I started the dataset collection again on the ML cluster. If it does not get killed within the first hour, it will finish by next week.
I am currently running a robust collection again. The previous process simply stopped writing results for an unknown reason. The results are split into chunks, so it is possible to resume without losing progress.
/user/legacy/backup/ghtorrent
This has moved under the scope of https://github.com/src-d/eee-identity-matching
As a developer/researcher who wants to analyze code bases, I want to be able to identify developers based on the information available in git history.
Identity matching is an important problem for almost any potential customer. Whenever we work with code bases from different companies, we will run into the issue that the same developer uses different names/emails in commits. We should be able to handle this situation properly, and there should be a Python module for it.
A short summary of the existing approaches can be found here
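To make the problem above concrete, here is a minimal sketch of a naive baseline: treat every commit signature as a `(name, email)` pair and merge signatures that share a normalized name or a normalized email, using a union-find structure. This is only an illustration under stated assumptions, not the approach taken in eee-identity-matching; the function names `normalize` and `match_identities` and the sample signatures are hypothetical.

```python
from collections import defaultdict


def normalize(value):
    """Lowercase and collapse whitespace so trivial variations compare equal."""
    return " ".join(value.lower().split())


def match_identities(signatures):
    """Group (name, email) pairs that share a normalized name or email.

    Naive baseline (assumption): any shared name or email means "same person".
    """
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then walk up with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link each signature to a node for its name and a node for its email.
    for name, email in signatures:
        sig = ("sig", name, email)
        union(sig, ("name", normalize(name)))
        union(sig, ("email", normalize(email)))

    # Collect signatures by the root of their connected component.
    groups = defaultdict(set)
    for name, email in signatures:
        groups[find(("sig", name, email))].add((name, email))
    return sorted(sorted(group) for group in groups.values())


# Hypothetical sample data for illustration only.
signatures = [
    ("Alice Smith", "alice@example.com"),
    ("alice smith", "asmith@work.example"),
    ("A. Smith", "asmith@work.example"),
    ("Bob Jones", "bob@example.com"),
]
identities = match_identities(signatures)
```

With the sample data, the first three signatures end up in one identity (the first two share a name, the last two share an email) and the fourth stays separate. A real solution would need to guard against false merges, for example common names shared by different people or generic placeholder emails, which this sketch deliberately ignores.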