FernandoPieressa closed this 5 years ago.
@VHellendoorn does this plan sound good?
Sorta, some comments:
Output format, tentatively limited to the special case where only one Java file changed, with a single diff in that file that removes one line and adds one line:
organization,project,commit,files_changed,java_files_changed,file,file_diffs,lines_removed,lines_added,line_rm_start,line_rm_end,line_add_start,line_add_end
a2888409,face2face,59f8472a9611ab1dd9f5f58de18f6a758aca9866,1,1,gate/src/main/java/gate/ClientMessage.java,2,1,1,59,59,59,59
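A row in this format can be read with Python's standard csv module. The sketch below is a minimal example that assumes the file starts with exactly the header shown above. ChangeRow, parse_rows, and is_single_line_change are hypothetical names chosen for illustration. The filter does not check file_diffs, because the sample row above reports 2 for that column.

```python
import csv
import io
from dataclasses import dataclass

# Column order copied from the header line above.
COLUMNS = [
    "organization", "project", "commit", "files_changed", "java_files_changed",
    "file", "file_diffs", "lines_removed", "lines_added",
    "line_rm_start", "line_rm_end", "line_add_start", "line_add_end",
]
# Every column except organization, project, commit, and file is an integer.
INT_COLUMNS = set(COLUMNS[3:5] + COLUMNS[6:])


@dataclass
class ChangeRow:  # hypothetical record type for one CSV row
    organization: str
    project: str
    commit: str
    files_changed: int
    java_files_changed: int
    file: str
    file_diffs: int
    lines_removed: int
    lines_added: int
    line_rm_start: int
    line_rm_end: int
    line_add_start: int
    line_add_end: int


def parse_rows(text):
    """Parse CSV text (header included) into ChangeRow objects."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        values = {k: int(v) if k in INT_COLUMNS else v for k, v in record.items()}
        rows.append(ChangeRow(**values))
    return rows


def is_single_line_change(row):
    """Hypothetical filter: one Java file changed, one line removed, one line added."""
    return (row.files_changed == 1 and row.java_files_changed == 1
            and row.lines_removed == 1 and row.lines_added == 1)
```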
@VHellendoorn sounds good.
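We will use the command git show REVISION:path/to/file to get a file's contents at a given revision. Because this shows the file as of REVISION itself, getting the file before a particular commit means passing that commit's parent as REVISION (e.g. COMMIT^:path/to/file).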
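One way to script this from Python is to call git through subprocess. This is a minimal sketch under the assumption that git is on the PATH and that the commit has a parent. `COMMIT:path` gives the post-commit file and `COMMIT^:path` gives the file in the commit's first parent, i.e. the pre-commit file. revision_spec and read_file_at are hypothetical helper names.

```python
import subprocess


def revision_spec(commit, path, before):
    """Build the REVISION:path argument for `git show`.

    COMMIT:path  -> file as of the commit (post-commit state)
    COMMIT^:path -> file in the commit's first parent (pre-commit state)
    """
    rev = f"{commit}^" if before else commit
    return f"{rev}:{path}"


def read_file_at(repo_dir, commit, path, before=True):
    """Return the file's text before (default) or after the given commit.

    Raises subprocess.CalledProcessError if the path does not exist at that
    revision, e.g. a file added by the commit, or a root commit with no parent.
    """
    result = subprocess.run(
        ["git", "show", revision_spec(commit, path, before)],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Passing the arguments as a list (no shell) avoids quoting issues with `^`, which some shells treat specially.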
Can you explain what the following values in your output represent: line_rm_start, line_rm_end, line_add_start, and line_add_end? In particular, are these line numbers in the pre-commit files or the post-commit files?
The rm lines are in the pre-commit file, the add lines in the post-commit.
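To make that convention concrete, here is a sketch of how such ranges could be derived from a unified-diff hunk header and then used to pull the changed lines out of the two file versions. It assumes the diff was generated with zero context lines (e.g. `git diff -U0`), so that the hunk ranges cover only the removed and added lines; the thread does not say how the values were actually computed. hunk_ranges and changed_lines are hypothetical names, and line numbers are 1-indexed and inclusive.

```python
import re

# "@@ -start[,count] +start[,count] @@"; an omitted count means 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def hunk_ranges(header):
    """Return ((rm_start, rm_end), (add_start, add_end)) for a hunk header.

    The "-" range is in the pre-commit file, the "+" range in the post-commit
    file. A count of 0 yields an empty range (end < start).
    """
    m = HUNK_RE.match(header)
    if m is None:
        raise ValueError(f"not a hunk header: {header!r}")
    rm_start, add_start = int(m.group(1)), int(m.group(3))
    rm_count = int(m.group(2) or 1)
    add_count = int(m.group(4) or 1)
    return ((rm_start, rm_start + rm_count - 1),
            (add_start, add_start + add_count - 1))


def changed_lines(pre_text, post_text, rm_range, add_range):
    """Slice removed lines from the pre-commit text, added lines from the post-commit text."""
    pre, post = pre_text.splitlines(), post_text.splitlines()
    (rm_start, rm_end), (add_start, add_end) = rm_range, add_range
    return pre[rm_start - 1:rm_end], post[add_start - 1:add_end]
```

For the sample row above, a header of `@@ -59 +59 @@` gives line_rm_start = line_rm_end = 59 in the pre-commit file and line_add_start = line_add_end = 59 in the post-commit file.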
ToDo:
[x] Filter current data by bug/fixes
[ ] Saving file before change.
[ ] Saving line after change.
[ ] Given the files before the change, tokenize them with BPE and train word2vec embeddings
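For the BPE step, the classic merge-learning loop is small enough to sketch directly: each token starts as a sequence of characters plus an end-of-word marker, and the most frequent adjacent pair is merged repeatedly. This is a toy illustration, assuming the Java source has already been split into identifier-level tokens; the project may well use an existing BPE library instead, and word2vec training is not shown. learn_bpe, apply_bpe, and the `</w>` marker are illustrative choices.

```python
from collections import Counter

END = "</w>"  # end-of-word marker so merges cannot cross token boundaries


def _merge(word, pair):
    """Replace every adjacent occurrence of `pair` in `word` with one merged symbol."""
    out, i = [], 0
    while i < len(word):
        if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
            out.append(word[i] + word[i + 1])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return tuple(out)


def learn_bpe(tokens, num_merges):
    """Learn up to num_merges merge operations from a list of tokens."""
    vocab = Counter(tuple(tok) + (END,) for tok in tokens)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair; ties broken by the pair itself for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for word, freq in vocab.items():
            new_vocab[_merge(word, best)] += freq
        vocab = new_vocab
    return merges


def apply_bpe(token, merges):
    """Segment a token by replaying the learned merges in order."""
    word = tuple(token) + (END,)
    for pair in merges:
        word = _merge(word, pair)
    return list(word)
```

Usage: `merges = learn_bpe(["getName", "getName", "getValue"], 2)` learns `("g", "e")` and then `("ge", "t")`, so `apply_bpe("getValue", merges)` splits off `get` as a single subword.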