VHellendoorn closed this 5 years ago
@VHellendoorn, for the lexed line(s) involved in the fix, why would we need to lex the full fixed file as well? The problem with lexing the whole file is that, after lexing, it's impossible to know which line of code was actually modified.
Shouldn't we just extract the changed line and lex only that?
The problem is that you can't tell whether a line is inside, e.g., a comment from the line itself. Lexing the whole file is the only foolproof way. We do need to make sure, when lexing the full file, that all the original line indices are preserved. I'm pretty sure the Pygments code I included preserves all newlines for that purpose.
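To illustrate why the line alone isn't enough, here is a deliberately simplified Python sketch (a hand-rolled stand-in, not Pygments) that tracks only `/* ... */` block comments. Whether a line starts inside a comment depends on everything before it. The function name `starts_inside_block_comment` is hypothetical, and unlike a real lexer it ignores string literals and `//` comments.

```python
def starts_inside_block_comment(source: str) -> list[bool]:
    """For each line (index 0 = line 1), report whether it begins inside /* ... */.

    Simplified stand-in for a real lexer: string literals and // comments are
    ignored, so e.g. "/*" inside a string literal would be misread.
    """
    states: list[bool] = []
    in_comment = False
    for line in source.split("\n"):
        states.append(in_comment)  # state carried over from all previous lines
        i = 0
        while i < len(line):
            if not in_comment and line.startswith("/*", i):
                in_comment = True
                i += 2
            elif in_comment and line.startswith("*/", i):
                in_comment = False
                i += 2
            else:
                i += 1
    return states


source = "int a = 1;\n/* old code:\nint b = 2;\n*/\nint c = 3;"
print(starts_inside_block_comment(source))  # [False, False, True, True, False]
```

Line 3 (`int b = 2;`) looks like ordinary code when lexed on its own, but the whole-file pass correctly marks it as commented out.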
It doesn't preserve them; I was just testing it. But I'll try to fix it.
That's weird: line 66 should print a '\n' every time it sees one in the input. Not sure why it wouldn't...
The problem is that the '\n' check on line 66 didn't cover all scenarios, for example multi-line comments, where the newlines are embedded inside a larger token. But now it's fixed.
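A minimal sketch of the line-preserving step, assuming the input is the `(token_type, value)` pairs that `pygments.lex(source, lexer)` yields for the whole file (string token types stand in for Pygments' `Token.*` types below). Splitting every token value at `'\n'` handles multi-line comments the same way as bare newline tokens. The function name `tokens_by_line` is hypothetical. One more thing worth checking: Pygments lexers strip leading and trailing newlines by default (the `stripnl` option), which shifts line indices if a file starts with blank lines, so constructing the lexer with e.g. `JavaLexer(stripnl=False)` is probably safer. The last entry may also be empty when the text ends in a newline (Pygments appends one by default via `ensurenl`).

```python
from typing import Any, Iterable


def tokens_by_line(tokens: Iterable[tuple[Any, str]]) -> list[list[tuple[Any, str]]]:
    """Group (token_type, value) pairs by source line, preserving line indices.

    Index 0 holds the tokens of line 1. A token whose value spans several lines
    (e.g. a multi-line comment) is split at each '\n', so every line it covers
    gets its own fragment and the line count matches the original file.
    """
    lines: list[list[tuple[Any, str]]] = [[]]
    for ttype, value in tokens:
        for k, piece in enumerate(value.split("\n")):
            if k > 0:
                lines.append([])  # each '\n' starts a new line
            if piece:
                lines[-1].append((ttype, piece))
    return lines


# Hand-written tokens for the source "int x;\n/* a\n b */\ny"
tokens = [("Keyword", "int"), ("Text", " "), ("Name", "x"), ("Punctuation", ";"),
          ("Text", "\n"), ("Comment", "/* a\n b */"), ("Text", "\n"), ("Name", "y")]
lines = tokens_by_line(tokens)
print(len(lines))  # 4
print(lines[2])    # [('Comment', ' b */')]
```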
Great, thanks
See the second comment on #1: ideally, while extracting git diff information, we should also store the full lexed file before the bug fix, and the lexed line(s) involved in the fix (for which we need to lex the full fixed file as well). Pygments is just a suggestion, but it has worked well for me in the past.
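For the "lexed line(s) involved in the fix", one option is to read the changed line numbers straight from the unified diff's hunk headers. A sketch assuming a single-file diff such as `git diff` output; `added_line_numbers` is a hypothetical helper. It returns 1-based line numbers in the fixed file, which can then index a per-line grouping of the lexed fixed file's tokens.

```python
import re

# Matches a hunk header like "@@ -3,3 +3,4 @@" and captures the new-file start line.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def added_line_numbers(diff_text: str) -> list[int]:
    """Return 1-based line numbers, in the fixed (new) file, of lines added by a single-file unified diff."""
    numbers: list[int] = []
    new_line = 0
    in_hunk = False
    for line in diff_text.split("\n"):
        header = HUNK_HEADER.match(line)
        if header:
            new_line = int(header.group(1))
            in_hunk = True
        elif not in_hunk or line.startswith("\\"):
            continue  # "---"/"+++" file headers, or "\ No newline at end of file"
        elif line.startswith("+"):
            numbers.append(new_line)
            new_line += 1
        elif line.startswith("-"):
            continue  # removed line: only exists in the buggy (old) file
        else:
            new_line += 1  # context line, present in both files
    return numbers


diff = "\n".join([
    "--- a/Foo.java",
    "+++ b/Foo.java",
    "@@ -3,3 +3,4 @@ class Foo {",
    "     int a;",
    "-    int b = 1;",
    "+    int b = 2;",
    "+    int c;",
    "     int d;",
])
print(added_line_numbers(diff))  # [4, 5]
```

Tracking the `-` lines against the old-file start number in the same header would give the buggy lines in the pre-fix file in the same way.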
Note: I do remember Pygments mishandling Java fully qualified names (e.g. in import statements), which it treats as one giant token, so some heuristic splitting will be necessary at times.
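A possible heuristic for those oversized tokens, assuming a fully qualified name such as `java.util.List` (or `java.util.*`) arrives as a single token value: split it into identifiers and `.` separators while keeping the original token type. Both function names are hypothetical, and which tokens to split is left to a caller-supplied predicate.

```python
import re
from typing import Any, Callable, Iterable, Iterator

# Either a lone "." separator or a run of characters that are neither dots nor whitespace.
_QUALIFIED_PART = re.compile(r"\.|[^.\s]+")


def split_qualified_name(value: str) -> list[str]:
    """Split a dotted name such as 'java.util.List' into identifiers and '.' separators.

    Any whitespace inside the value is dropped.
    """
    return _QUALIFIED_PART.findall(value)


def split_qualified_tokens(
    tokens: Iterable[tuple[Any, str]],
    should_split: Callable[[Any, str], bool],
) -> Iterator[tuple[Any, str]]:
    """Re-emit (token_type, value) pairs, splitting the ones selected by should_split.

    Each fragment keeps the original token type.
    """
    for ttype, value in tokens:
        if should_split(ttype, value):
            for part in split_qualified_name(value):
                yield ttype, part
        else:
            yield ttype, value


print(split_qualified_name("java.util.List"))  # ['java', '.', 'util', '.', 'List']
```

With Pygments, `should_split` could be something like `lambda t, v: t in Token.Name.Namespace` (with `Token` imported from `pygments.token`); the `in` check on token types also matches subtypes.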