Closed by damevski 6 years ago
SrcML.NET is out there, but it's not maintained and uses an old version of srcML that relies on two executables. I think we can accomplish what we need with a few custom external calls to the current version.
I agree. Let's use the latest version of srcML.
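A custom external call of the kind proposed above could be sketched roughly as follows. This is only an illustration in Python, not the project's own code: the helper names `build_srcml_command` and `run_srcml` are hypothetical, and it assumes the `srcml` executable is on your PATH and prints its XML to standard output when no output file is given.

```python
import subprocess
import xml.etree.ElementTree as ET


def build_srcml_command(source_path, with_position=False):
    """Build the argument list for one srcml call (hypothetical helper)."""
    command = ["srcml"]
    if with_position:
        # --position asks srcML to annotate elements with line/column info.
        command.append("--position")
    command.append(source_path)
    return command


def run_srcml(source_path, with_position=False):
    """Run srcml on a source file and return the parsed XML root element."""
    result = subprocess.run(
        build_srcml_command(source_path, with_position),
        capture_output=True,  # collect stdout as bytes
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return ET.fromstring(result.stdout)
```

Keeping the command construction separate from the process call makes the argument list easy to check without srcML installed.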
We can now generate srcML documents of type XDocument using the functions in c05d7b652bc302fd3199a9069c7883496fdd4037. srcML must be in your PATH. TODO:
After trying different diffs, I think we will run into problems trying to parse them in isolation, because we can't count on the context they provide. We should instead parse the patched file for each commit with srcML using the --position flag, which makes it possible to correlate nodes with the diffs by line and column number. Some pseudocode:
For each file in commit_files
    keyword_list = []
    srcMLdoc = raw_url parsed with srcML
    Filter to only include useful nodes
    For each @@ ---- @@ diff block in patch
        filtered_diff = filtered for additions
        For each line in filtered_diff
            Find nodes matching pos:line in srcMLdoc
            Add to keyword_list
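The innermost steps of the pseudocode (find nodes on an added line, collect them) might look something like this in Python. It is a sketch under assumptions: the position namespace URI and the attribute layout (`pos:line` in older srcML output, `pos:start="line:column"` in newer output) should be checked against what your srcML version actually emits, and the function names and the default choice of `name` elements as "useful nodes" are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for srcML position attributes; verify against real output.
POS_NS = "http://www.srcML.org/srcML/position"


def element_line(element):
    """Return the starting line of a srcML element, or None if unannotated."""
    line = element.get(f"{{{POS_NS}}}line")  # assumed older form: pos:line="12"
    if line is not None:
        return int(line)
    start = element.get(f"{{{POS_NS}}}start")  # assumed newer form: pos:start="12:5"
    if start is not None:
        return int(start.split(":")[0])
    return None


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def keywords_for_added_lines(root, added_lines, useful_tags=("name",)):
    """Collect the text of useful nodes that start on an added line."""
    keywords = []
    for element in root.iter():  # document order
        if local_name(element.tag) not in useful_tags:
            continue
        text = (element.text or "").strip()
        if text and element_line(element) in added_lines:
            keywords.append(text)
    return keywords
```

Passing `added_lines` as a set keeps the membership check cheap when a commit touches many lines.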
We can now process hunk blocks in unified diffs to get the line additions for files. When processing files, we should check the status of each file in the commit and only process modified and created files, as removed files are irrelevant. The next step is correlating line numbers with the full files parsed by srcML to pick out the values we want.
Try parsing diffs with srcML first. If that doesn't work, we'll probably need to parse entire files with srcML and then correlate them to the diff (or figure out what to do with diffs in general).