ssc-oscar / lookup

A mirror of bitbucket.org/swcs/lookup

Better way to get Additions/Deletions per Commit? #22

Closed by k----n 3 years ago

k----n commented 3 years ago

The GitHub API returns commit metadata that includes additions/deletions/total: https://docs.github.com/en/free-pro-team@latest/rest/reference/repos#get-a-commit

"stats": {
    "additions": 104,
    "deletions": 4,
    "total": 108
}

Right now, using WoC, I can run the following to get additions/deletions per commit. First, create the bash script diffstats.sh:

#!/bin/bash
# Reads cmputeDiff3.perl output on stdin (blob hashes in fields 3 and 4)
# and prints "insertions;deletions;total" summed over all changed files.

insertions=0
deletions=0

while IFS= read -r line; do
    file1=$(echo "$line" | cut -d';' -f3)
    file2=$(echo "$line" | cut -d';' -f4)
    # diffstat -t prints CSV: INSERTED,DELETED,MODIFIED,FILENAME
    diffstats=$(diff <(echo "$file1" | ~/lookup/showCnt blob) <(echo "$file2" | ~/lookup/showCnt blob) | diffstat -t | tail -n 1)
    insertions=$((insertions + $(echo "$diffstats" | cut -d, -f1)))
    deletions=$((deletions + $(echo "$diffstats" | cut -d, -f2)))
done
total=$((insertions + deletions))
echo "$insertions;$deletions;$total"

Then, with commit 009d7b6da9c4419fe96ffd1fffb2ee61fa61532a:

> echo 009d7b6da9c4419fe96ffd1fffb2ee61fa61532a | ssh da4 ~/lookup/cmputeDiff3.perl | ./diffstats.sh 
2;0;2

The output is insertions;deletions;total.
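
For comparison, the per-file counting step can be sketched without diffstat by counting the `>` and `<` lines in diff's default output. This is only a sketch: the function name `diff_line_counts` is hypothetical and not part of lookup, and a changed line is counted as one deletion plus one insertion.

```shell
#!/bin/bash
# Print "insertions;deletions;total" for two files.
# In diff's default output format, lines present only in the second file
# start with '>' and lines present only in the first file start with '<'.
diff_line_counts() {
    local old=$1 new=$2
    diff "$old" "$new" | awk '
        /^>/ { ins++ }
        /^</ { del++ }
        END  { printf "%d;%d;%d\n", ins, del, ins + del }
    '
}
```

With WoC, the two arguments could be process substitutions such as `<(echo "$blob" | ~/lookup/showCnt blob)`, as in diffstats.sh above. Which blob field is the old version and which is the new one is not stated here, so check the argument order before trusting the insertion/deletion split.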

Is there a better way to do this? Also, what's the difference between cmputeDiff2.perl and cmputeDiff3.perl? It also seems like there are no indexes for cmputeDiff3CT.perl or cmputeDiffT.perl?

audrism commented 3 years ago
  1. There are ldiffFrom* scripts (ldiffFromBdiff is probably the fastest), but all of them are very expensive to compute, so I would suggest dropping them unless you only need this for a very small number of commits. LOC diffs tend to be fairly meaningless anyway; a more robust approach is just to count the files modified.
  2. There is no unique way to calculate a diff for commits with 2+ parents. Use cmputeDiff3.perl: see the algorithm described in the file. C and T stand for whether to use the offset map for commits/trees.
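
The "count files modified" suggestion in point 1 could look roughly like the sketch below. It assumes the cmputeDiff3.perl output has one semicolon-separated line per changed file, with the commit hash in field 1 and the file path in field 2 (the script in the question reads the blob hashes from fields 3 and 4). The function name `count_files_changed` is hypothetical and not part of lookup.

```shell
#!/bin/bash
# Count distinct files changed per commit.
# Assumed input on stdin: commit;path;... (one line per changed file).
# Output: commit;number_of_files, in order of first appearance.
count_files_changed() {
    awk -F';' '
        NF >= 2 && !seen[$1 FS $2]++ {
            if (!($1 in count)) order[++n] = $1   # remember first-seen order
            count[$1]++
        }
        END { for (i = 1; i <= n; i++) print order[i] ";" count[order[i]] }
    '
}
```

Usage would be along the lines of `echo 009d7b6da9c4419fe96ffd1fffb2ee61fa61532a | ssh da4 ~/lookup/cmputeDiff3.perl | count_files_changed`. No blob contents are fetched, so this avoids the expensive per-file diff entirely.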