While doing my research on CSDiff, I had to compare many versions of it, meaning I had to run miningframework multiple times.
For a given time interval, the tool downloads every non-fast-forward merge commit in full, meaning that even if a commit has 1 file with conflicts and 100 conflict-free files, all 101 files are downloaded. This can easily take about 30 GB of the device's storage if you run the tool with 10 projects over an interval of 1 month.
In my case, I needed only the files where CSDiff and Diff3 produced different results. To obtain only the information I needed with the current implementation, I had to do some workarounds (see this branch).
In summary, I:
1) created one csv for each project here
2) ran miningframework once for each project script
2.1) deleted every unwanted file using this after each run
3) created another csv with the relevant data
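The cleanup in step 2.1 can be sketched roughly as below. This is a minimal sketch, not the actual script: the directory layout and the output file names (`csdiff.java`, `diff3.java`) are hypothetical, assuming both tools' outputs are stored side by side per merge scenario.

```python
import filecmp
import shutil
from pathlib import Path

def prune_identical_results(root: Path) -> int:
    """Delete scenario directories where CSDiff and Diff3 agree.

    Hypothetical layout: each subdirectory of `root` is one merge
    scenario holding the outputs of both tools.
    """
    removed = 0
    for scenario in (p for p in root.iterdir() if p.is_dir()):
        csdiff_out = scenario / "csdiff.java"   # assumed file name
        diff3_out = scenario / "diff3.java"     # assumed file name
        if (csdiff_out.exists() and diff3_out.exists()
                and filecmp.cmp(csdiff_out, diff3_out, shallow=False)):
            # Results are identical, so the scenario is irrelevant
            # for the comparison and can be discarded.
            shutil.rmtree(scenario)
            removed += 1
    return removed
```

Running something like this after each mining run keeps only the scenarios where the two tools disagree.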
I think this could be done directly in miningframework, probably around here. Since the tool already has filters for commits, it could probably have filters for files too.
This would make it possible to collect more data for future research while using less disk space.
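To illustrate the idea: miningframework is written in Groovy, so the Python below is only a language-neutral sketch of what a file filter hook might look like, mirroring the existing commit filters. All names (`FileFilter`, `results_differ`, `filter_files`) are hypothetical, not part of the framework's API.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical filter signature: receives the file path and the two
# tool outputs, returns True if the file should be kept.
FileFilter = Callable[[str, str, str], bool]

def results_differ(path: str, csdiff_output: str, diff3_output: str) -> bool:
    """Keep only files where CSDiff and Diff3 disagree."""
    return csdiff_output != diff3_output

def filter_files(files: List[Tuple[str, str, str]],
                 filters: Iterable[FileFilter]) -> List[Tuple[str, str, str]]:
    """Analogue of the commit filters: a file survives only if every
    configured filter accepts it, so identical-result files would never
    be written to disk in the first place."""
    return [
        (path, a, b)
        for path, a, b in files
        if all(f(path, a, b) for f in filters)
    ]
```

With a hook like this, a user could plug in `results_differ` and the framework would skip downloading or storing the files that current workarounds delete after the fact.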