willi19 / swpp202301-compiler-team6

MIT License
0 stars 0 forks source link

Benchmark scripts and more detailed analysis of sprint 1 #22

Open sharaelong opened 1 year ago

sharaelong commented 1 year ago

Hello, I made a scripts for benchmarking automatically. It will executes in utils/ directory located in root. Use compile.sh -> benchmark.c order. Baseline data has to be located in same directory as log-2. (or change hardcoded name in benchmark.c if you want) This is very naive structure since it is not main changes for our project, but it will be advanced. Also it doesn't check output is consistent with baseline!

#!/bin/zsh

cd ..
cmake -GNinja -Bbuild
cmake --build build --target swpp-compiler

cd ~/Developers/swpp202301-benchmarks/
rm -r logs
grep -lr '^uint64_t read();$' . | tr '\n' '\0' | xargs -0 -I{} sed -i '' 's/^uint64_t read();$/uint64_t read(void);/g' {}
grep -lr '^int64_t read();$' . | tr '\n' '\0' | xargs -0 -I{} sed -i '' 's/^int64_t read();$/int64_t read(void);/g' {}
python3 build-lls.py ~/Developers/llvm-swpp/bin
python3 build-asms.py ~/Developers/swpp202301-compiler-team6/build/swpp-compiler

mkdir -p logs
for t in */src/*.s;
do
    for if in "${t%%/*}"/test/input*.txt;
    do
        echo "$if";
        diff <(cat "$if" | ~/Developers/swpp202301-interpreter/swpp-interpreter "$t") "${t%%/*}/test/output${if##*input}";
        mv swpp-interpreter-cost.log "logs/${t%%/*}-${if##*input}-cost.log";
        mv swpp-interpreter-inst.log "logs/${t%%/*}-${if##*input}-inst.log";
        mv swpp-interpreter.log "logs/${t%%/*}-${if##*input}.log";
    done;
done

rm -r ~/Developers/swpp202301-compiler-team6/utils/logs
mv logs ~/Developers/swpp202301-compiler-team6/utils/logs
#include <iostream>
#include <fstream>
#include <filesystem>
#include <regex>
#include <string>
#include <map>

using namespace std;

void read(string directory, map<string, vector<double>>& mp) {
    std::vector<std::string> filenames;
    for (const auto& entry : filesystem::directory_iterator(directory)) {
        if (entry.is_regular_file()) {
            filenames.push_back(entry.path().filename().string());
        }
    }
    sort(filenames.begin(), filenames.end());

    regex filenamePattern(R"((\w+)-\d+\.txt-cost\.log$)");
    for (const std::string& filename : filenames) {
        smatch match;
        if (regex_match(filename, match, filenamePattern)) {
            ifstream file(directory + "/" + filename);
            string casename = match[1].str();
            string line;
            getline(file, line);
            getline(file, line);
            regex targetPattern("main:\\s(\\d+\\.\\d+)");
            smatch match;
            if (regex_search(line, match, targetPattern)) {
                string targetValue = match[1].str();
                mp[casename].push_back(stod(targetValue));
            }
            file.close();
        }
    }
}

int main() {
    string dir = "logs";
    map<string, vector<double>> mp, ori_mp;
    read(dir, mp);
    dir = "logs-2";
    read(dir, ori_mp);

    for (auto[k, v]: mp) {
        vector<double> opt_rate;
        for (int i=0; i<mp[k].size(); ++i) {
            opt_rate.push_back(1.0 - (double)mp[k][i] / ori_mp[k][i]);
        }
        sort(opt_rate.begin(), opt_rate.end());

        double opt_avg = 0;
        for (int i=0; i<mp[k].size(); ++i) {
            opt_avg += opt_rate[i];
        }
        cout << k << ": " << opt_avg / (mp[k].size()) << endl;
    }

    return 0;
}

Furthermore, I tested combinations of our sprint 1 pass one by one. When applying only llvm original pass (simplifyCFG, promote), it showed like this:

스크린샷 2023-05-15 오전 1 10 44

With default pass and load2aload pass applied,

스크린샷 2023-05-15 오전 1 11 09

With default pass and branchpredict pass applied,

스크린샷 2023-05-15 오전 1 11 51

For last, default pass and arithmetic pass results.

스크린샷 2023-05-15 오전 1 12 10

Finally this is our final result which apply every pass we implemented in sprint 1.

스크린샷 2023-05-15 오전 1 15 23

It seemed strange that optimization performance is really far from simple accumulate of each pass effect, but this appears to have a high likelihood of which load2aload pass 'eats' previously optimized IR. Also default pass (simplifyCFG, promote) gives hugely negative performance to especially two test cases: gcd and collatz. It has to be analyzed more.

sharaelong commented 1 year ago

You can check our discord channel to download log-2 zip file.

sharaelong commented 1 year ago

By the way, I don't know why branchpredict and arithmetic shows exactly same results in benchmark, which is different from default pass solely.

goranmoomin commented 1 year ago

Also default pass (simplifyCFG, promote) gives hugely negative performance to especially two test cases: gcd and collatz.

I didn't check collatz, but for GCD, a simple diff suggests that the negative performance comes from simplifycfg canonicalizing branches. The result of diff gcd/src/gcd.ll <(opt --passes='simplifycfg' -S gcd/src/gcd.ll) is...

10c10
<   br i1 %cmp, label %if.then, label %if.end
---
>   br i1 %cmp, label %return, label %if.end
12,14d11
< if.then:                                          ; preds = %entry
<   br label %return
< 
17c14
<   br i1 %cmp1, label %if.then2, label %if.end3
---
>   br i1 %cmp1, label %return, label %if.end3
19,21d15
< if.then2:                                         ; preds = %if.end
<   br label %return
< 
40,41c34,35
< return:                                           ; preds = %if.end7, %if.then2, %if.then
<   %retval.0 = phi i64 [ %y, %if.then ], [ %x, %if.then2 ], [ %call, %if.end7 ]
---
> return:                                           ; preds = %if.end, %entry, %if.end7
>   %retval.0 = phi i64 [ %call, %if.end7 ], [ %y, %entry ], [ %x, %if.end ]

This suggests that, if we get to implement BranchPredictPass properly, we should be able to get all of the performance back.

sharaelong commented 1 year ago

I can’t follow what is happened here… It didn’t seem to be a reason that simplified branch gives lower performance than the naive one. Can you give more detailed explanation about why this harms performance?

goranmoomin commented 1 year ago

My logic was that since the differences only occur in branching instructions, I was just assuming that maybe the truthy branch and the falsy branch got switched over? But yeah with a closer look it just seems to be eliminating a block, and I have no idea why that would be causing any trouble at all.