quic / toolchain_for_hexagon


What's the difference between this compiler and the Hexagon SDK's? #26

Open zchrissirhcz opened 3 months ago

zchrissirhcz commented 3 months ago

Hi, QUIC toolchain maintainers:

I installed Hexagon SDK 5.5.0.1 (via QPM) which contains hexagon-clang 8.7.06:

zz@localhost:~/soft/Qualcomm/Hexagon_SDK/5.5.0.1$ ./tools/HEXAGON_Tools/8.7.06/Tools/bin/hexagon-clang --version
QuIC LLVM Hexagon Clang version 8.7.06
Target: hexagon
Thread model: posix
InstalledDir: /home/zz/soft/Qualcomm/Hexagon_SDK/5.5.0.1/./tools/HEXAGON_Tools/8.7.06/Tools/bin

I would like to analyze some performance issues in C/C++ code and its disassembly. I noticed that there is a hexagon-clang compiler in Compiler Explorer (https://godbolt.org/).


What I am confused about is: are they the same compiler, or similar ones?

androm3da commented 3 months ago

> What I am confused about is: are they the same compiler, or similar ones?

The Hexagon compiler in Compiler Explorer is the same one produced by the scripts in this repo, but it's different from the one in the Hexagon SDK. It differs in several ways; for example, the SDK compiler provides different passes. It may also be based on a different baseline LLVM/Clang version.

For example, 8.7.06 is based on llvm+clang 15.0.0:

$ readlink /local/mnt/workspace/Qualcomm/Hexagon_SDK/5.4.1.1/tools/HEXAGON_Tools/8.7.06/Tools/bin/hexagon-clang
clang-15

But ultimately they're similar in that they both produce executable code for Hexagon DSPs.
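To check which LLVM/Clang baseline a given compiler reports, one option is to dump its predefined macros (e.g. `hexagon-clang -dM -E -x c /dev/null | grep __clang`), or to compile a small helper like the sketch below in both the SDK and Compiler Explorer. It relies only on the standard `__clang_major__`, `__clang_minor__` and `__clang_patchlevel__` macros; whether the SDK's 8.7.06 build reports its LLVM baseline (15, per the `readlink` output above) through them is an assumption I have not verified. The function name `compiler_baseline` is hypothetical.

```cpp
#include <string>

// Describes the compiler that built this translation unit, using only
// standard predefined macros. Clang is checked first because it also
// defines __GNUC__ for compatibility.
inline std::string compiler_baseline() {
#if defined(__clang__)
    return "clang " + std::to_string(__clang_major__) + "." +
           std::to_string(__clang_minor__) + "." +
           std::to_string(__clang_patchlevel__);
#elif defined(__GNUC__)
    return "gcc " + std::to_string(__GNUC__) + "." +
           std::to_string(__GNUC_MINOR__);
#else
    return "unknown";
#endif
}
```

Printing `compiler_baseline()` from the same source in both environments shows whether the reported baselines differ.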

zchrissirhcz commented 3 months ago

@androm3da Thank you for the reply.

> But ultimately they're similar in that they both produce executable code for Hexagon DSPs.

OK, so I can use Compiler Explorer for general-purpose assembly analysis, such as counting the number of instruction packets to estimate the program's performance. Is that correct?

I also wonder whether they use the same stack size. I found it to be about 14000 bytes in a unit-test program for the v66 cDSP from Hexagon SDK 5.5.0.1, which is far less than on x86-64 Linux (~8192 KB). This repo's hexagon-clang uses musl libc, and it seems musl uses a smaller stack size.

androm3da commented 3 months ago

> OK, so I can use Compiler Explorer for general-purpose assembly analysis, such as counting the number of instruction packets to estimate the program's performance. Is that correct?

You should expect the codegen performance of these two compilers to differ, at least with the current release of each. This means that if you count the number of packets emitted for a given C program, you should expect the counts to differ between the two.

> I also wonder whether they use the same stack size. I found it to be about 14000 bytes in a unit-test program for the v66 cDSP from Hexagon SDK 5.5.0.1, which is far less than on x86-64 Linux (~8192 KB). This repo's hexagon-clang uses musl libc, and it seems musl uses a smaller stack size.

It's important to note that there are two targets usable with the toolchain built in this repo: the bare-metal target hexagon-unknown-none-elf and the Linux target hexagon-unknown-linux-musl. The bare-metal target has the correct ABI for code that runs on the QuRT OS; the Linux target cannot be used for programs that run on QuRT.
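Code that must not be built for the wrong environment can check the target at compile time. A minimal sketch, assuming Clang's usual predefined macros: `__hexagon__` for every Hexagon target, and `__linux__` only for Linux triples such as `hexagon-unknown-linux-musl`. The enum and function names are hypothetical.

```cpp
// Which kind of target this translation unit is being compiled for.
enum class HexagonTarget { NotHexagon, BareMetal, Linux };

constexpr HexagonTarget hexagon_target() {
#if defined(__hexagon__) && defined(__linux__)
    return HexagonTarget::Linux;      // e.g. --target=hexagon-unknown-linux-musl
#elif defined(__hexagon__)
    return HexagonTarget::BareMetal;  // e.g. --target=hexagon-unknown-none-elf
#else
    return HexagonTarget::NotHexagon; // host builds, other architectures
#endif
}
```

A QuRT-only source file could then add `static_assert(hexagon_target() != HexagonTarget::Linux, "build with the bare-metal target")` so that an accidental Linux-target build fails early.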

Regarding your question about stack size: are you asking about the typical size of an individual frame, or the size of the entire stack allocation? Linux programs grow their stack dynamically. I don't recall the stack allocation size or behavior for QuRT, but I might be able to look it up. Deciding when to use the stack and how much of it to use is an aspect of the compiler's codegen, and that would differ between the Hexagon SDK compiler and this toolchain's compiler.
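On Linux (glibc or musl), the ceiling for the main thread's dynamically growing stack is the soft `RLIMIT_STACK` limit, which is what the ~8192 KB figure mentioned above usually corresponds to (`ulimit -s` prints it in KB). It can be queried with POSIX `getrlimit`, as in this sketch. It applies only to Linux-like systems and says nothing about QuRT; the function name is hypothetical.

```cpp
#include <sys/resource.h>

// Soft stack limit for the main thread in bytes, or -1 if it is unlimited
// or the query fails. Threads created with pthread_create get their own,
// separately sized stacks, which this does not report.
inline long long main_stack_limit_bytes() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0) return -1;
    if (rl.rlim_cur == RLIM_INFINITY) return -1;
    return static_cast<long long>(rl.rlim_cur);
}
```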

zchrissirhcz commented 3 months ago

> You should expect the codegen performance of these two compilers to differ, at least with the current release of each. This means that if you count the number of packets emitted for a given C program, you should expect the counts to differ between the two.

OK, so the two compilers are different, and the usual approach is to use one compiler for comparisons such as {src1.cpp vs. src2.cpp} or {compile option1 vs. compile option2}.
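Following that approach, a rough packet count can be taken from the assembly of each variant built with the same compiler. The sketch below is hypothetical and assumes every packet in the assembly listing is closed by a line beginning with `}` (optionally followed by a suffix such as `:endloop0`); check that against the actual output of the compiler you use before trusting the numbers. A static packet count also ignores loops and branches, so it is only a rough proxy for run time.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Counts lines whose first non-blank character is '}', i.e. packet closers
// under the brace-delimited assumption described above. Instructions that
// are not wrapped in braces are not counted.
inline std::size_t count_packets(const std::string& asm_text) {
    std::istringstream in(asm_text);
    std::string line;
    std::size_t packets = 0;
    while (std::getline(in, line)) {
        std::size_t first = line.find_first_not_of(" \t");
        if (first != std::string::npos && line[first] == '}') ++packets;
    }
    return packets;
}
```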

> Regarding your question about stack size: are you asking about the typical size of an individual frame, or the size of the entire stack allocation?

Yes, I am asking about the size of the entire stack allocation. The call chain is main() -> gemm() -> gemm_internal(), for matrix-matrix multiplication:

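#include <opencv2/core/utility.hpp>  // cv::AutoBuffer

// cv::AutoBuffer is part of OpenCV
// whole class: https://github.com/opencv/opencv/blob/4.x/modules/core/include/opencv2/core/utility.hpp#L71-L151
// default fixed_size : https://github.com/opencv/opencv/blob/4.x/modules/core/include/opencv2/core/utility.hpp#L100
// stack allocated buffer: https://github.com/opencv/opencv/blob/4.x/modules/core/include/opencv2/core/utility.hpp#L150
/*
template<typename _Tp, size_t fixed_size = 1024/sizeof(_Tp)+8> class AutoBuffer
{
public:
    ...
    _Tp buf[(fixed_size > 0) ? fixed_size : 1];
};
*/

void randomize(float* a, float* b);  // defined elsewhere (not shown)
void gemm_internal();                // declared before gemm() calls it

// Parameter list is illustrative; only the call site in main() was given.
void gemm(const float* a, const float* b, float* c, int m, int n, int k, float alpha)
{
    cv::AutoBuffer<float> buf1;  // element type assumed; its inline buffer lives on the stack
    // ...
    gemm_internal();
    // ...
}

void gemm_internal()
{
    cv::AutoBuffer<float> buf2;  // a second inline buffer, one frame deeper
    // ...
}

int main()
{
    float a[200*200];  // 160000 bytes of stack with 4-byte float
    float b[200];
    randomize(a, b);
    float c[200];
    gemm(a, b, c, 200, 200, 200, 1);
}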

As illustrated, both gemm() and gemm_internal() use a cv::AutoBuffer instance, which consumes stack memory. When the maximum allowed stack size is small, this code can easily reach the limit and cause a segmentation fault at run time. On x86-64 Linux, where the allowed stack size is large, such a segmentation fault almost never happens.
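A back-of-the-envelope estimate of the stack this chain needs, assuming 4-byte `float`, `float` AutoBuffers (the element type is not shown in the snippet), and OpenCV's default `fixed_size = 1024/sizeof(_Tp)+8` quoted there. It ignores saved registers, spills and other locals, so it is a lower bound; the names are hypothetical.

```cpp
#include <cstddef>

// Bytes taken by AutoBuffer<T>'s inline array with the default fixed_size.
template <typename T>
constexpr std::size_t autobuffer_inline_bytes() {
    return (1024 / sizeof(T) + 8) * sizeof(T);
}

// Arrays declared directly in main(): a[200*200], b[200], c[200].
constexpr std::size_t kMainArrayBytes = (200 * 200 + 200 + 200) * sizeof(float);

// Lower bound along main() -> gemm() -> gemm_internal(): main's arrays plus
// one AutoBuffer<float> in each of the two callees.
constexpr std::size_t kChainLowerBound =
    kMainArrayBytes + 2 * autobuffer_inline_bytes<float>();
```

Under these assumptions each AutoBuffer<float> contributes 1056 bytes, while the arrays declared directly in main() take 161600 bytes, more than ten times the ~14000-byte figure reported earlier. If that figure is the whole stack, main() itself would likely overflow it before gemm() is even called.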

zchrissirhcz commented 3 months ago

There is also a difference in integer types between the two compilers. I used the following snippet as a compile-time test and got different output:

#include <stdint.h>
#include <stdio.h>
#include <type_traits>

template<typename T> static inline T saturate_cast(uint32_t v)  { return T(v); }
template<typename T> static inline T saturate_cast(int32_t v)   { return T(v); }

int main()
{
    //int a = 233;
    //saturate_cast<uint8_t>(a);

    static_assert(std::is_same<int, int32_t>::value, "int is not int32_t");
    static_assert(std::is_same<int, long>::value, "int is not long");
    static_assert(!std::is_same<int32_t, long>::value, "int32_t is same as long");

    return 0;
}

Output from Hexagon SDK 5.5.0.1's hexagon-clang:

<source>:13:5: error: static assertion failed due to requirement 'std::is_same<int, long>::value': int is not int32_t
    static_assert(std::is_same<int, int32_t>::value, "int is not int32_t");
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<source>:14:5: error: static assertion failed due to requirement 'std::is_same<int, long>::value': int is not long
    static_assert(std::is_same<int, long>::value, "int is not long");
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<source>:15:5: error: static assertion failed due to requirement '!std::is_same<long, long>::value': int32_t is same as long
    static_assert(!std::is_same<int32_t, long>::value, "int32_t is same as long");
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3 errors generated.
Compiler returned: 1

Output from Compiler Explorer's hexagon-clang 16.0.5: (https://godbolt.org/z/v3nYrqnhq)

<source>:14:5: error: static assertion failed due to requirement 'std::is_same<int, long>::value': int is not long
    static_assert(std::is_same<int, long>::value, "int is not long");
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
Compiler returned: 1
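The commented-out `saturate_cast<uint8_t>(a)` call is likely affected too: where `int32_t` is `long` (as the SDK's first error message shows), an `int` argument needs a conversion to reach either the `int32_t` or the `uint32_t` overload, so the call is ambiguous. One way to sidestep the difference, sketched below with hypothetical names, is a single template that deduces the source type. Like the original, it only converts with `T(v)` and does not actually saturate.

```cpp
#include <cstdint>
#include <type_traits>

// Which builtin type backs std::int32_t on this toolchain. The outputs above
// show "long" for the SDK compiler and "int" for Compiler Explorer's build.
constexpr const char* int32_underlying() {
    return std::is_same<std::int32_t, int>::value    ? "int"
           : std::is_same<std::int32_t, long>::value ? "long"
                                                     : "other";
}

// One template for any integral source type, so overload resolution no longer
// depends on whether int32_t/uint32_t are int/unsigned or long/unsigned long.
template <typename T, typename S,
          typename = typename std::enable_if<std::is_integral<S>::value>::type>
static inline T saturate_cast(S v) { return T(v); }
```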

androm3da commented 3 months ago

> When the maximum allowed stack size is small, this code can easily reach the limit and cause a segmentation fault at run time.

Okay, I see, so you're trying to do some static analysis of the maximum stack depth, to compare it with the OS limitation(s) on stack size?
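If static analysis is the goal, one approach is to collect per-function frame sizes (for example from the `.su` files that `-fstack-usage` writes, if the compiler in use supports that flag; recent Clang releases accept it) and take the deepest path through the call graph. The sketch assumes no recursion and no calls through function pointers; the struct, the function, and the frame sizes in the test are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Per-function frame sizes (e.g. parsed from .su files) plus caller -> callee
// edges. Assumes the call graph is acyclic; recursion would recurse without bound.
struct CallGraph {
    std::map<std::string, std::size_t> frame_bytes;
    std::map<std::string, std::vector<std::string>> callees;
};

// Worst-case stack depth including `fn`: its own frame plus the deepest of
// its callees. Results are memoized per function name.
inline std::size_t max_stack_depth(const CallGraph& g, const std::string& fn,
                                   std::map<std::string, std::size_t>& memo) {
    auto cached = memo.find(fn);
    if (cached != memo.end()) return cached->second;

    std::size_t deepest_callee = 0;
    auto edges = g.callees.find(fn);
    if (edges != g.callees.end()) {
        for (const std::string& callee : edges->second) {
            deepest_callee = std::max(deepest_callee, max_stack_depth(g, callee, memo));
        }
    }
    auto frame = g.frame_bytes.find(fn);
    const std::size_t own = (frame != g.frame_bytes.end()) ? frame->second : 0;  // unknown -> 0
    memo[fn] = own + deepest_callee;
    return memo[fn];
}
```

For the chain discussed above, `frame_bytes` would hold entries for main, gemm and gemm_internal, `callees` the edges main -> gemm -> gemm_internal, and the result would then be compared against the OS stack limit.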

> There is also a difference in integer types between the two compilers

Incidentally, I looked into this recently. Some differences between the Hexagon SDK compiler and this open-source toolchain are expected, but this one may not be. I'll do a bit of digging.