riscv-software-src / riscv-tools

RISC-V Tools (ISA Simulator and Tests)

Different interpretation of one elf file #215

Closed pentin-as closed 6 years ago

pentin-as commented 6 years ago

I run the same toolchain (not recompiled) on the same ELF file (not recompiled) on different PCs and get different results.

For example, I run: riscv64-unknown-elf-objdump -S mt-matmul.riscv

On PC1 (1.txt), "barrier" begins at address 0x80001048 and "matmul" at address 0x800011a0. On PC2 (2.txt), "matmul" begins at address 0x80001048 and "barrier" at address 0x80001290.

Because of this, the same test reports different timings on the two PCs. How can this be?

1.txt 2.txt mt-matmul.zip

pentin-as commented 6 years ago

On PC1 I have the following results:

matmul(cid, nc, 16, input1_data, input2_data, results_data); barrier(nc): 16814 cycles, 4.1 cycles/iter, 0.6 CPI

On PC2 I have the following results:

matmul(cid, nc, 16, input1_data, input2_data, results_data); barrier(nc): 16763 cycles, 4.0 cycles/iter, 0.6 CPI

For the remaining benchmarks, the results on PC1 and PC2 are the same.

jim-wilson commented 6 years ago

Odd, but there is no clear evidence of a problem with the RISC-V tools. There is apparently something different about the two PCs. Maybe they have different processors, different operating systems, or different versions of the same OS, and this is somehow causing the RISC-V tools to behave differently on the two systems. Or maybe you made a mistake somewhere, and these are actually different binaries compiled with different compiler options. Try using md5sum or something similar to verify that the binaries are the same.
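As a rough illustration of the md5sum check suggested above, the sketch below hashes two files and compares the digests. The helper names `file_digest` and `same_file_contents` are made up for this example, and SHA-256 is used in place of MD5 purely as an assumption; either is fine for spotting two files that differ.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 16):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_contents(path_a, path_b, algorithm="sha256"):
    """True if both files hash to the same digest (hypothetical helper)."""
    return file_digest(path_a, algorithm) == file_digest(path_b, algorithm)
```

Running `file_digest` on the ELF file on each PC and comparing the printed digests would show whether the two machines really hold byte-identical binaries.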

It is possible that the same C source compiled on two different machines with the same compiler can generate different binaries. The C compiler uses C library routines like qsort which could be different if the two machines have different versions of the C library for instance.
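To make the qsort point concrete: C does not require qsort to be stable, so two correct library implementations may order equal elements differently. The sketch below is only an analogy, not how the compiler actually orders functions; the function names are taken from this thread plus one placeholder, the priorities are invented, and `selection_sort` is a hypothetical stand-in for a second, unstable sort implementation.

```python
def selection_sort(items, key):
    """Sort a copy of `items` by `key`; correct, but not a stable sort."""
    out = list(items)
    for i in range(len(out)):
        smallest = i
        for j in range(i + 1, len(out)):
            if key(out[j]) < key(out[smallest]):
                smallest = j
        # Swapping can move an element past another one with an equal key.
        out[i], out[smallest] = out[smallest], out[i]
    return out


def by_priority(entry):
    """Sort key: the (invented) priority stored next to the name."""
    return entry[1]


# Two entries share a priority, so their relative order is up to the sort.
functions = [("matmul", 1), ("barrier", 1), ("thread_entry", 0)]

stable_order = sorted(functions, key=by_priority)            # Python's stable sort
unstable_order = selection_sort(functions, key=by_priority)  # unstable variant

print([name for name, _ in stable_order])
print([name for name, _ in unstable_order])
```

Both results are validly sorted by priority, yet "matmul" and "barrier" come out in opposite orders, which is the kind of difference that could end up as swapped function positions in a binary.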

It appears that your two binaries are the same except that two functions have swapped positions. This is probably causing different cache hits/misses, which could cause small timing differences when using a cycle accurate simulator.
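One way to confirm which functions moved is to compare the symbol labels in the two disassembly listings. The sketch below assumes the usual objdump label lines of the form `0000000080001048 <barrier>:`; the helper names are hypothetical, and the sample listings are short excerpts made up from the addresses quoted in this thread rather than the real contents of 1.txt and 2.txt.

```python
import re

# Matches label lines such as "0000000080001048 <barrier>:".
LABEL = re.compile(r"^([0-9a-fA-F]+) <([^>]+)>:\s*$")


def symbol_addresses(disassembly):
    """Map each symbol name in a disassembly listing to its start address."""
    symbols = {}
    for line in disassembly.splitlines():
        match = LABEL.match(line)
        if match:
            symbols[match.group(2)] = int(match.group(1), 16)
    return symbols


def moved_symbols(listing_a, listing_b):
    """Return {name: (addr_a, addr_b)} for symbols whose address differs."""
    a = symbol_addresses(listing_a)
    b = symbol_addresses(listing_b)
    return {
        name: (a[name], b[name])
        for name in sorted(a.keys() & b.keys())
        if a[name] != b[name]
    }


# Illustrative excerpts only, built from the addresses reported above.
pc1 = "0000000080001048 <barrier>:\n00000000800011a0 <matmul>:\n"
pc2 = "0000000080001048 <matmul>:\n0000000080001290 <barrier>:\n"

for name, (addr1, addr2) in moved_symbols(pc1, pc2).items():
    print(f"{name}: {addr1:#x} on PC1, {addr2:#x} on PC2")
```

If this reports any moved symbols for two listings of supposedly the same file, the inputs were not the same ELF file.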

pentin-as commented 6 years ago

Yes, the only differences are the swapped positions. The different timings are most likely due to the cache. But I think this should not happen.

Yes, the PCs have different configurations. One is a workstation and the other is a server. Both run Debian stretch, but of course the package sets differ slightly.

But I do not recompile anything. I copy the toolchain folder (which was created on one of the PCs by setting the RISCV variable and calling build.sh). The toolchain folder contains riscv-unknown-elf-gcc, newlib, the benchmarks, and so on. I run the same utilities on the same compiled file from the toolchain folder.

Since I'm working with the same ELF file, it makes no sense to compute its md5. If I make a hex file from it using riscv-unknown-elf-objcopy, the hex files differ between the PCs.

I believe that riscv-unknown-elf-objdump should use the runtime library only from the toolchain folder and should not depend on the operating system packages.

Is it possible to run riscv-unknown-elf-objdump with some debugging options so that I can understand what is wrong?

jim-wilson commented 6 years ago

I don't believe that it is the same elf file. Simple utilities like objdump and objcopy won't create different output for the same elf file.

It is possible that the same compiler binary and the same source file can produce different elf files on two different computers with slightly different OS installs. This is the most likely explanation for what you are seeing. Maybe you rebuilt the program on one computer and didn't realize it?

pentin-as commented 6 years ago

Thank you. Indeed, there were different elf files. My mistake. I'm sorry.