jwt27 / build-gcc

Shell scripts to build various gcc cross-compilers (primarily djgpp)
https://jw.h4ck.me/debian/
GNU General Public License v3.0
44 stars 9 forks source link

Is Gprof still a problem with DJGPP ? #45

Open pif17 opened 3 weeks ago

pif17 commented 3 weeks ago

I have tried to build a DJGPP cross toolchain with the script on a fresh debian. There were trouble with missing depedencies, I installed libgmp-dev libmpfr-dev to solve part of them and manualy copy libgmp.la and another lib from build directory to install directory. It seemed to be weird, but, the script was able to build GDB and finished the install.

This toolchain were build to try Gprof, so I write a litle program and a makefile to test it. The program just print some compile info and its parameters and environnement and call 2 handmade functions.

tst-gprof.zip

First things: Using argument's counter and argument's string array make the program crash (SIGSEGV). To solve this, "__crt0_argc" and "__crt0_argv" from "crt0.h" were used.

Second things: "gmon.out" is written but, it is not usable, like an empty file. it look like this stackoverflow question, so "-no-pie" and "-fno-pie" were used but nothing change.

Is this a bug or did i make something wrong ?

Software in use : GNU gprof (GNU Binutils) 2.43.1 gcc version 14.2.0 (GCC) DJGPP 2.5

jwt27 commented 2 weeks ago

Hi!

I have tried to build a DJGPP cross toolchain with the script on a fresh debian. There were trouble with missing depedencies, I installed libgmp-dev libmpfr-dev to solve part of them and manualy copy libgmp.la and another lib from build directory to install directory. It seemed to be weird, but, the script was able to build GDB and finished the install.

Which commit did you build from? Just yesterday I changed the way these libraries are built, this should be fixed now.

Will try your test case. I have used gprof before with djgpp, it should work.

jwt27 commented 2 weeks ago

Test case works for me with arguments received from <crt0.h>. I get a workable gmon.out file, 41KB in size.

With regular argc/argv, I see the segfault. This is due to gcc performing stack alignment on entering main() (a fairly recent feature):

000017ff <main>:
    17ff:   8d 4c 24 04             lea    ecx,[esp+0x4]
    1803:   83 e4 f0                and    esp,0xfffffff0
    1806:   ff 71 fc                push   DWORD PTR [ecx-0x4]
    1809:   55                      push   ebp
    180a:   89 e5                   mov    ebp,esp
    180c:   56                      push   esi
    180d:   53                      push   ebx
    180e:   51                      push   ecx
    180f:   83 ec 1c                sub    esp,0x1c
    1812:   ba e8 5e 01 00          mov    edx,0x15ee8
    1817:   e8 44 3d 00 00          call   5560 <mcount>
    181c:   89 cb                   mov    ebx,ecx

GCC stores the argument pointer in ecx, but this is clobbered by mcount(). That is definitely a bug. Either gcc should emit mcount before stack alignment, or mcount should not clobber any registers.

pif17 commented 2 weeks ago

Hi!

I have tried to build a DJGPP cross toolchain with the script on a fresh debian. There were trouble with missing depedencies, I installed libgmp-dev libmpfr-dev to solve part of them and manualy copy libgmp.la and another lib from build directory to install directory. It seemed to be weird, but, the script was able to build GDB and finished the install.

Which commit did you build from? Just yesterday I changed the way these libraries are built, this should be fixed now.

Will try your test case. I have used gprof before with djgpp, it should work.

" git rev-parse HEAD" answer me "106759c1ca7e43e3eef66b798214aadc37f8c17e"

pif17 commented 2 weeks ago

Test case works for me with arguments received from <crt0.h>. I get a workable gmon.out file, 41KB in size.

Ok, perhaps I have made a mistake using the tool, I'm learning it.... I have tried on "Dosbox 0.74-3", "dosbox-staging 0.81.2" and "FreeDOS 1.3"(real hardware), with or without "-no-pie" and "-fno-pie" flags. I still have something that I think empty.

Workflow: On debian to cross-compil:

i386-pc-msdosdjgpp-gcc -Wall -pg -g -O0 -o buildv.o -c buildv.c
i386-pc-msdosdjgpp-gcc -Wall -pg -g -O0 -o buildv.exe buildv.o

run on target and get back generated "GMON.OUT", (size 42ko) back in debian :

i386-pc-msdosdjgpp-gprof buildv.exe

Command doesn't talk about "printf()", "foo1()" and "foo2()" call. Wish command line did you use to give result?

GCC stores the argument pointer in ecx, but this is clobbered by mcount(). That is definitely a bug. Either gcc should emit mcount before stack alignment, or mcount should not clobber any registers.

Thanks for your analyse, time and work

Edit : Add output_sample.txt and updated Makefile with "test" target and different build option output_sample.txt tst-gpr2.zip

jwt27 commented 2 weeks ago

" git rev-parse HEAD" answer me "106759c1ca7e43e3eef66b798214aadc37f8c17e"

Yes, it was broken then, sorry. It didn't build gmp/mpfr for gdb, only gcc. Has been fixed now.

Command doesn't talk about "printf()", "foo1()" and "foo2()" call.

The profiler uses a 55ms tick rate (hooks IRQ 0), your test program exits before it can gather any data. On a longer-running program, you will see results.

BTW, the pie/no-pie options have no effect on djgpp, executables are always non-PIE.

Now for the segfault, the question is who is to blame: gcc or libc? I asked this on the gcc-help list, but no replies. Testing with godbolt.org reveals that gcc has done stack alignment in main() since version 4.1.x, so the bug has been present since then. I'm surprised no one has noticed it before.

pif17 commented 2 weeks ago

Yes, it was broken then, sorry. It didn't build gmp/mpfr for gdb, only gcc. Has been fixed now.

OK, I have build a new tool-chain with updated git tree, all is fine.

The profiler uses a 55ms tick rate (hooks IRQ 0), your test program exits before it can gather any data. On a longer-running program, you will see results.

OK, I begun to look closer to this, my final work use a ~1ms interupt to make thread scheduling and other things, there will be a clash somewhere later.

BTW, the pie/no-pie options have no effect on djgpp, executables are always non-PIE.

Ok, Is it due to the COFF32 executable format ?

Now for the segfault, the question is who is to blame: gcc or libc? I asked this on the gcc-help list, but no replies. Testing with godbolt.org reveals that gcc has done stack alignment in main() since version 4.1.x, so the bug has been present since then. I'm surprised no one has noticed it before.

When you are lucky enough to clone a git tree during a work in progress, finding a 18 years old bug is quiet natural. ;)

I have found a trace of modification in a file called mcount.c on DJGPP history. It sims to be the origine of some bug report about problem to use Gprof with DJGPP. That why in the DJGPP FAQ it's adviced to use V2.3. When we use your script, we can find "mcount.c" here "./download/djgpp-cvs/src/libc/crt0/mcount.c" The right door to knock to get infos is perhaps DJGPP mailing list.

You will find in the zip file a Quick and Dirty test to wrap function defined on mcount.c, if it help to debug. gmon-wrapper.zip

jwt27 commented 2 weeks ago

The profiler uses a 55ms tick rate (hooks IRQ 0), your test program exits before it can gather any data. On a longer-running program, you will see results.

OK, I begun to look closer to this, my final work use a ~1ms interupt to make thread scheduling and other things, there will be a clash somewhere later.

You can reprogram the timer after SIGPROF has hooked it, I think that should be okay. Note that gprof reports execution time in ticks, not in seconds.

BTW, the pie/no-pie options have no effect on djgpp, executables are always non-PIE.

Ok, Is it due to the COFF32 executable format ?

Correct, it doesn't support PIC/PIE.

Now for the segfault, the question is who is to blame: gcc or libc? I asked this on the gcc-help list, but no replies. Testing with godbolt.org reveals that gcc has done stack alignment in main() since version 4.1.x, so the bug has been present since then. I'm surprised no one has noticed it before.

When you are lucky enough to clone a git tree during a work in progress, finding a 18 years old bug is quiet natural. ;)

I have found a trace of modification in a file called mcount.c on DJGPP history. It sims to be the origine of some bug report about problem to use Gprof with DJGPP. That why in the DJGPP FAQ it's adviced to use V2.3. When we use your script, we can find "mcount.c" here "./download/djgpp-cvs/src/libc/crt0/mcount.c" The right door to knock to get infos is perhaps DJGPP mailing list.

I've sent a message to the list. If nothing comes of that, I can patch it in the build scripts here.

You will find in the zip file a Quick and Dirty test to wrap function defined on mcount.c, if it help to debug. gmon-wrapper.zip

Using the linker's --wrap option, you could work around the problem locally for now:

void __real_mcount (int);

__attribute((no_caller_saved_registers, target("general-regs-only"), no_instrument_function))
void __wrap_mcount (int _to)
{
  __real_mcount (_to);
}

Edit: nevermind, that wouldn't work since the address of _to is used, not its value :) In fact, fixing this might get complicated, as I see mcount assumes a fixed-size function prologue and specific stack frame layout.

jwt27 commented 2 weeks ago

Draft patch: https://github.com/jwt27/djgpp-cvs/pull/15

jwt27 commented 2 weeks ago

run on target and get back generated "GMON.OUT", (size 42ko) back in debian :

i386-pc-msdosdjgpp-gprof buildv.exe

Command doesn't talk about "printf()", "foo1()" and "foo2()" call.

OK, I was mistaken - the missing call graph is not due to short execution time, it is in fact a bug in gprof/binutils.

Patched it locally, then I get:

$ i386-pc-msdosdjgpp-gprof -b buildv.exe
Flat profile:

Each sample counts as 0.01 seconds.
 no time accumulated

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
  0.00      0.00     0.00     1100     0.00     0.00  _flsbuf
  0.00      0.00     0.00     1067     0.00     0.00  putc
  0.00      0.00     0.00       33     0.00     0.00  __FSEXT_get_function
  0.00      0.00     0.00       33     0.00     0.00  __dosmemput
  0.00      0.00     0.00       33     0.00     0.00  _doprnt
  0.00      0.00     0.00       33     0.00     0.00  _write
  0.00      0.00     0.00       33     0.00     0.00  _write_int
  0.00      0.00     0.00       33     0.00     0.00  fflush
  0.00      0.00     0.00       33     0.00     0.00  localeconv
  0.00      0.00     0.00       33     0.00     0.00  printf
  0.00      0.00     0.00       29     0.00     0.00  strlen
  0.00      0.00     0.00        1     0.00     0.00  __isatty
  0.00      0.00     0.00        1     0.00     0.00  __main
  0.00      0.00     0.00        1     0.00     0.00  __nmalloc_split
  0.00      0.00     0.00        1     0.00     0.00  _get_dev_info
  0.00      0.00     0.00        1     0.00     0.00  exit
  0.00      0.00     0.00        1     0.00     0.00  extendsbrk
  0.00      0.00     0.00        1     0.00     0.00  foo1
  0.00      0.00     0.00        1     0.00     0.00  foo2
  0.00      0.00     0.00        1     0.00     0.00  main
  0.00      0.00     0.00        1     0.00     0.00  malloc
  0.00      0.00     0.00        1     0.00     0.00  mv2freelist

            Call graph

granularity: each sample hit covers 4 byte(s) no time propagated

index % time    self  children    called     name
                0.00    0.00      33/1100        putc [4]
                0.00    0.00    1067/1100        _doprnt [7]
[3]      0.0    0.00    0.00    1100         _flsbuf [3]
                0.00    0.00      33/33          _write [8]
                0.00    0.00       1/1           malloc [23]
                0.00    0.00       1/1           __isatty [14]
-----------------------------------------------
                0.00    0.00    1067/1067        _doprnt [7]
[4]      0.0    0.00    0.00    1067         putc [4]
                0.00    0.00      33/1100        _flsbuf [3]
-----------------------------------------------
                0.00    0.00      33/33          _write [8]
[5]      0.0    0.00    0.00      33         __FSEXT_get_function [5]
-----------------------------------------------
                0.00    0.00      33/33          _write_int [9]
[6]      0.0    0.00    0.00      33         __dosmemput [6]
-----------------------------------------------
                0.00    0.00      33/33          printf [12]
[7]      0.0    0.00    0.00      33         _doprnt [7]
                0.00    0.00    1067/1067        putc [4]
                0.00    0.00    1067/1100        _flsbuf [3]
                0.00    0.00      33/33          localeconv [11]
                0.00    0.00      29/29          strlen [13]
-----------------------------------------------
                0.00    0.00      33/33          _flsbuf [3]
[8]      0.0    0.00    0.00      33         _write [8]
                0.00    0.00      33/33          __FSEXT_get_function [5]
                0.00    0.00      33/33          _write_int [9]
-----------------------------------------------
                0.00    0.00      33/33          _write [8]
[9]      0.0    0.00    0.00      33         _write_int [9]
                0.00    0.00      33/33          __dosmemput [6]
-----------------------------------------------
                0.00    0.00      33/33          printf [12]
[10]     0.0    0.00    0.00      33         fflush [10]
-----------------------------------------------
                0.00    0.00      33/33          _doprnt [7]
[11]     0.0    0.00    0.00      33         localeconv [11]
-----------------------------------------------
                0.00    0.00      33/33          main [22]
[12]     0.0    0.00    0.00      33         printf [12]
                0.00    0.00      33/33          _doprnt [7]
                0.00    0.00      33/33          fflush [10]
-----------------------------------------------
                0.00    0.00      29/29          _doprnt [7]
[13]     0.0    0.00    0.00      29         strlen [13]
-----------------------------------------------
                0.00    0.00       1/1           _flsbuf [3]
[14]     0.0    0.00    0.00       1         __isatty [14]
                0.00    0.00       1/1           _get_dev_info [17]
-----------------------------------------------
                0.00    0.00       1/1           __crt1_startup [62]
[15]     0.0    0.00    0.00       1         __main [15]
-----------------------------------------------
                0.00    0.00       1/1           malloc [23]
[16]     0.0    0.00    0.00       1         __nmalloc_split [16]
-----------------------------------------------
                0.00    0.00       1/1           __isatty [14]
[17]     0.0    0.00    0.00       1         _get_dev_info [17]
-----------------------------------------------
                0.00    0.00       1/1           __crt1_startup [62]
[18]     0.0    0.00    0.00       1         exit [18]
-----------------------------------------------
                0.00    0.00       1/1           malloc [23]
[19]     0.0    0.00    0.00       1         extendsbrk [19]
                0.00    0.00       1/1           mv2freelist [24]
-----------------------------------------------
                0.00    0.00       1/1           main [22]
[20]     0.0    0.00    0.00       1         foo1 [20]
-----------------------------------------------
                0.00    0.00       1/1           main [22]
[21]     0.0    0.00    0.00       1         foo2 [21]
-----------------------------------------------
                0.00    0.00       1/1           __crt1_startup [62]
[22]     0.0    0.00    0.00       1         main [22]
                0.00    0.00      33/33          printf [12]
                0.00    0.00       1/1           foo1 [20]
                0.00    0.00       1/1           foo2 [21]
-----------------------------------------------
                0.00    0.00       1/1           _flsbuf [3]
[23]     0.0    0.00    0.00       1         malloc [23]
                0.00    0.00       1/1           extendsbrk [19]
                0.00    0.00       1/1           __nmalloc_split [16]
-----------------------------------------------
                0.00    0.00       1/1           extendsbrk [19]
[24]     0.0    0.00    0.00       1         mv2freelist [24]
-----------------------------------------------

Index by function name

   [5] __FSEXT_get_function    [8] _write                 [22] main
   [6] __dosmemput             [9] _write_int             [23] malloc
  [14] __isatty               [18] exit                   [24] mv2freelist (nmalloc.c)
  [15] __main                 [19] extendsbrk (nmalloc.c) [12] printf
  [16] __nmalloc_split        [10] fflush                  [4] putc
   [7] _doprnt                [20] foo1                   [13] strlen
   [3] _flsbuf                [21] foo2
  [17] _get_dev_info          [11] localeconv
pif17 commented 2 weeks ago

Ok, I've tried to build a toolchain with "jwt27-mcount" branch, it didn't work. Installer can't find "hostbin/dxe3res.exe" when installing djgpp libraries and utilities. (To try, I changed DJGPP_GIT="https://github.com/jwt27/djgpp-cvs.git::jwt27" to DJGPP_GIT="https://github.com/jwt27/djgpp-cvs.git::jwt27-mcount" on a copy of djgpp/djgpp-cvs called djgpp/djgpp-cvs-mcount and then ./build-djgpp.sh --prefix=/home/tst-djgpp/DJGPPv3 djgpp-cvs-mcount binutils gcc

Your Gprof output is like what I'm looking for.

jwt27 commented 2 weeks ago

The patch was based off master which is still missing some patches required for cross-build. Rebased it now.

Binutils needs the following patch (bug report here):

--- a/gprof/corefile.c
+++ b/gprof/corefile.c
@@ -605,6 +605,9 @@ core_create_function_syms (void)
     default:
       break;
     case bfd_target_coff_flavour:
+      if (startswith (core_bfd->xvec->name, "coff-go32"))
+   break;
+      /* FALLTHROUGH */
     case bfd_target_ecoff_flavour:
     case bfd_target_xcoff_flavour:
     case bfd_target_elf_flavour:

And to get the complete call graph, libc is compiled with -pg as well. Otherwise you will only see your own functions: main/foo1/foo2. Quick hack to make that work:

--- a/build-djgpp.sh
+++ b/build-djgpp.sh
@@ -265,6 +265,7 @@ if [ ! -z ${DJGPP_VERSION} ]; then
   export LDFLAGS="$LDFLAGS_FOR_TARGET"
   sed -i 's/Werror/Wno-error/' makefile.cfg
   ${MAKE} config || exit 1
+  echo '-pg' >> gcc.opt
   echo "${TARGET}:${DST}:${CFLAGS_FOR_TARGET}" > configure-options
   ${MAKE_J} -C mkdoc || exit 1
   ${MAKE_J} -C libc || exit 1
jwt27 commented 2 weeks ago

Seeing a strange bug after rebasing. The histogram is clobbered with special value NONE used in nmalloc.c. In the profile this appears as if the program spent a long time in start.

pif17 commented 2 weeks ago

ok, "ftpmirror.gnu.org" is down for the moment, I'll test when it will be back.