vivier / qemu-m68k

Issues when building some packages with OpenMP enabled #18

Open glaubitz opened 7 years ago

glaubitz commented 7 years ago

Several packages in Debian, like imagemagick or gettext, use OpenMP to parallelize execution and improve performance.

While OpenMP generally works on qemu-m68k (tested with some examples from [1]), qemu-m68k can lock up in user mode when building imagemagick or gettext with the --enable-openmp configure parameter.

For example, the imagemagick build gets stuck when running convert to create some PNG icons from the SVG icons in the source tree [2]:

Make icons for size 8x8...
E: Caught signal ‘Terminated’: terminating immediately
debian/rules:369: recipe for target 'override_dh_auto_install-arch_iconcache_quantum.q16' failed
make[1]: *** [override_dh_auto_install-arch_iconcache_quantum.q16] Terminated
debian/rules:189: recipe for target 'binary-arch' failed
make: *** [binary-arch] Terminated
E: Build killed with signal TERM after 60 minutes of inactivity

This issue goes away immediately when OpenMP support is disabled via the configure option.

With gettext, the problem occurs in msgmerge which is part of gettext and used when building other packages like apt [3]:

Generating ../build/po/domains/apt/bg.po
echo ../build/po/domains/apt/bg.po : bg.po ../build/po/apt.pot > ../build/po/apt_bg.po.d
/usr/bin//msgmerge --add-location=file bg.po ../build/po/apt.pot -o ../build/po/domains/apt/bg.po
.........make[2]: *** wait: No child processes. Stop.
make[2]: *** Waiting for unfinished jobs....
make[2]: *** wait: No child processes. Stop.
make[1]: *** wait: No child processes. Stop.
make[1]: *** Waiting for unfinished jobs....
make[1]: *** wait: No child processes. Stop.
make: *** wait: No child processes. Stop.
make: *** Waiting for unfinished jobs....
make: *** wait: No child processes. Stop.
Build killed with signal TERM after 30 minutes of inactivity

Again, the problem goes away the moment we build gettext without OpenMP support.

Unfortunately, I have not yet figured out exactly which OpenMP directive is causing the issue, since the basic OpenMP examples from [1] don't cause any problems. I presume the problem lies in the use of atomic/critical sections in OpenMP.
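To narrow down the atomic/critical suspicion, a small stress test along the following lines might help. This is only a sketch, not a test that was actually run for this report: the function names (count_with_atomic, count_with_critical, run_stress) are made up for illustration, and it assumes a GCC-style toolchain where OpenMP is enabled with -fopenmp. Without that flag the pragmas are ignored and the loops simply run serially.

```c
/* Sketch of an OpenMP atomic/critical stress test (hypothetical names).
 * Assumed build command: gcc -O2 -fopenmp omp_stress.c */
#include <stdio.h>

/* Increment a shared counter n times, each increment under "omp atomic". */
long count_with_atomic(long n)
{
    long counter = 0;   /* shared between threads (declared outside the loop) */
    long i;             /* loop variable, private to each thread */
#pragma omp parallel for
    for (i = 0; i < n; i++) {
#pragma omp atomic
        counter++;
    }
    return counter;
}

/* Same work, but each increment is guarded by "omp critical". */
long count_with_critical(long n)
{
    long counter = 0;
    long i;
#pragma omp parallel for
    for (i = 0; i < n; i++) {
#pragma omp critical
        {
            counter++;
        }
    }
    return counter;
}

/* Run both variants; return 0 if both counters equal n, 1 otherwise. */
int run_stress(long n)
{
    long a = count_with_atomic(n);
    long c = count_with_critical(n);
    printf("atomic=%ld critical=%ld expected=%ld\n", a, c, n);
    return (a == n && c == n) ? 0 : 1;
}
```

Calling run_stress with a large n from a main function and running the resulting m68k binary in user mode (with OMP_NUM_THREADS set to more than one thread) would show whether one of the two constructs hangs or miscounts. A hang rather than a wrong count would match the symptom described above.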

[1] https://computing.llnl.gov/tutorials/openMP/exercise.html
[2] https://buildd.debian.org/status/fetch.php?pkg=imagemagick&arch=m68k&ver=8%3A6.9.6.2%2Bdfsg-2&stamp=1476373630
[3] https://buildd.debian.org/status/fetch.php?pkg=apt&arch=m68k&ver=1.2.11&stamp=1461880964

baryluk commented 4 years ago

@glaubitz FYI. As of now, imagemagick 6.9.10.23+dfsg builds fine for me, even with -fopenmp passed to the compiler.

I did notice a single warning from the kernel during compilation, but it could simply be the emulator being preempted for a moment by other tasks (I was probably compiling other things with too many threads in other VMs or on the host system).

...
libtool: compile:  g++ -DHAVE_CONFIG_H -I. -I../.. -I./config -I. -I../.. -I../../Magick++/lib -Wdate-time -D_FORTIFY_SOURCE=2 -DMAGICKCORE_HDRI_ENABLE=0 -DMAGICKCORE_QUANTUM_DEPTH=16 -I/usr/include/X11 -g -O2 -fdebug-prefix-map=/root/imagemagick-6.9.10.23+dfsg=. -fstack-protector-strong -Wformat -Werror=format-security -pthread -c ../../Magick++/lib/Image.cpp  -fPIC -DPIC -o Magick++/lib/.libs/libMagick___6_Q16_la-Image.o
[ 5708.060000] watchdog: BUG: soft lockup - CPU#0 stuck for 5107s! [cc1plus:12887]
[ 5708.060000] Modules linked in: sg evdev mac_hid ip_tables x_tables sha1_generic hmac ipv6 nf_defrag_ipv6 autofs4 ext4 crc16 mbcache jbd2 crc32c_generic sr_mod cdrom sd_mod mac_esp macsonic esp_scsi
[ 5708.060000] Format 00  Vector: 0064  PC: 80651714  Status: 0004    Not tainted
[ 5708.060000] ORIG_D0: ffffffff  D0: 00000000  A2: c08b8460  A1: efd3a344
[ 5708.060000] A0: c08b8460  D5: 00000022  D4: 00000022
[ 5708.060000] D3: 00000022  D2: 00000006  D1: 00000000
[ 5708.060000] USP: efd3a1e8

Message from syslogd@debian at Oct 25 18:44:56 ...
 kernel:[ 5708.060000] watchdog: BUG: soft lockup - CPU#0 stuck for 5107s! [cc1plus:12887]

Beyond that, and a few probably harmless compiler warnings about passing long double to printf incorrectly (it should use %Lg, not just %g), the build completes fine, all tests pass, and everything looks all right.
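For reference, the printf warning mentioned above concerns the length modifier. The snippet below is a generic illustration, not code taken from imagemagick, and format_long_double is a hypothetical helper name.

```c
#include <stdio.h>

/* Format a long double into buf using the "L" length modifier.
 * Returns the number of characters snprintf would have written. */
int format_long_double(char *buf, size_t size, long double value)
{
    /* "%g" expects a double; passing a long double to it is undefined
     * behavior. "%Lg" matches the long double argument type. */
    return snprintf(buf, size, "%Lg", value);
}
```

With plain %g the compiler warns (under -Wformat) because the argument type and the conversion specifier disagree, and the printed value is not guaranteed to be correct.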

baryluk commented 4 years ago

Ok, I found one more issue when building it, but it is not really related to OpenMP anyway:

...
make[5]: Entering directory '/root/imagemagick-6.9.10.23+dfsg/debian/build-quantum-q16hdri'
PASS: tests/cli-pipe.tap 1
PASS: tests/cli-pipe.tap 2
PASS: tests/cli-pipe.tap 3
PASS: tests/cli-pipe.tap 4
PASS: tests/cli-pipe.tap 5
PASS: tests/cli-pipe.tap 6
PASS: tests/cli-pipe.tap 7
PASS: tests/cli-pipe.tap 8
[13398.000000] *** ADDRESS ERROR ***   FORMAT=2
[13398.000000] Current process id is 14103
[13398.000000] BAD KERNEL TRAP: 00000000
[13398.000000] Modules linked in: sg evdev mac_hid ip_tables x_tables sha1_generic hmac ipv6 nf_defrag_ipv6 autofs4 ext4 crc16 mbcache jbd2 crc32c_generic sr_mod cdrom sd_mod mac_esp macsonic esp_scsi
[13398.000000] PC: [<00016974>] X_UNSUPP+0x2c/0x3c
[13398.000000] SR: 2014  SP: a0cb9162  a2: efe3de24
[13398.000000] d0: 00000040    d1: 00000000    d2: c02579da    d3: efe3dd64
[13398.000000] d4: 00000000    d5: efe3e434    a0: 800240e0    a1: 8002a3a0
[13398.000000] Process convert (pid: 14103, task=7fd38eb2)
[13398.000000] Frame format=2 instr addr=00000000
[13398.000000] Stack from 3de41f30:
[13398.000000]         41000000 00000000 00000000 800240e0 8002a3a0 400e0000 92480342 76efc7c4
[13398.000000]         400d0000 b337c488 1bfc2fd8 400d0000 a190b7cf a9802919 00000000 00000000
[13398.000000]         00000000 00000000 00050008 00000000 49f50780 000e624a 49f7f620 3de98500
[13398.000000]         00000001 3de98500 000e6264 3de98500 00000001 000e2184 00002a60 3de41f41
[13398.000000]         c0240632 8001be38 c001db4e 800ba488 c0583000 efe447fc 000029e4 00690c80
[13398.000000]         00000000 00000000 00000000 800ba3c0 c046dfdc 8002f278 800ba3c0 c2d39af0
[13398.000000] Call Trace: [<00050008>] pm_qos_debug_show+0x2e/0x15a
[13398.010000]  [<000e624a>] fput_many+0x80/0x8c
[13398.010000]  [<000e6264>] fput+0xe/0x12
[13398.010000]  [<000e2184>] filp_close+0x68/0x70
[13398.010000]  [<00002a60>] do_signal_return+0x10/0x1a
[13398.010000]  [<000029e4>] syscall+0x8/0xc
[13398.010000]  [<0014c018>] debugfs_lookup+0x2a/0x5a
[13398.010000] Code: 1017 0200 00f0 0c00 0040 66ff 0000 05ac <f23c> 8800 0000 0000 f23c 9000 0000 0000 222e ff84 082e 0005 ff1c 6600 000a 0281
[13398.010000] Disabling lock debugging due to kernel taint
[13398.380000] *** ADDRESS ERROR ***   FORMAT=2
[13398.380000] Current process id is 14118
[13398.380000] BAD KERNEL TRAP: 00000000
[13398.380000] Modules linked in: sg evdev mac_hid ip_tables x_tables sha1_generic hmac ipv6 nf_defrag_ipv6 autofs4 ext4 crc16 mbcache jbd2 crc32c_generic sr_mod cdrom sd_mod mac_esp macsonic esp_scsi
[13398.380000] PC: [<00016974>] X_UNSUPP+0x2c/0x3c
[13398.380000] SR: 2014  SP: d84c5cdd  a2: efe0ae14
[13398.380000] d0: 00000040    d1: 00000000    d2: c02579da    d3: efe0ad54
[13398.380000] d4: 00000000    d5: efe0b424    a0: 800c1840    a1: 8002a3a0
[13398.380000] Process convert (pid: 14118, task=e8499edc)
[13398.380000] Frame format=2 instr addr=00000000
[13398.380000] Stack from 3deb3f30:
[13398.380000]         41000000 00000000 00000000 800c1840 8002a3a0 400d0000 92886897 e0dc83f6
[13398.380000]         400b0000 cdd4fe52 ae53c161 400b0000 a642289f e82f7c15 00000000 00000000
[13398.380000]         00000000 00000000 00050008 00000000 49f50780 000e624a 49f7b480 3de90dc0
[13398.380000]         00000001 3de90dc0 000e6264 3de90dc0 00000001 000e2184 00002a60 3deb3f41
[13398.380000]         c0240632 8001be38 c001db4e 80034838 c0583000 efe117ec 000029e4 0093031e
[13398.380000]         00000000 00000000 00000000 800bd620 c046dfdc 800c1938 800bd620 c2d39af0
[13398.380000] Call Trace: [<00050008>] pm_qos_debug_show+0x2e/0x15a
[13398.380000]  [<000e624a>] fput_many+0x80/0x8c
[13398.380000]  [<000e6264>] fput+0xe/0x12
[13398.380000]  [<000e2184>] filp_close+0x68/0x70
[13398.380000]  [<00002a60>] do_signal_return+0x10/0x1a
[13398.380000]  [<000029e4>] syscall+0x8/0xc
[13398.380000]  [<0014c018>] debugfs_lookup+0x2a/0x5a
[13398.380000] Code: 1017 0200 00f0 0c00 0040 66ff 0000 05ac <f23c> 8800 0000 0000 f23c 9000 0000 0000 222e ff84 082e 0005 ff1c 6600 000a 0281
ERROR: tests/cli-colorspace.tap - too few tests run (expected 19, got 0)
ERROR: tests/cli-colorspace.tap - exited with status 1
PASS: tests/validate-colorspace.tap 1
PASS: tests/validate-compare.tap 1
PASS: tests/validate-composite.tap 1
PASS: tests/validate-convert.tap 1
[13494.000000] *** ADDRESS ERROR ***   FORMAT=2
[13494.000000] Current process id is 14298
[13494.000000] BAD KERNEL TRAP: 00000000
[13494.000000] Modules linked in: sg evdev mac_hid ip_tables x_tables sha1_generic hmac ipv6 nf_defrag_ipv6 autofs4 ext4 crc16 mbcache jbd2 crc32c_generic sr_mod cdrom sd_mod mac_esp macsonic esp_scsi
[13494.000000] PC: [<00016974>] X_UNSUPP+0x2c/0x3c
[13494.000000] SR: 2004  SP: ca88b409  a2: efbac7da
[13494.000000] d0: 800f8c40    d1: 00653ff3    d2: 00000008    d3: c3883595
[13494.000000] d4: 00000046    d5: 00000000    a0: 800f8c20    a1: 00000000
[13494.010000] Process validate (pid: 14298, task=d23a09de)
[13494.010000] Frame format=2 instr addr=00000000
[13494.010000] Stack from 48f1bf30:
[13494.010000]         41000000 800f8c20 00653ff3 800f8c20 00000000 40060000 ff000000 00000000
[13494.010000]         40060000 ff000000 00000000 00000000 00000000 00000000 7fff0000 ffffffff
[13494.010000]         ffffffff 00000000 00050008 00000000 3deb76e0 800f6b00 800f8c20 000e5500
[13494.010000]         3deb76e0 800f8c20 00000d20 48f1bfc0 800f8c20 00000d20 00000000 00000041
[13494.010000]         000e5548 00000005 800f8c20 00000d20 00000000 00000d20 000029e4 00000002
[13494.010000]         8001a6e0 00000002 00000000 7ffff2de c04aefdc 00000000 8001a6e0 c2d39bf0
[13494.010000] Call Trace: [<00050008>] pm_qos_debug_show+0x2e/0x15a
[13494.010000]  [<000e5500>] ksys_pread64+0x40/0x6e
[13494.010000]  [<000e5548>] sys_pread64+0x1a/0x20
[13494.010000]  [<000029e4>] syscall+0x8/0xc
[13494.010000]  [<0004c386>] add_wait_queue_exclusive+0x36/0x48
[13494.010000] Code: 1017 0200 00f0 0c00 0040 66ff 0000 05ac <f23c> 8800 0000 0000 f23c 9000 0000 0000 222e ff84 082e 0005 ff1c 6600 000a 0281
ERROR: tests/validate-formats-disk.tap - too few tests run (expected 1, got 0)
...

glaubitz commented 4 years ago

My tests were on qemu-user, not qemu-system. So you have to be careful when comparing these, especially when it comes to anything involving atomics.

There are packages like firebird that build without problems on qemu-system but fail on qemu-user.

FWIW, when this bug report was opened, there was no qemu-system that could be used with Debian/m68k, which is why the title doesn't mention qemu-user.

vivier commented 4 years ago

Quoting baryluk: "Ok, I found one more issue when building it, but it is not really related to OpenMP anyway:"

I'm currently trying to find the cause of a very similar bug I'm able to reproduce. Something in the MMU emulation, I think.