namhyung / uftrace

Function graph tracer for C/C++/Rust/Python
https://uftrace.github.io/slide/
GNU General Public License v2.0
3.06k stars 473 forks source link

python: `sys.exit()` with importing `asyncio` makes aborted #1752

Open honggyukim opened 1 year ago

honggyukim commented 1 year ago

The example program is as follows.

$ cat async_exit.py 
#!/usr/bin/env python3
import asyncio
import sys

sys.exit(0)

It gets aborted when tracing with uftrace as follows.

$ uftrace async_exit.py 
munmap_chunk(): invalid pointer
WARN: child terminated by signal: 6: Aborted
# DURATION     TID     FUNCTION
            [ 23331] | <1>() {
  92.314 ms [ 23331] |   <2>();
   0.530 us [ 23331] |   <3560>();
  92.325 ms [ 23331] | } /* <1> */

But there is no problem calling sys.exit without importing asyncio.

$ cat exit.py 
#!/usr/bin/env python3
#import asyncio
import sys

sys.exit(0)
$ uftrace exit.py 
# DURATION     TID     FUNCTION
            [ 23456] | __main__.<module>() {
   0.404 us [ 23456] |   sys.exit();
   5.275 us [ 23456] | } /* __main__.<module> */
honggyukim commented 1 year ago

The backtrace from gdb is as follows.

$ gdb -q --args uftrace record async_exit.py 
Reading symbols from uftrace...
(gdb) r
    ...
munmap_chunk(): invalid pointer

Thread 2.1 "python3" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff7ccf840 (LWP 24517)]
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737350793280) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737350793280) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737350793280, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7a42476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7a287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff7a896f6 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7bdbb8c "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#6  0x00007ffff7aa0d7c in malloc_printerr (str=str@entry=0x7ffff7bde230 "munmap_chunk(): invalid pointer") at ./malloc/malloc.c:5664
#7  0x00007ffff7aa105c in munmap_chunk (p=<optimized out>) at ./malloc/malloc.c:3060
#8  0x00007ffff7aa551a in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3381
#9  0x00007ffff65aca00 in OPENSSL_LH_free () from /lib/x86_64-linux-gnu/libcrypto.so.3
#10 0x00007ffff65bfa5f in OPENSSL_cleanup () from /lib/x86_64-linux-gnu/libcrypto.so.3
#11 0x00007ffff7a45495 in __run_exit_handlers (status=0, listp=0x7ffff7c19838 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at ./stdlib/exit.c:113
#12 0x00007ffff7a45610 in __GI_exit (status=<optimized out>) at ./stdlib/exit.c:143
#13 0x00007ffff7a29d97 in __libc_start_call_main (main=main@entry=0x55555577eb90, argc=argc@entry=4, argv=argv@entry=0x7fffffffdac8) at ../sysdeps/nptl/libc_start_call_main.h:74
#14 0x00007ffff7a29e40 in __libc_start_main_impl (main=0x55555577eb90, argc=4, argv=0x7fffffffdac8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffdab8) at ../csu/libc-start.c:392
#15 0x000055555577eac5 in _start ()
honggyukim commented 1 year ago

I see that this problem disappears if 953cef1fd197c37c5d2d103361fe94cb9cea857d, which is from #1725, is reverted.

honggyukim commented 1 year ago

Hmm.. it might be related some unknown issue. The 953cef1 might not be a real problem. The problem disappears when changing the uftrace execution directory is changed.

honggyukim commented 1 year ago

Can anyone reproduce this problem? Otherwise, it might just be a tricky environment issue in my machine.

MichelleJin12 commented 1 year ago

This issue occurred in a Python venv environment with the problem https://github.com/namhyung/uftrace/pull/1725#issuecomment-1577978923.

(venv4) $ uftrace async_exit.py 
munmap_chunk(): invalid pointer
WARN: child terminated by signal: 6: Aborted
# DURATION     TID     FUNCTION
            [  9733] | <1>() {
 112.302 ms [  9733] |   <2>();
   0.625 us [  9733] |   <3540>();
 112.315 ms [  9733] | } /* <1> */

It could be a Python package dependency issue. Here is the list of Python packages for the venv.

#requirements.txt
altgraph==0.17.3
contourpy==1.1.0
cycler==0.11.0
fonttools==4.40.0
kiwisolver==1.4.4
matplotlib==3.7.1
numpy==1.24.3
packaging==23.1
pandas==2.0.2
Pillow==9.5.0
pyinstaller==5.13.0
pyinstaller-hooks-contrib==2023.5
pyparsing==3.0.9
python-dateutil==2.8.2
pytz==2023.3
six==1.16.0
tzdata==2023.3

I can reproduce the problem with the commands below.

$ python3 -m venv venv
$ source venv/bin/activate
$ pip3 install -r requirements.txt
$ uftrace async_exit.py

Other Python environments do not have this issue.

(venv3) $ uftrace async_exit.py
# DURATION     TID     FUNCTION
            [  7157] | __main__.<module>() {
 116.136 ms [  7157] |   importlib._bootstrap._find_and_load();
   0.513 us [  7157] |   sys.exit();
 116.153 ms [  7157] | } /* __main__.<module> */
honggyukim commented 1 year ago

@MichelleJin12 Thanks a lot for the analysis. I found that installing matplotlib makes the problem. The following requirements.txt installation doesn't show the problem.

altgraph==0.17.3
contourpy==1.1.0
cycler==0.11.0
fonttools==4.40.0
kiwisolver==1.4.4
#matplotlib==3.7.1
numpy==1.24.3
packaging==23.1
pandas==2.0.2
Pillow==9.5.0
pyinstaller==5.13.0
pyinstaller-hooks-contrib==2023.5
pyparsing==3.0.9
python-dateutil==2.8.2
pytz==2023.3
six==1.16.0
tzdata==2023.3
honggyukim commented 1 year ago

The following requirements.txt file makes the problem. We need to investigate why it makes the problem.

matplotlib==3.7.1
gichoel commented 1 year ago

My test environment looks like this. (An ARM version of Ubuntu running as a virtual machine on a MacBook M2)

$ lsb_release -a

No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.2 LTS
Release:        22.04
Codename:       jammy

$ uname -a
Linux ubuntu-test 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:21:56 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux

Here's a list of the Python venv packages in my environment.

$ pip3 list

Package         Version
--------------- -------
contourpy       1.1.0
cycler          0.11.0
fonttools       4.41.1
kiwisolver      1.4.4
matplotlib      3.7.1
numpy           1.24.3
packaging       23.1
Pillow          9.5.0
pip             22.0.2
pyparsing       3.0.9
python-dateutil 2.8.2
setuptools      59.6.0
six             1.16.0

In Ubuntu 22.04 Arm version, even if matplotlib is installed, it worked fine as below.

$ uftrace asyncio_exit.py

# DURATION     TID     FUNCTION
            [ 21468] | __main__.<module>() {
  61.808 ms [ 21468] |   importlib._bootstrap._find_and_load();
   0.292 us [ 21468] |   sys.exit();
  61.816 ms [ 21468] | } /* __main__.<module> */

However I was able to reproduce the same problem as above when matplotlib was installed on Ubuntu in an x86 environment.

I approached it thinking that the problem was caused by a different version of OpenSSL being called by Python, but I found that OpenSSL 3.0.2 15 March 2022 was the same.

Can anyone reproduce the same results as me on aarch64 arch?

3godhalmang commented 1 year ago

environment

$lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.2 LTS
Release:        22.04
Codename:       jammy
$ uname -a
Linux rusalka 5.15.0-76-generic #83-Ubuntu SMP Thu Jun 15 19:21:56 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux
$ pip3 list
Package                   Version
------------------------- -------
altgraph                  0.17.3
contourpy                 1.1.0
cycler                    0.11.0
fonttools                 4.40.0
kiwisolver                1.4.4
numpy                     1.24.3
packaging                 23.1
pandas                    2.0.2
Pillow                    9.5.0
pip                       22.0.2
pyinstaller               5.13.0
pyinstaller-hooks-contrib 2023.5
pyparsing                 3.0.9
python-dateutil           2.8.2
pytz                      2023.3
setuptools                59.6.0
six                       1.16.0
tzdata                    2023.3

result

$ uftrace /home/winpsho/uftrace/tests/async_exit.py
    # DURATION     TID     FUNCTION
    [296717] | __main__.<module>() {
    85.880 ms [296717] |   importlib._bootstrap._find_and_load();
    0.625 us [296717] |   sys.exit();
    85.892 ms [296717] | } /* __main__.<module> */

I got same result. It works without problems.

honggyukim commented 9 months ago

Now, I see that this problem is fixed by a9e878277b4025a7107fdee8885aa30142923730.

$ uftrace ./async_exit.py
WARN: unpaired cygprof exit: dropping...
# DURATION     TID     FUNCTION
            [ 41753] | __main__.<module>() {
 102.347 ms [ 41753] |   importlib._bootstrap._find_and_load();
   0.794 us [ 41753] |   sys.exit();
 102.361 ms [ 41753] | } /* __main__.<module> */
  45.775 us [ 41753] | threading._shutdown();
  50.603 us [ 41753] | logging.shutdown();
   3.614 us [ 41753] | _weakrefset._remove();

I'm not sure if we really have to print WARN: unpaired cygprof exit: dropping... message though.

@namhyung Do you think this is enough fix for this issue?

namhyung commented 9 months ago

Hmm.. I thought it'd be rare. If not, we might change it to pr_dbg().

honggyukim commented 9 months ago

I see it quite often when tracing python these days. I think better to change it to pr_dbg.