shorepine / tulipcc

The Tulip Creative Computer - a portable Python synthesizer for music and graphics
https://tulip.computer/

Tulip Desktop launches and then crashes with a SIGSEGV on Arch #230

Open hughrawlinson opened 5 months ago

hughrawlinson commented 5 months ago

I've built Tulip Desktop on Arch Linux, but when I run the binary it launches, creates a window, and exits with the following error:

*** stack smashing detected ***: terminated
'./build-standard/tulip/obj/tulip' terminated by signal SIGSEGV (Address boundary error)

Happy to run in any debug mode or with gdb or however's best to get you more info!

bwhitman commented 5 months ago

Dang yes, please do run under gdb: start it with gdb dev/tulip, type run, then bt when it crashes. Can you also share your version of Arch?

hughrawlinson commented 5 months ago

Here's the trace from gdb:

*** stack smashing detected ***: terminated

Thread 3 "tulip" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff12006c0 (LWP 766576)]
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0)
    at pthread_kill.c:44
Downloading source file /usr/src/debug/glibc/glibc/nptl/pthread_kill.c
44            return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;                
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, 
    no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007ffff7b69eb3 in __pthread_kill_internal (threadid=<optimized out>, signo=6)
    at pthread_kill.c:78
#2  0x00007ffff7b11a30 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff7af94c3 in __GI_abort () at abort.c:79
#4  0x00007ffff7afa354 in __libc_message_impl (
    fmt=fmt@entry=0x7ffff7c83161 "*** %s ***: terminated\n") at ../sysdeps/posix/libc_fatal.c:132
#5  0x00007ffff7bf7259 in __GI___fortify_fail (msg=msg@entry=0x7ffff7c83179 "stack smashing detected")
    at fortify_fail.c:24
#6  0x00007ffff7bf8514 in __stack_chk_fail () at stack_chk_fail.c:24
#7  0x0000555555611289 in mp_execute_bytecode (code_state=0x7fffeb61aaa0, inject_exc=<optimized out>)
    at ../../micropython/py/vm.c:1487
#8  0x00005555555fd493 in fun_bc_call (self_in=<optimized out>, n_args=0, n_kw=0, args=0x2)
    at ../../micropython/py/objfun.c:330
#9  0x00005555555f3d87 in mp_call_function_n_kw (fun_in=fun_in@entry=0x7fffeb61aa80, 
    n_args=n_args@entry=0, n_kw=n_kw@entry=0, args=args@entry=0x0)
--Type <RET> for more, q to quit, c to continue without paging--
    at ../../micropython/py/runtime.c:708
#10 0x00005555555f3dcd in mp_call_function_0 (fun=fun@entry=0x7fffeb61aa80)
    at ../../micropython/py/runtime.c:682
#11 0x00005555556e832c in parse_compile_execute (source=<optimized out>, 
    input_kind=input_kind@entry=MP_PARSE_FILE_INPUT, exec_flags=136)
    at ../../micropython/shared/runtime/pyexec.c:123
#12 0x00005555556e8510 in pyexec_frozen_module (name=name@entry=0x555555717764 "_boot.py", 
    allow_keyboard_interrupt=allow_keyboard_interrupt@entry=false)
    at ../../micropython/shared/runtime/pyexec.c:716
#13 0x00005555556ac6f8 in main_ (vargs=<optimized out>) at main.c:782
#14 0x00007ffff7b67ded in start_thread (arg=<optimized out>) at pthread_create.c:447
#15 0x00007ffff7beb0dc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

I have no idea how to map from that mp_execute_bytecode call into a python stacktrace, but if you know how I could get more useful info I'd be happy to grab it!

I'm on the current latest version of Arch - I did a full upgrade yesterday. I can check versions of specific packages if you need.

bwhitman commented 5 months ago

Yeah, i can just see that it's running _boot.py and getting mad about potential stack smashing. I can't tell if it's due to something in _boot.py or just the act of trying to exec code.

Since this works fine for me on Fedora and Ubuntu, I'm assuming this is some system stack-watcher on Arch in libc. Unfortunately i tried and failed a few times to get an Arch linux system booted on my mac via VMs. The only one I could boot is limited to 10GB and I couldn't run the gnome install. I could eventually get it going on a RPi. but I'm prioritizing some firmware bugs for the next couple of weeks for the launch.

What i would do to try to diagnose this:

* Remove most of the code in `_boot.py`, maybe just leave a print(), and see if it still crashes

stewlab commented 3 months ago

I was getting similar errors (shown below) after building and running Tulip Desktop on CentOS and Ubuntu, and following this advice provided a workaround. I am sure that this is overkill, but at least I can launch the app now.

Segmentation fault info from gdb in CentOS

Thread 17 "tulip" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffda200640 (LWP 85790)]
0x000000000042a885 in mp_obj_str_get_str (self_in=self_in@entry=0x0)
    at ../../micropython/py/objstr.c:2370
2370        if (mp_obj_is_str_or_bytes(self_in)) {
Missing separate debuginfos, use: dnf debuginfo-install SDL2-2.26.0-1.el9.x86_64 bzip2-libs-1.0.8-8.el9.x86_64 cairo-1.17.4-7.el9.x86_64 dbus-libs-1.12.20-8.el9.x86_64 elfutils-libelf-0.191-4.el9.x86_64 expat-2.5.0-2.el9.x86_64 fontconfig-2.14.0-2.el9.x86_64 freetype-2.10.4-9.el9.x86_64 fribidi-1.0.10-6.el9.2.x86_64 glib2-2.68.4-15.el9.x86_64 glibc-2.34-113.el9.x86_64 graphite2-1.3.14-9.el9.x86_64 harfbuzz-2.7.4-10.el9.x86_64 libX11-1.7.0-9.el9.x86_64 libXau-1.0.9-8.el9.x86_64 libXext-1.3.4-8.el9.x86_64 libXrender-0.9.10-16.el9.x86_64 libblkid-2.37.4-18.el9.x86_64 libbrotli-1.0.9-6.el9.x86_64 libcap-2.48-9.el9.x86_64 libdatrie-0.2.13-4.el9.x86_64 libdecor-0.1.1-1.el9.x86_64 libedit-3.1-38.20210216cvs.el9.x86_64 libffi-3.4.2-8.el9.x86_64 libgcc-11.4.1-3.el9.x86_64 libgcrypt-1.10.0-10.el9.x86_64 libglvnd-1.3.4-1.el9.x86_64 libglvnd-egl-1.3.4-1.el9.x86_64 libglvnd-glx-1.3.4-1.el9.x86_64 libgpg-error-1.42-5.el9.x86_64 libmount-2.37.4-18.el9.x86_64 libpng-1.6.37-12.el9.x86_64 libselinux-3.6-1.el9.x86_64 libthai-0.1.28-8.el9.x86_64 libwayland-cursor-1.21.0-1.el9.x86_64 libwayland-egl-1.21.0-1.el9.x86_64 libwayland-server-1.21.0-1.el9.x86_64 libxcb-1.13.1-9.el9.x86_64 libxml2-2.9.13-6.el9.x86_64 libxshmfence-1.3-10.el9.x86_64 libzstd-1.5.1-2.el9.x86_64 llvm-libs-18.1.6-3.el9.x86_64 lz4-libs-1.9.3-5.el9.x86_64 mesa-dri-drivers-24.1.2-1.el9.x86_64 pango-1.48.7-3.el9.x86_64 pcre-8.44-4.el9.x86_64 pcre2-10.40-5.el9.x86_64 systemd-libs-252-38.el9.x86_64 zlib-1.2.11-41.el9.x86_64
(gdb) bt
#0  0x000000000042a885 in mp_obj_str_get_str (self_in=self_in@entry=0x0)
    at ../../micropython/py/objstr.c:2370
#1  0x0000000000440579 in lookup_path (path_in=0x0, path_out=path_out@entry=0x7fffda1ffb58)
    at ../../micropython/extmod/vfs.c:97
#2  0x0000000000440639 in mp_vfs_chdir (path_in=<optimized out>) at ../../micropython/extmod/vfs.c:327
#3  0x0000000000423cb6 in fun_builtin_1_call
    (self_in=0x593d50 <mp_vfs_chdir_obj>, n_args=<optimized out>, n_kw=<optimized out>, args=0x7fffd861ab10) at ../../micropython/py/objfun.c:68
#4  0x000000000041b999 in mp_call_function_n_kw
    (fun_in=0x593d50 <mp_vfs_chdir_obj>, n_args=1, n_kw=0, args=args@entry=0x7fffd861ab10)
    at ../../micropython/py/runtime.c:708
#5  0x00000000004346f6 in mp_execute_bytecode
    (code_state=code_state@entry=0x7fffd861aae0, inject_exc=inject_exc@entry=0x0)
    at ../../micropython/py/vm.c:957
#6  0x0000000000423f56 in fun_bc_call (self_in=0x7fffd861aac0, n_args=0, n_kw=0, args=0x0)
    at ../../micropython/py/objfun.c:273
#7  0x000000000041b999 in mp_call_function_n_kw
    (fun_in=fun_in@entry=0x7fffd861aac0, n_args=n_args@entry=0, n_kw=n_kw@entry=0, args=args@entry=0x0)
    at ../../micropython/py/runtime.c:708
#8  0x000000000041b9db in mp_call_function_0 (fun=fun@entry=0x7fffd861aac0)
    at ../../micropython/py/runtime.c:682
#9  0x0000000000513c1f in parse_compile_execute
    (source=<optimized out>, input_kind=input_kind@entry=MP_PARSE_FILE_INPUT, exec_flags=136)
    at ../../micropython/shared/runtime/pyexec.c:123
#10 0x0000000000513db9 in pyexec_frozen_module
    (name=name@entry=0x55715b "_boot.py", allow_keyboard_interrupt=allow_keyboard_interrupt@entry=false) at ../../micropython/shared/runtime/pyexec.c:716
#11 0x00000000004cd171 in main_ (vargs=<optimized out>) at main.c:782
#12 0x00007ffff7b7cc02 in start_thread () at /lib64/libc.so.6
--Type <RET> for more, q to quit, c to continue without paging--
#13 0x00007ffff7c01c40 in clone3 () at /lib64/libc.so.6

Workaround

Updated tulip/shared/py/_boot.py

import gc
import uos
import tulip, world, sys, midi, amy, alles

from upysh import *
from tulip import edit, run

Although not really related to this issue, I also updated my build.sh, as I am not sure why the script originally adds/modifies files in ~/Documents folder. Maybe the developer can explain.

Updated tulip/linux/build.sh

#!/bin/bash
# build.sh
# Just builds locally
set -e

source ../shared/grab_submodules.sh
make DEBUG=1
mkdir -p dev
cp build-standard/tulip/obj/tulip dev/

Rebuild and run

cd tulip/linux
./build.sh
./dev/tulip

bwhitman commented 3 months ago

> Although not really related to this issue, I also updated my build.sh, as I am not sure why the script originally adds/modifies files in ~/Documents folder. Maybe the developer can explain.

The ~/Documents stuff is for the "file system" in Tulip. The Tulip CC hardware has a file system where you can save your programs and we ship examples in /sys. On Desktop, we emulate this by putting user and sys in ~/Documents/tulipcc.

It sounds like on your system(s) the issues are coming from this expectation of a ~/Documents/tulipcc folder. That's the only bit of code that _boot.py runs on Tulip Desktop. We do need a place to store Tulip files, and ~/Documents is what works on macOS (as well as the Linux-es I tested on.) Does this folder exist on your system?

bwhitman commented 3 months ago

Frame #2 of that trace (0x0000000000440639 in mp_vfs_chdir (path_in=<optimized out>) at ../../micropython/extmod/vfs.c:327) makes me think it's having trouble doing a chdir to your ~/Documents/tulipcc folder. Once you start tulip with your fixes, what does tulip.root_dir() show you? It should be /home/YOU/Documents/tulipcc or something similar that exists.
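A minimal sketch of the kind of guard that would sidestep this crash, assuming the failure really does come from a chdir into a missing (or null) root directory. `ensure_dir` and `enter_root_dir` are hypothetical helpers, not code from Tulip's `_boot.py`. The directory is created one level at a time with `os.mkdir` so the sketch does not depend on `os.makedirs`.

```python
import os

def ensure_dir(path):
    """Create path and any missing parents, one level at a time."""
    current = "/" if path.startswith("/") else ""
    for part in path.split("/"):
        if not part:
            continue
        if current in ("", "/"):
            current = current + part
        else:
            current = current + "/" + part
        try:
            os.mkdir(current)
        except OSError:
            # Already exists, or cannot be created; the chdir in
            # enter_root_dir() reports the second case.
            pass

def enter_root_dir(root):
    """chdir into root, creating it first; return False instead of crashing."""
    if not root:
        # Covers None and "", i.e. a root directory that was never resolved.
        return False
    ensure_dir(root)
    try:
        os.chdir(root)
    except OSError:
        return False
    return True
```

On Tulip Desktop this could be called as `enter_root_dir(tulip.root_dir())` (hypothetical usage), printing a readable message when it returns False rather than passing a bad path on to chdir.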

stewlab commented 3 months ago

> The ~/Documents stuff is for the "file system" in Tulip. The Tulip CC hardware has a file system where you can save your programs and we ship examples in /sys. On Desktop, we emulate this by putting user and sys in ~/Documents/tulipcc.
>
> It sounds like on your system(s) the issues are coming from this expectation of a ~/Documents/tulipcc folder. That's the only bit of code that _boot.py runs on Tulip Desktop. We do need a place to store Tulip files, and ~/Documents is what works on macOS (as well as the Linux-es I tested on.) Does this folder exist on your system?

Ok, false alarm - I likely caused the problem by editing the build.sh to prevent moving files to my ~/Documents folder :hand_over_mouth:

Maybe we can use something like ~/.local/share/tulipcc instead of ~/Documents/tulipcc? You can check the XDG_DATA_HOME environment variable defined in the spec here for more info: https://specifications.freedesktop.org/basedir-spec/basedir-spec-latest.html

Also, yes tulip.root_dir() references ~/Documents/tulipcc/ directory with my changes.

Excellent work on this project, BTW

bwhitman commented 3 months ago

I'm not much of a linux expert (clearly) - will XDG_DATA_HOME work on most distributions?

stewlab commented 3 months ago

Yeah, pretty much all modern Linux desktop distributions support $XDG_DATA_HOME, but we could also just use $HOME/.local/share or ~/.local/share instead.
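As a rough sketch of that lookup: the XDG Base Directory spec says to fall back to $HOME/.local/share when $XDG_DATA_HOME is unset or empty, and to ignore relative values. The function name `tulip_data_dir` and the `app_name` parameter are made up for illustration; this is not how Tulip currently resolves its directory.

```python
import os

def tulip_data_dir(environ, app_name="tulipcc"):
    """Return the per-user data directory following the XDG basedir rules."""
    base = environ.get("XDG_DATA_HOME", "")
    # Unset, empty, or relative values are ignored per the spec.
    if not base or not base.startswith("/"):
        # Assumes HOME is set; without it the result is rooted at "/".
        base = environ.get("HOME", "") + "/.local/share"
    return base.rstrip("/") + "/" + app_name

if __name__ == "__main__":
    print(tulip_data_dir(os.environ))
```

Passing the environment in as an argument keeps the function easy to test with a plain dict.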

vijaymarupudi commented 2 months ago

I seem to be facing a slightly different error: I am able to launch the tulip terminal. tulip.root_dir() correctly points to the ~/.local/share/tulipcc directory.

I receive a stack smashing error when I execute a run('voices') or other run commands.

More oddly, I run into the same error when I type the following commands into the terminal: root_dir, version, remap, upgrade, exit, etc. But when I type run or edit, it correctly prints the function with its address. Seems like it crashes when an error is occurring? It doesn't know how to perform error handling?

This makes me suspect that something is going wrong before any micropython function is run; there is likely a C error that occurred earlier and is causing these crashes.

When I run under gdb, here's where it seems to crash.

#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007ffff7b52463 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:78
#2  0x00007ffff7af9120 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff7ae04c3 in __GI_abort () at abort.c:79
#4  0x00007ffff7ae1354 in __libc_message_impl (fmt=fmt@entry=0x7ffff7c6f16c "*** %s ***: terminated\n") at ../sysdeps/posix/libc_fatal.c:132
#5  0x00007ffff7be1799 in __GI___fortify_fail (msg=msg@entry=0x7ffff7c6f184 "stack smashing detected") at fortify_fail.c:24
#6  0x00007ffff7be2a54 in __stack_chk_fail () at stack_chk_fail.c:24
#7  0x0000555555613569 in mp_execute_bytecode (code_state=0x7fffe278e7b0, inject_exc=<optimized out>) at ../../micropython/py/vm.c:1487
#8  0x00005555555ff240 in fun_bc_call (self_in=<optimized out>, n_args=0, n_kw=0, args=0x2) at ../../micropython/py/objfun.c:330
#9  0x00005555555f5925 in mp_call_function_n_kw (fun_in=fun_in@entry=0x7fffd7624e60, n_args=n_args@entry=0, n_kw=n_kw@entry=0, args=args@entry=0x0)
    at ../../micropython/py/runtime.c:708
#10 0x00005555555f596b in mp_call_function_0 (fun=fun@entry=0x7fffd7624e60) at ../../micropython/py/runtime.c:682
#11 0x00005555555f5a0e in mp_parse_compile_execute (lex=0x7fffd7624e20, parse_input_kind=parse_input_kind@entry=MP_PARSE_FILE_INPUT, 
    globals=globals@entry=0x7fffd761aac0, locals=locals@entry=0x555555922730 <mp_state_ctx+432>) at ../../micropython/py/runtime.c:1616
#12 0x000055555560e717 in eval_exec_helper (n_args=1, args=0x7fffd7624bb0, parse_input_kind=parse_input_kind@entry=MP_PARSE_FILE_INPUT)
    at ../../micropython/py/builtinevex.c:153
#13 0x000055555560e784 in mp_builtin_exec (n_args=<optimized out>, args=<optimized out>) at ../../micropython/py/builtinevex.c:162
#14 0x00005555555fef76 in fun_builtin_var_call (self_in=0x555555903130 <mp_builtin_exec_obj>, n_args=1, n_kw=0, args=0x7fffd7624bb0)
    at ../../micropython/py/objfun.c:119
#15 0x00005555555f5925 in mp_call_function_n_kw (fun_in=0x555555903130 <mp_builtin_exec_obj>, n_args=1, n_kw=0, args=args@entry=0x7fffd7624bb0)
    at ../../micropython/py/runtime.c:708
#16 0x00005555556120e7 in mp_execute_bytecode (code_state=code_state@entry=0x7fffd7624b80, inject_exc=inject_exc@entry=0x0) at ../../micropython/py/vm.c:957
#17 0x00005555555ff1a5 in fun_bc_call (self_in=0x7fffd76235c0, n_args=1, n_kw=0, args=0x7fffe278ec98) at ../../micropython/py/objfun.c:273
#18 0x00005555555f5925 in mp_call_function_n_kw (fun_in=0x7fffd76235c0, n_args=1, n_kw=0, args=args@entry=0x7fffe278ec98) at ../../micropython/py/runtime.c:708
#19 0x00005555556120e7 in mp_execute_bytecode (code_state=code_state@entry=0x7fffe278ec60, inject_exc=inject_exc@entry=0x0) at ../../micropython/py/vm.c:957
#20 0x00005555555ff1a5 in fun_bc_call (self_in=0x7fffd7624820, n_args=0, n_kw=0, args=0x0) at ../../micropython/py/objfun.c:273
#21 0x00005555555f5925 in mp_call_function_n_kw (fun_in=0x7fffd7624820, n_args=n_args@entry=0, n_kw=n_kw@entry=0, args=args@entry=0x0)
    at ../../micropython/py/runtime.c:708
#22 0x00005555555f596b in mp_call_function_0 (fun=<optimized out>) at ../../micropython/py/runtime.c:682
#23 0x00005555556b16bd in execute_from_lexer (source_kind=source_kind@entry=2, source=source@entry=0x7fffe278ee00, 
    input_kind=input_kind@entry=MP_PARSE_SINGLE_INPUT, is_repl=is_repl@entry=true) at main.c:188
#24 0x00005555556b191d in do_repl () at main.c:299
#25 0x00005555556b1be4 in main_ (vargs=<optimized out>) at main.c:794
#26 0x00007ffff7b5039d in start_thread (arg=<optimized out>) at pthread_create.c:447
#27 0x00007ffff7bd549c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

The stack check always seems to fail in the mp_execute_bytecode function.

bwhitman commented 2 months ago

Very helpful! When you say you can run micropython just fine, is it the latest (main branch) version? Or the one we pin in our repository?

vijaymarupudi commented 2 months ago

Aha! Seems like your intuition was right.

The latest version of micropython:

MicroPython v1.24.0-preview.206.ge9814e987 on 2024-08-17; linux [GCC 14.2.1] version
Use Ctrl-D to exit, Ctrl-E for paste mode
>>> run
run
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'run' isn't defined
>>> 

The version pinned in the tulipcc repo:

MicroPython v1.20.0-244-g2ede7c6c2 on 2024-08-17; linux [GCC 14.2.1] version
Use Ctrl-D to exit, Ctrl-E for paste mode
>>> run
run
*** stack smashing detected ***: terminated
Aborted (core dumped)

bwhitman commented 2 months ago

Ooh, exciting. run won’t work on bare micropython as that’s a Tulip function. But it crashing on our version means we can probably fix this by finishing #315. I’ll take a closer look asap

vijaymarupudi commented 2 months ago

Right! I suspect there's something wrong with micropython's error management, perhaps an assembly programming bug that messes with the stack. No rush, but thanks for looking into it!
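For anyone who wants to check a build without typing at the REPL, here is a small probe that exercises the same kind of failure (a NameError from an undefined name). The file and function names are made up. On a healthy interpreter it prints "caught NameError"; whether it aborts with the stack smashing message on an affected build is an assumption based on the REPL behaviour reported above, since here the exception is caught rather than left to propagate.

```python
def probe_exception_path():
    """Raise and catch a NameError to exercise the interpreter's error path."""
    try:
        name_that_is_not_defined  # deliberately undefined
    except NameError:
        return "caught NameError"
    return "no exception raised"

if __name__ == "__main__":
    print(probe_exception_path())
```

Saved as, say, probe.py, it can be run against both the pinned and the latest micropython binaries to compare.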

znmeb commented 2 months ago

I don't know if it's the same issue but when I was running Tulip Desktop on Ubuntu in a container, the REPL would come up and execute Python OK, but when I exited Python or closed the window with the mouse, it segfaulted.