Closed mfikes closed 1 year ago
To get more info, export ASAN_OPTIONS=detect_leaks=0 and then revise planck-c/CMakeLists.txt to set(CMAKE_BUILD_TYPE Debug) and uncomment the CMAKE_C_FLAGS setting line with -fsanitize=address.
With this:
script/test
Running unit tests...
AddressSanitizer:DEADLYSIGNAL
=================================================================
==485996==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000005 (pc 0x7fb83c7913a0 bp 0x62d000043b00 sp 0x7ffe9add4ea0 T0)
==485996==The signal is caused by a READ memory access.
==485996==Hint: address points to the zero page.
#0 0x7fb83c79139f in JSC::JSFunction::getOwnPropertySlot(JSC::JSObject*, JSC::JSGlobalObject*, JSC::PropertyName, JSC::PropertySlot&) (/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18+0x112339f)
#1 0x7fb83c47e379 (/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18+0xe10379)
#2 0x7fb7f4a63a5c (<unknown module>)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/lib/x86_64-linux-gnu/libjavascriptcoregtk-4.0.so.18+0x112339f) in JSC::JSFunction::getOwnPropertySlot(JSC::JSObject*, JSC::JSGlobalObject*, JSC::PropertyName, JSC::PropertySlot&)
==485996==ABORTING
OK, I know you have Vagrant stuff set up, but I'm guessing it might be a little out of date? Maybe? Maybe not, what do I know? But I think Docker might be a more CI-friendly way to go (this might be reusable work).
So to reproduce locally, I whipped up the following Dockerfile:
FROM ubuntu:20.04
ARG CLOJURE_CLI_VERSION=1.11.1.1105
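# Set timezone to avoid interactive prompts when installing packages.
# CONTAINER_TIMEZONE was not defined in the original; Etc/UTC is an assumed default.
ARG CONTAINER_TIMEZONE=Etc/UTC
RUN ln -snf /usr/share/zoneinfo/$CONTAINER_TIMEZONE /etc/localtime && echo $CONTAINER_TIMEZONE > /etc/timezone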
RUN apt-get -y update
RUN apt-get install -y \
cmake xxd git curl libjavascriptcoregtk-4.0 libglib2.0-dev libzip-dev libcurl4-gnutls-dev libicu-dev unzip
RUN apt install -y build-essential apt-utils openjdk-8-jdk
RUN curl -OL "https://download.clojure.org/install/linux-install-${CLOJURE_CLI_VERSION}.sh" \
&& chmod +x linux-install-${CLOJURE_CLI_VERSION}.sh \
&& /linux-install-${CLOJURE_CLI_VERSION}.sh \
&& rm /linux-install-${CLOJURE_CLI_VERSION}.sh
And then from the same dir, built the docker image via:
docker build -t lread:planck-test .
And ran the docker image via:
docker run -it lread:planck-test
And then ran tests for planck master:
$ git clone https://github.com/planck-repl/planck.git
$ cd planck
$ script/build -Werror --fast
$ ./planck-c/build/planck --version
2.26.0
$ script/test
And I see a seg fault:
script/test: line 9: 24095 Segmentation fault script/test-unit
Kill docker session and restart session:
$ git clone https://github.com/planck-repl/planck.git
$ cd planck
$ git reset 2.25.0 --hard
HEAD is now at 4b61b2a 2.25.0
$ script/build -Werror --fast
$ ./planck-c/build/planck --version
2.25.0
$ script/test
And again we see a seg fault:
script/test: line 9: 23525 Segmentation fault script/test-unit
I'll continue to poke around. Feel free to point me in a direction, if you think of one.
@lread One thing I will often do when debugging native code issues is change the release type to Debug and / or enable address sanitizer (see here).
Thanks @mfikes!
I have a vague memory of using Valgrind for this type of issue eons ago. I'll try that too.
@lread Also, it is possible to attach gdb to the Planck process, and, while a little tricky since this involves tests provoked by a unit test, if things crash while gdb is attached then of course things are easy to see.
Thanks @mfikes, I have foggy memories of gdb from a previous life. 🙂
More data points (this is probably known to you, but I wanted to witness it):
Both 2.25.0 and master succeed for me on Ubuntu 18.04.6.
On a whim, I decided to try running jsc, WebKit's CLI JavaScript interpreter, on Ubuntu 20.04.
It wasn't a rousing success.
$ jsc
Segmentation fault
Oh. Same deal for Ubuntu 18.04.
Oh. Thought I was onto something here with these planck build warnings from Ubuntu 22:
In function 'maybe_load_user_file',
inlined from 'maybe_load_user_file' at /planck/planck-c/engine.c:414:6:
/planck/planck-c/engine.c:418:9: error: 'JSObjectCallAsFunction' reading 8 bytes from a region of size 0 [-Werror=stringop-overread]
418 | JSObjectCallAsFunction(ctx, get_function("planck.repl", "maybe-load-user-file"), JSContextGetGlobalObject(ctx),
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
419 | 0, arguments, &ex);
| ~~~~~~~~~~~~~~~~~~
/planck/planck-c/engine.c: In function 'maybe_load_user_file':
/planck/planck-c/engine.c:418:9: note: referencing argument 5 of type 'const struct OpaqueJSValue * const*'
And this note that arguments should be NULL when argumentCount is 0:
arguments A JSValueRef array of arguments to pass to the function. Pass NULL if argumentCount is 0.
And things do pass after correcting this on Ubuntu 22, but still seg fault on Ubuntu 20.
I'll continue to poke around.
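To make the "pass NULL when argumentCount is 0" note concrete, here is a minimal sketch of the calling pattern. It deliberately avoids JavaScriptCore: count_non_null is a hypothetical stand-in for any callee with a JSObjectCallAsFunction-style (count, array) parameter pair, and I'm assuming the warning comes from handing such a callee a zero-sized array rather than NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a callee that takes a (count, array) pair,
   in the style of JSObjectCallAsFunction. Not part of JavaScriptCore. */
static size_t count_non_null(size_t argc, const char *const argv[]) {
    /* A non-empty call must supply an array; an empty call may pass NULL. */
    assert(argc == 0 || argv != NULL);
    size_t n = 0;
    for (size_t i = 0; i < argc; i++) {  /* argv is never read when argc == 0 */
        if (argv[i] != NULL) n++;
    }
    return n;
}

/* Zero-argument call: pass NULL rather than a zero-length array. */
static size_t call_with_no_args(void) {
    return count_non_null(0, NULL);
}

/* Two-argument call: pass a real array together with its length. */
static size_t call_with_two_args(void) {
    const char *const args[] = { "a", "b" };
    return count_non_null(sizeof args / sizeof args[0], args);
}
```

In the real code the equivalent change would presumably be passing NULL as the arguments parameter at the zero-argument call site, which is what the API note asks for.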
Hiya @mfikes! After some distractions, I was about to roll up my sleeves and take another go at this issue.
But I see that planck tests are now passing off master (even without addressing the potential issue above) against the current Ubuntu 20.04 and libjavascriptcoregtk library.
My original testing was against libjavascriptcoregtk 2.34.6.
Tests are passing against 2.36.0.
So... I'm wondering what you recommend.
Should I take a stab at testing against various versions of libjavascriptcoregtk on Ubuntu?
Hmm... As far as I can tell it is not terribly easy to install arbitrary previous package versions. So, as I understand it (could be wrong, dunno), if we wanted to test against various versions of libjavascriptcoregtk we'd be building those from source.
For example, if I do an apt-cache showpkg libjavascriptcoregtk-4.0, I only see the current installed version 2.36.0 and 2.28.1. Same results for apt list -a libjavascriptcoregtk-4.0-18.
Since master passes against current libjavascriptcoregtk and current Ubuntu 20.04, I'd be tempted to release the binaries. With perhaps a tip to apt-get update then apt-get upgrade to current. Whaddya think?
While digging around I did notice the following and could help to address as separate issues if any of them make sense to you too:
Build output is sent to /dev/null. This made it difficult for me to understand the details of what was happening during the build. What do you think of going verbose for the build?
The exception parameter. The API supports passing NULL to discard the exception. Planck often passes in NULL here. I often wondered if these exceptions might tell us something valuable. (Maybe you found otherwise? Maybe they are mostly useless noise?) I was thinking maybe a thin wrapper over the JavaScriptCore APIs we use might be useful. This thin wrapper could optionally log all exceptions.
lldb gave more details than gdb when listing backtraces. I think it might be that lldb better supports cpp? Would a note in the dev guide docs be helpful here?
@mfikes, when you find some time and interest, lemme know if you'd like me to raise issues for all or some of https://github.com/planck-repl/planck/issues/1087#issuecomment-1113477844.
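For the thin-wrapper idea in the exception item above, here is a rough sketch of the shape I have in mind. It is stdlib-only so it can be tried in isolation: fallible_fn, call_logged, and half_of are hypothetical names, and the const char * error is a stand-in for the JSValueRef that the JavaScriptCore calls write through their exception parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an API call with an error out-parameter.
   In JavaScriptCore the out-parameter would be a JSValueRef *exception. */
typedef int (*fallible_fn)(int input, const char **error_out);

static int log_enabled = 1;       /* optional logging switch */
static unsigned errors_seen = 0;  /* errors captured, whether logged or not */

/* Calls fn while always capturing the error instead of discarding it with
   NULL, logs it when enabled, and forwards it if the caller asked for it. */
static int call_logged(const char *what, fallible_fn fn, int input,
                       const char **error_out) {
    assert(fn != NULL);
    const char *err = NULL;
    int result = fn(input, &err);
    if (err != NULL) {
        errors_seen++;
        if (log_enabled) fprintf(stderr, "%s failed: %s\n", what, err);
    }
    if (error_out != NULL) *error_out = err;
    return result;
}

/* Example callee: reports an error for negative input. */
static int half_of(int input, const char **error_out) {
    if (input < 0) {
        if (error_out != NULL) *error_out = "negative input";
        return 0;
    }
    return input / 2;
}
```

A caller that used to pass NULL for the error keeps doing so (call_logged("half_of", half_of, -1, NULL)), but the failure now shows up on stderr instead of vanishing. Whether that is signal or noise for Planck is exactly the open question.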
@mfikes Ubuntu 22.04 is now GA on GitHub Actions.
Past experiments showed binaries building and passing on 22.04. So maybe we can skip 20.04 and build binaries for 22.04? Would that make sense?
@mfikes, I suppose this one could be closed with a status of "wontfix"?
If you update to Ubuntu 20.04 and run the unit tests, you get a failure.
This happens with both fast and regular builds.
These crashes don't occur on Ubuntu 18.04.