alexzhangs closed this issue 3 years ago.
Thank you for investigating.
The CI environment differs from a real machine, so in rare cases it can behave strangely. I've run into this a few times, and as you know, such problems are very hard to resolve. Also, Travis no longer supports OSS projects for free, so I can't test this myself. I would appreciate it if you could help me.
Is a debugger like gdb an option?
Unfortunately, I don't know how to test using gdb.
shellspec_exists_envkey() {
  (
    key="$1"
    callback() { [ ! "$1" = "$key" ] &&:; }
    shellspec_list_envkeys callback && return 1
    return 0
  ) &&:
}
This code is unusual, and the problem might go away if it were rewritten in a more conventional style.
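For reference, the `&&:` suffix used twice in the function above is, as far as I can tell, ShellSpec's idiom for code that must also run under `set -e`: a failing command that is not the last one in an AND list does not trigger errexit, yet the list still returns that command's status. A minimal demonstration (`show_errexit_idiom` is a made-up name for this sketch):

```shell
#!/bin/sh
# Under `set -e`, `false` alone would abort the script. In `false &&:`,
# `false` is not the last command of the AND list, so errexit is not
# triggered, and the list's exit status is still false's status (1).
set -e

show_errexit_idiom() {
  false &&:
  echo "status=$?"
}

show_errexit_idiom   # prints: status=1
echo "still running"
```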
I am wondering whether subshells are really the cause of the problem. What happens if we don't use `return`?
# the problem goes away without subshell
shellspec_exists_envkey () { return 0; }
# the problem remains with subshell
shellspec_exists_envkey () { (return 0); }
# ?
shellspec_exists_envkey () { ( : ); }
# ?
shellspec_exists_envkey () { ( exit 0 ); }
If every variant that uses a subshell fails, then implement the function without a subshell.
shellspec_exists_envkey() {
  shellspec_exists_envkey_key=$1
  shellspec_list_envkeys shellspec_exists_envkey_ && return 1
  return 0
}
shellspec_exists_envkey_() { [ ! "$1" = "$shellspec_exists_envkey_key" ]; }
I believe this code will work. However, I have not tested it with other shells, so additional modifications may be needed.
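To try the subshell-free version outside ShellSpec, here is a self-contained sketch. The `shellspec_list_envkeys` below is a hypothetical stand-in written only for this demo, not ShellSpec's real implementation; it assumes the callback protocol implied by the code above (listing stops and returns non-zero as soon as the callback fails). The stub itself uses a command substitution, so only `shellspec_exists_envkey` is subshell-free here.

```shell
#!/bin/sh
# HYPOTHETICAL stand-in for shellspec_list_envkeys (not ShellSpec's code):
# calls the callback "$1" with each environment variable name and returns 1
# as soon as the callback returns non-zero.
shellspec_list_envkeys() {
  for name in $(env | sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p'); do
    "$1" "$name" || return 1
  done
  return 0
}

# Subshell-free version: a global variable replaces the subshell-local
# "key", and a top-level function replaces the nested callback.
shellspec_exists_envkey() {
  shellspec_exists_envkey_key=$1
  shellspec_list_envkeys shellspec_exists_envkey_ && return 1
  return 0
}

# Callback: fails (stopping the listing) when the name matches the key.
shellspec_exists_envkey_() { [ ! "$1" = "$shellspec_exists_envkey_key" ]; }
```

For example, after `export MY_KEY=1`, `shellspec_exists_envkey MY_KEY` returns 0, and it returns 1 for a name that is not in the environment.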
And Travis no longer supports OSS projects for free, so I can't test this myself. I would appreciate it if you could help me.
I was able to get free credits for OSS projects by emailing a request to the Travis support team: https://docs.travis-ci.com/user/billing-faq#what-if-i-am-building-open-source
And yes, I'm also considering requesting SSH access to the test project's builds to debug this issue.
I am wondering if subshells are really the cause of the problem. What happens if we don't use return?
I'll try your sample test code. But I think the segmentation fault happens right when the subshell is forked, because nothing is printed in the trace after that point. I also found another segmentation-fault test case with a single `status` statement. I haven't looked into it yet, but I know that the code under test uses a subshell.
I'll let you know if I make any progress.
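When the process dies without printing anything, it can help to send the trace somewhere that survives the crash. A sketch, assuming bash 4.1 or later (for `BASH_XTRACEFD`); `enable_line_trace`, the log path, and file descriptor 9 are arbitrary choices for illustration, not anything ShellSpec provides:

```shell
#!/bin/bash
# Route bash's xtrace output to its own file, prefixed with the source file
# name and line number, so the last traced line before a crash is kept even
# when stderr is captured or discarded by the test runner.
enable_line_trace() {
  trace_log=${1:-/tmp/xtrace.log}
  exec 9>>"$trace_log"          # fd 9 is an arbitrary free descriptor
  BASH_XTRACEFD=9               # bash >= 4.1: write `set -x` output to fd 9
  PS4='+ ${BASH_SOURCE##*/}:${LINENO}: '
  set -x                        # stays enabled after the function returns
}
```

After a crash, `tail /tmp/xtrace.log` should show the last lines bash traced, with their file and line numbers.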
@ko1nksm I have set up a local debugging environment that reproduces this issue by running the Travis Docker image locally.
Here are the steps:
Assuming you already have Docker installed locally (mine is Docker Desktop 3.2.1 on macOS 11.2.1):
1. Start a Travis xenial Ubuntu Linux Docker container locally:
BUILDID="build-$RANDOM"
INSTANCE="travisci/ci-stevonnie:packer-1564744294-e0797511"
docker run --name $BUILDID -dit $INSTANCE /sbin/init
docker exec -it $BUILDID bash -l
2. Reproduce the issue inside the Docker container:
su - travis
sudo apt-get update -y
sudo -E apt-get -yq --no-install-suggests --no-install-recommends install binutils-dev libcurl4-openssl-dev libdw-dev libiberty-dev
export PATH=${HOME}/.local/bin:${HOME}/kcov/bin:${PATH}
curl -fsSL https://git.io/shellspec | sh -s -- -y
wget https://github.com/SimonKagstrom/kcov/archive/master.tar.gz
tar xzf master.tar.gz
(cd kcov-master && mkdir -p build && cd build && cmake -DCMAKE_INSTALL_PREFIX=${HOME}/kcov ..; make && make install)
shellspec --version
kcov --version
ulimit -c unlimited
git clone https://github.com/alexzhangs/xsh.git alexzhangs/xsh
cd alexzhangs/xsh
shellspec --kcov -s /bin/bash spec/foo_spec.sh
3. Debug the dumped core file with gdb inside the Docker container:
sudo apt-get install gdb
U=http://ddebs.ubuntu.com
D=$(lsb_release -cs)
cat <<EOF | sudo tee /etc/apt/sources.list.d/ddebs.list
deb ${U} ${D} main restricted universe multiverse
deb ${U} ${D}-updates main restricted universe multiverse
deb ${U} ${D}-proposed main restricted universe multiverse
EOF
wget -O - http://ddebs.ubuntu.com/dbgsym-release-key.asc | sudo apt-key add -
sudo apt-get update -y
sudo apt-get install -y bash-dbgsym
gdb bash ./core
The gdb output:
travis@75e894a0623b:~/alexzhangs/xsh$ gdb bash core
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from bash...Reading symbols from /usr/lib/debug//bin/bash...done.
done.
[New LWP 7784]
Core was generated by `/bin/bash /tmp/shellspec.1620732472.7667/kcov/xsh [specfiles]'.
Program terminated with signal SIGSEGV, Segmentation fault.
1090 .././jobs.c: No such file or directory.
Please ignore the message `.././jobs.c: No such file or directory.`; I didn't install the bash source locally.
The `bt` command output of `gdb` can be found at: https://justpaste.it/5bfo7
The `bt full` command output of `gdb` can be found at https://justpaste.it/4bijv
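Two small helpers that may make the reproduce and debug steps above easier to repeat. Both function names are hypothetical. `check_core_dump_setup` only reports settings; as far as I know `core_pattern` is a kernel-wide setting, so inside a container it reflects the Docker host (or the Docker Desktop VM), not the image. `dump_backtraces` assumes gdb and `bash-dbgsym` are already installed as described above, and uses gdb's `-batch`/`-ex` options to run `bt` and `bt full` without an interactive session.

```shell
#!/bin/sh
# Before running the failing spec: report whether a core file can be written.
check_core_dump_setup() {
  limit=$(ulimit -c)
  pattern=$(cat /proc/sys/kernel/core_pattern 2>/dev/null || echo unknown)
  echo "core size limit: $limit"
  echo "core_pattern: $pattern"
  case $pattern in
    '|'*) echo "note: cores are piped to a handler, not written as ./core" ;;
  esac
  # Succeed only if the core size limit allows a dump.
  [ "$limit" = unlimited ] || [ "$limit" -gt 0 ]
}

# After the crash: save `bt` and `bt full` for the given binary and core.
dump_backtraces() {
  binary=${1:-bash}
  core=${2:-./core}
  out=${3:-gdb-backtrace.txt}
  gdb -batch -ex 'bt' -ex 'bt full' "$binary" "$core" >"$out" 2>&1
}
```

For example, `dump_backtraces bash ./core` writes both backtraces to `gdb-backtrace.txt`, which is easier to attach to an issue than a pasted interactive session.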
I'm not familiar with gdb, so further investigation and help are needed.
@ko1nksm In addition to my last comment, I also tested the three pieces of code you suggested; they all eventually hit the segmentation fault once a subshell is reached.
?
shellspec_exists_envkey () { ( : ); }
?
shellspec_exists_envkey () { ( exit 0 ); }
shellspec_exists_envkey() {
  shellspec_exists_envkey_key=$1
  shellspec_list_envkeys shellspec_exists_envkey_ && return 1
  return 0
}
shellspec_exists_envkey_() { [ ! "$1" = "$shellspec_exists_envkey_key" ]; }
@alexzhangs Thanks for writing how to run the travis Docker image locally. This helped me a lot!
Upon investigation, I have confirmed that this is a bug in bash and not a problem with travis, kcov, or shellspec. This bug seems to have been introduced in bash 4.3.2 and fixed in 4.4.0.
Here are the steps to reproduce it.
$ docker run -it fidian/multishell
root@469b51e73506:/# cat <<'HERE' > issue.sh
#!/bin/bash
xsh () {
xsh_clean() { : $(echo); :; }
trap "trap - RETURN; xsh_clean" RETURN
}
(
# Required to enable coverage (kcov).
set -o functrace
trap ':' DEBUG
xsh
( : ) # Segmentation fault
echo end
)
HERE
root@469b51e73506:/# bash-4.3.2 issue.sh
issue.sh: line 16: 15 Segmentation fault ( set -o functrace; trap ':' DEBUG; xsh; ( : ); echo end )
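To compare several bash builds in the same image, a loop like the following could help. It is a sketch: `check_segfault` is a made-up name, only `bash-4.3.2` is confirmed to exist in `fidian/multishell` by the session above, and it relies on the crashing subshell being the last command in `issue.sh`, so that the script's own exit status is the subshell's 139 (128 + SIGSEGV, which is signal 11 on Linux).

```shell
#!/bin/sh
# Run a script under each given shell binary and report whether it was
# killed by SIGSEGV. Shells report death by signal N as status 128 + N.
check_segfault() {
  script=$1
  shift
  for shell_bin in "$@"; do
    if ! command -v "$shell_bin" >/dev/null 2>&1; then
      echo "$shell_bin: not installed"
      continue
    fi
    "$shell_bin" "$script" >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 139 ]; then
      echo "$shell_bin: segmentation fault"
    else
      echo "$shell_bin: exit status $status"
    fi
  done
}
```

Usage inside the container: `check_segfault issue.sh bash-4.3.2 bash`.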
Currently, there is no way to work around this bug on the ShellSpec side, and I suspect finding one would be difficult.
BTW, I tracked this bug down by progressively reducing the ShellSpec code in the reproducible environment.
@ko1nksm Thanks for the amazing investigation! I was lost in the gdb debugging maze. ;)
I have moved to the Travis `bionic` dist, which ships bash 4.4.20, and all test cases pass.
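If other projects still run on xenial, a guard in CI could flag the risky bash series before a coverage run. This is a hypothetical helper and deliberately coarse: it flags every 4.3.x version instead of trying to encode the exact range reported above.

```shell
#!/bin/sh
# Return 0 if the given bash version string (default: the running bash's
# $BASH_VERSION) belongs to the 4.3 series, where the crash was reproduced.
is_suspect_bash_version() {
  version=${1:-$BASH_VERSION}
  case $version in
    4.3.*) return 0 ;;
    *) return 1 ;;
  esac
}

if is_suspect_bash_version; then
  echo "warning: bash $BASH_VERSION may segfault under kcov coverage" >&2
fi
```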
Thanks again for the help! I'm closing this issue.
This issue occurs only in the environment described below. Everything works fine on macOS and on the AWS Linux 2 AMI (with the same `shellspec` and `kcov` versions and the same test case).
Environment:
spec/foo_spec.sh:
The segmentation fault is caused by either of the last two statements: `variable` and `result of function`. My debugging is focused on the `variable` statement. If I make a slight change to `xsh.sh`, such as removing the `trap` or `__xsh_clean`, the problem goes away.
Command:
Error log:
Since Travis is the only environment where the problem occurs, debugging is very painful. I injected debug code into `shellspec` at deploy time through `.travis.yml` and, step by step, finally found the code where the segmentation fault is triggered: it seems to be the subshell within the function `shellspec_exists_envkey`. I simplified the function code and can see a difference, but I really don't understand why and have no clue what to do next. Is a debugger like `gdb` an option?
The following is for reference.
The full .travis.yml:
The system limits on Travis:
The full list of environment variables on Travis: