nagua closed this issue 4 years ago.
@nagua Thanks for reporting this. It looks very similar to [this one](https://github.com/Microsoft/BashOnWindows/issues/1466).
Since you provided good repro steps, I am opening a bug to track this internally.
I have made some changes to how we handle SIGSEGV so that it more closely matches Linux behavior. With these changes I can see this still reproing, but the SIGSEGV is now reported with SEGV_MAPERR, which signals an access to unmapped memory. Most probably a dangling pointer issue, or some sort of buffer overflow. I don't see this repro on Ubuntu, but without source code this is going to be hard to debug.
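To make the SEGV_MAPERR diagnosis concrete, here is a minimal sketch. It assumes Linux/glibc and a single-threaded caller, and `probe_read` is a hypothetical helper, not a WSL or glibc API. The helper tries to read one byte at an address, catches the resulting SIGSEGV with an SA_SIGINFO handler, and reports the si_code the kernel delivered. Jumping out of a SIGSEGV handler with siglongjmp is a diagnostic trick, not something to rely on in production code.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Jump buffer and captured si_code shared with the handler. */
static sigjmp_buf probe_env;
static volatile sig_atomic_t probe_code;

static void probe_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    probe_code = info->si_code;   /* e.g. SEGV_MAPERR or SEGV_ACCERR */
    siglongjmp(probe_env, 1);     /* abandon the faulting read */
}

/* Hypothetical helper: returns 0 if one byte at addr is readable,
   otherwise returns -1 and stores the SIGSEGV si_code in *code_out
   (when code_out is not NULL). */
int probe_read(const void *addr, int *code_out)
{
    struct sigaction sa, old;
    int result = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = probe_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    /* savemask=1 so siglongjmp also unblocks SIGSEGV again. */
    if (sigsetjmp(probe_env, 1) == 0) {
        volatile char c = *(const volatile char *)addr;  /* may fault */
        (void)c;
    } else {
        result = -1;
        if (code_out != NULL)
            *code_out = probe_code;
    }

    sigaction(SIGSEGV, &old, NULL);  /* restore the previous disposition */
    return result;
}
```

On native Linux, probing a low address such as 16 (page zero is never mapped) should come back as SEGV_MAPERR. A readable stack variable should come back as 0.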
I think this is an unchanged GCC cross-compiler. You can see from the version string that it was built with crosstool-ng. It is simply a cross-compiler from amd64 to i686, distributed by Aldebaran (SoftBank Robotics). So I think you can simply look at the GCC 4.5.3 source code to track the issue down.
This is the version string of the compiler from Softbank, which has the segfault:
i686-aldebaran-linux-gnu-g++ (crosstool-NG hg+unknown-20130411.130503) 4.5.3
I built this compiler using crosstool-ng, trying to match the Softbank one as closely as possible, and it does not segfault:
i686-nptl-linux-gnu-g++ (crosstool-NG crosstool-ng-1.22.0) 4.5.4
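Pulling the build tag and the GCC version out of `--version` output, as done above, is plain string handling. Here is a minimal sketch that assumes the `<prog> (<build tag>) <gcc version>` shape of the two lines above. `struct toolchain_version` and `parse_version_line` are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result type: the parenthesised build tag and the
   GCC version that follows it. */
struct toolchain_version {
    char tag[128];
    char gcc[32];
};

/* Parse a line such as
   "i686-aldebaran-linux-gnu-g++ (crosstool-NG hg+unknown-20130411.130503) 4.5.3".
   Returns 0 on success, -1 if the line does not have that shape. */
int parse_version_line(const char *line, struct toolchain_version *out)
{
    const char *open = strchr(line, '(');
    const char *close = open ? strchr(open, ')') : NULL;
    size_t tag_len;

    if (open == NULL || close == NULL)
        return -1;

    tag_len = (size_t)(close - open - 1);
    if (tag_len >= sizeof out->tag)
        return -1;
    memcpy(out->tag, open + 1, tag_len);
    out->tag[tag_len] = '\0';

    /* The GCC version is the first whitespace-delimited token after ')'. */
    if (sscanf(close + 1, " %31s", out->gcc) != 1)
        return -1;
    return 0;
}
```

Both compilers above yield a tag starting with "crosstool-NG", which is what identifies them as crosstool-NG builds. They differ in the GCC version (4.5.3 vs 4.5.4).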
Unfortunately, at this point there is not much more we can do to debug this issue, since it does not repro on the GCC cross-compiler built from source.
Thank you anyway. I will try to contact Softbank about this and see what they can provide and do about it.
But thank you very much for your help so far.
No problem, please let us know if you have any more information to make further progress on this investigation.
I tried to rebuild the toolchain myself and got a toolchain whose compiler crashes like the original one. I used an Ubuntu 12.04 image and the Mercurial crosstool-ng revision 3200. On top of that, the crosstool-ng toolchain ships a program that prints the config it was built with (./cross/bin/i686-aldebaran-linux-gnu-ct-ng.config). With these things in place I could rebuild the toolchain. I disabled the stripping of the executables. I have uploaded the packed files to OneDrive (https://1drv.ms/u/s!AmRdBJnPKuRk1Uq8VWzTkd1eFDw2). Do you need more than unstripped binaries?
The error is slightly different, but that could be because I'm now using the stable Windows build.
execve("./i686-aldebaran-linux-gnu-g++", ["./i686-aldebaran-linux-gnu-g++", "/home/nicolas/main.c"], [/* 16 vars */]) = 0
uname({sys="Linux", node="NICI-PC", ...}) = 0
brk(0) = 0x1849000
brk(0x184a140) = 0x184a140
arch_prctl(ARCH_SET_FS, 0x1849800) = 0
brk(0x186b140) = 0x186b140
brk(0x186c000) = 0x186c000
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---
+++ killed by SIGSEGV (core dumped) +++
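For reading traces like this one, here is a small sketch relating the common SIGSEGV si_code values to the names strace prints. It assumes Linux/glibc, and the table and both helper names are hypothetical. SEGV_MAPERR means nothing is mapped at the address. SEGV_ACCERR means a mapping exists but forbids the access. SI_KERNEL means the kernel raised the signal without classifying it as either of those.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Common SIGSEGV si_code values and the names strace prints for them. */
static const struct {
    int code;
    const char *name;
} segv_codes[] = {
    { SEGV_MAPERR, "SEGV_MAPERR" },  /* no mapping at the faulting address */
    { SEGV_ACCERR, "SEGV_ACCERR" },  /* mapping exists, access not permitted */
#ifdef SI_KERNEL
    { SI_KERNEL,   "SI_KERNEL"   },  /* generated by the kernel, not MAPERR/ACCERR */
#endif
};

#define N_SEGV_CODES (sizeof segv_codes / sizeof segv_codes[0])

/* Hypothetical helper: si_code -> name, or "other" if not in the table. */
const char *segv_code_name(int code)
{
    for (size_t i = 0; i < N_SEGV_CODES; i++)
        if (segv_codes[i].code == code)
            return segv_codes[i].name;
    return "other";
}

/* Hypothetical helper: name as printed by strace -> si_code, or -1. */
int segv_code_from_name(const char *name)
{
    for (size_t i = 0; i < N_SEGV_CODES; i++)
        if (strcmp(segv_codes[i].name, name) == 0)
            return segv_codes[i].code;
    return -1;
}
```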
Edit: Sorry, I tried my own toolchain on the newest Insider build and no crash happens there, so I have to investigate further.
Greetings Nicolas
Ah, thanks for trying this out. Let us know if you see any crashes in the future.
Nicolas' last strace is a gift. What struck me about this issue is how few syscalls can be at fault here. His first strace from Softbank's binary makes only 47 syscalls before faceplanting. His last strace makes only 7, and it sure as heck isn't execve() or uname().
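Counting the syscalls in a short strace log is easy to automate. Here is a rough sketch; `count_syscalls` is a hypothetical name, and its heuristic is an assumption. It counts every non-empty line that does not start with strace's `---` signal marker or `+++` exit marker as one syscall. That ignores the `<unfinished ...>`/`resumed` pairs that multi-threaded traces produce.

```c
#include <string.h>

/* Hypothetical helper: count syscall lines in strace output held in text.
   Lines starting with "---" (signals) or "+++" (exit status) and empty
   lines are skipped; every other line counts as one syscall. */
int count_syscalls(const char *text)
{
    int count = 0;
    const char *line = text;

    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        if (len > 0 && strncmp(line, "---", 3) != 0 &&
            strncmp(line, "+++", 3) != 0)
            count++;

        if (end == NULL)
            break;
        line = end + 1;
    }
    return count;
}
```

Run on Nicolas' last strace above, this heuristic counts 7.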
[edit:] _blah blah incorrect speculation about arch_prctl() and brk(), since I thought that was the only surface that could be causing trouble_.
Okay, I think it was right in front of us:
SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0xffffffffff600000}
That isn't a "dangling pointer issue, or some sort of buffer overflow". That's pointing squarely and deliberately at the vsyscall page, which the Softbank binary and the #1466 CCP4 binary are statically linked to reference. Native Ubuntu's generic kernel is built with CONFIG_LEGACY_VSYSCALL, but on WSL, address ffffffffff600000 - ffffffffff601000 isn't mapped.
Easy to see with cat /proc/self/maps:
WSL:
...
7ffffb845000-7ffffc045000 rw-- 00000000 00:00 0 [stack]
7ffffc769000-7ffffc76a000 r-x- 00000000 00:00 0 [vdso]
Native:
...
7ffdc1b57000-7ffdc1b78000 rw-p 00000000 00:00 0 [stack]
7ffdc1b9e000-7ffdc1ba0000 r--p 00000000 00:00 0 [vvar]
7ffdc1ba0000-7ffdc1ba2000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
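The same check can be done programmatically. Here is a minimal sketch that assumes the maps line format shown above (`start-end perms offset dev inode [name]`). `find_mapping` is a hypothetical helper that scans the text for a named region and returns its address range. In practice the text would be read from /proc/self/maps. Here the function takes a string so it can run on the captured listings.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan maps-formatted text for a line whose last
   field equals name (e.g. "[vsyscall]"). On success stores the range
   in *start / *end and returns 1; returns 0 if no such line exists. */
int find_mapping(const char *maps, const char *name,
                 unsigned long long *start, unsigned long long *end)
{
    const char *line = maps;
    size_t name_len = strlen(name);

    while (*line != '\0') {
        const char *nl = strchr(line, '\n');
        size_t len = nl ? (size_t)(nl - line) : strlen(line);

        /* The region name is the last field, so compare the line's tail. */
        if (len >= name_len &&
            memcmp(line + len - name_len, name, name_len) == 0 &&
            sscanf(line, "%llx-%llx", start, end) == 2)
            return 1;

        if (nl == NULL)
            break;
        line = nl + 1;
    }
    return 0;
}
```

Given the two listings above, it finds the [vsyscall] range ffffffffff600000-ffffffffff601000 in the native one and nothing in the WSL one.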
Possibly the shortest test case ever:
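int main(void) {
    /* Read the first int of the vsyscall page; faults wherever that page isn't readable (e.g. unmapped, as on WSL). */
    return *(volatile int *)0xffffffffff600000;
}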
@therealkenc Nice investigation, thanks!! Can't believe we never looked at the actual segfault address... We'll see if we can implement the vsyscall page, since we already have vdso.
For what it is worth, vsyscall seems to be shunned these days (I think as of Real Linux circa 3.2) because of security and whatnot. Newthink seems to be to take the trap and emulate. All the cool kids seem to be hardening their systems, so it might not be worth implementing something that was obsolete before you even started. You also avoid reddit/theregister posts claiming that WSL is insecure.
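Some context on "take the trap and emulate": the legacy vsyscall page has three fixed entry points. These are 0xffffffffff600000 (gettimeofday), 0xffffffffff600400 (time) and 0xffffffffff600800 (getcpu). When emulation is in use, the page is not executable. A call into it therefore faults, and the kernel works out which entry point was targeted and performs the corresponding system call on the program's behalf. The sketch below models only that address lookup. The enum and function names are hypothetical, and this is not the kernel's actual code.

```c
#include <stdint.h>

/* Start of the legacy x86-64 vsyscall page. */
#define VSYSCALL_BASE 0xffffffffff600000ULL

/* The three legacy entry points, in page-offset order (0x000, 0x400, 0x800). */
enum vsyscall_entry {
    VSYS_NONE = -1,
    VSYS_GETTIMEOFDAY = 0,
    VSYS_TIME = 1,
    VSYS_GETCPU = 2
};

/* Hypothetical model of the lookup a trap-and-emulate handler needs:
   map the address a program jumped to onto the call it stands for. */
enum vsyscall_entry vsyscall_entry_for(uint64_t addr)
{
    /* Entry points differ only in bits 10-11 of the page offset. */
    if ((addr & ~0xC00ULL) != VSYSCALL_BASE)
        return VSYS_NONE;

    switch ((addr & 0xC00ULL) >> 10) {
    case 0:  return VSYS_GETTIMEOFDAY;
    case 1:  return VSYS_TIME;
    case 2:  return VSYS_GETCPU;
    default: return VSYS_NONE;   /* offset 0xc00 is not an entry point */
    }
}
```

Mapping each entry to a real syscall number is left out because the numbers are architecture-specific.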
Yep that's right, we'll take that into account when discussing the fix, thanks as always!
Hey guys, is there any news on this issue?
They're probably stuck in pencils-down for the Creators Update. The work-around for this (and #1466) is to compile from source or ask the vendor to make a non-statically linked version.
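Whether a vendor binary is statically linked can be checked from its ELF program headers. A dynamically linked executable has a PT_INTERP entry naming its dynamic loader, and a statically linked one has none. Here is a minimal sketch using glibc's <elf.h>. It assumes a 64-bit ELF whose byte order matches the host, and that the caller has already read the start of the file (ELF header plus program headers) into memory. `elf64_has_interp` is a hypothetical name.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: given the first len bytes of an ELF64 file, return
   1 if it has a PT_INTERP program header (dynamically linked), 0 if it has
   none (statically linked), or -1 if the buffer is not a usable ELF64 image.
   Assumes the file's byte order matches the host's. */
int elf64_has_interp(const unsigned char *buf, size_t len)
{
    Elf64_Ehdr eh;

    if (len < sizeof eh || memcmp(buf, ELFMAG, SELFMAG) != 0 ||
        buf[EI_CLASS] != ELFCLASS64)
        return -1;
    memcpy(&eh, buf, sizeof eh);   /* copy to avoid unaligned access */

    for (size_t i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        size_t off = (size_t)eh.e_phoff + i * (size_t)eh.e_phentsize;

        if (eh.e_phentsize < sizeof ph || off > len || len - off < sizeof ph)
            return -1;             /* program headers not fully in buffer */
        memcpy(&ph, buf + off, sizeof ph);
        if (ph.p_type == PT_INTERP)
            return 1;
    }
    return 0;
}
```

Program headers normally follow the ELF header directly, so the first few KiB of a binary are usually enough to answer the question.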
Will this problem with vsyscall emulation be addressed in WSL soon? Currently it stops the Linux tools we use at my work from working on WSL. Thanks to @therealkenc for investigating and diagnosing it.
Will this problem with vsyscall emulation be addressed in WSL soon?
We don't get ETAs on open issues. This is sad but understandable.
Can this be implemented in WSL
Yes because Turing completeness.
or has it been decided not to do this?
No because this issue was not closed and tagged by-design.
That sounds good. Many people will look forward to this being implemented. I realise that having to implement a feature to handle what I understand is the behaviour of dirty legacy programs from the past is probably not the highest priority on the list. But given that MS has managed to retain backwards compatibility on the Win32 platform with their PE format for well over two decades, declining to implement vsyscall emulation on WSL would seem rather odd.
A brief description: I wanted to use WSL to cross-compile code for Softbank's NAO robot. When using the GCC-based cross-compiler, the compiler gets a segmentation fault.
Expected results: A compiled binary.
Actual results (with terminal output if applicable):
And no output at all.
Your Windows build number: Tested with 14971 and the current stable Windows 10 version.
Steps / All commands required to reproduce the error from a brand new installation:
Get the Cross Toolchain 2.1.4 Linux 64, listed under point 4 - C++ NAOqi SDK (you need to create an account for that). Then run:
echo "int main() {return 0;}" > main.c
./ctc-linux64-atom-2.1.4.13/cross/bin/i686-aldebaran-linux-gnu-c++ main.c
No packages are required other than the toolchain mentioned above. The toolchain is completely statically linked.