google / gvisor

Application Kernel for Containers
https://gvisor.dev
Apache License 2.0

MSSQL container fails due to missing syscall argument #11112

Open liron-l opened 2 weeks ago

liron-l commented 2 weeks ago

Description

docker run --runtime=runsc -e "MSSQL_ENABLE_HADR=0" -e "MSSQL_PID=Express" \
  -v /tmp/socket:/tmp/socket -v /home/liron/Downloads/userdata/data:/tmp/data \
  -e "ACCEPT_EULA=Y" -e "SA_PASSWORD=YourStrongPassw0rd" \
  --name sqlserver -d mcr.microsoft.com/mssql/server

The container crashes with the following logs:

The following diagnostic information is available:

         Reason: 0x00000007
         Status: 0x00000000
        Message: BOOT: FATAL: Failed to initialize initial thread -1073741823
        Process: 4 - sqlservr
         Thread: 9
    Instance Id: 14b18a39-6c46-4906-bb25-01f752b1d4a2
       Crash Id: 5f8b2bd8-1882-42b1-8011-7eab6f0f22f2
    Build stamp: 37246ef8816ea823c6820306d0b9c82d559924f0b19b80c72855e58b0e4b145f
   Distribution: Ubuntu 22.04.5 LTS
     Processors: 16
   Total Memory: 46099476480 bytes
      Timestamp: Mon Nov  4 14:44:41 2024
     Last errno: 22
Last errno text: Invalid argument
Capturing a dump of 4
sqlservr: utils.cpp:561: uintptr_t GetEndOfLibOSVmRange(): Assertion `GetGlobals()->GetStatics()->IsLibOSVmmRangeEndValid()' failed.

Debug logs:

I1104 16:38:17.962665  1460517 strace.go:602] [   4:   9] sqlservr X arch_prctl(0x1001, 0x3fffba240000) = 0 (0x0) errno=22 (invalid argument) (31.168µs)

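The failing call is arch_prctl(0x1001, 0x3fffba240000), where 0x1001 is ARCH_SET_GS. gVisor's syscall table marks this option as unsupported: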

syscalls.PartiallySupported("arch_prctl", ArchPrctl, "Options ARCH_GET_GS, ARCH_SET_GS not supported.", nil),
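To check whether a given runtime accepts the GS options, a small probe like the one below could be run under runc and under runsc. It is a sketch that assumes x86_64 Linux with glibc and the kernel UAPI headers installed. The helper names `arch_prctl_raw` and `arch_gs_roundtrip` are hypothetical and not part of gVisor or glibc. The option codes come from `<asm/prctl.h>` (ARCH_SET_GS is 0x1001 and ARCH_GET_GS is 0x1004).

```c
#define _GNU_SOURCE
#include <asm/prctl.h>    /* ARCH_SET_GS = 0x1001, ARCH_GET_GS = 0x1004 */
#include <errno.h>
#include <sys/syscall.h>  /* SYS_arch_prctl */
#include <unistd.h>       /* syscall() */

/* Issue arch_prctl through syscall(2) so no libc wrapper is required.
 * Returns 0 on success or the errno value on failure. */
static int arch_prctl_raw(int code, unsigned long arg) {
  return syscall(SYS_arch_prctl, code, arg) == -1 ? errno : 0;
}

/* Any stable address works as a test value for the GS base. */
static unsigned long gs_probe_target;

/* Hypothetical probe: read the GS base, point it at gs_probe_target, read
 * it back, then restore the original base. Returns 0 if the round trip
 * works, the errno of the first failing call (EINVAL in the runsc strace
 * above), or -1 if the value read back does not match. */
int arch_gs_roundtrip(void) {
  unsigned long saved = 0, readback = 0;
  int err = arch_prctl_raw(ARCH_GET_GS, (unsigned long)&saved);
  if (err != 0) return err;

  err = arch_prctl_raw(ARCH_SET_GS, (unsigned long)&gs_probe_target);
  if (err != 0) return err;

  err = arch_prctl_raw(ARCH_GET_GS, (unsigned long)&readback);
  arch_prctl_raw(ARCH_SET_GS, saved);  /* best-effort restore */
  if (err != 0) return err;

  return readback == (unsigned long)&gs_probe_target ? 0 : -1;
}
```

Called from a test's `main`, this is expected to return 0 on a native kernel. Based on the strace above, runsc release-20241028.0 would presumably return EINVAL (22). glibc on x86_64 keeps TLS in the FS base, not GS, so briefly repointing GS in a test process should be harmless. The sketch still restores the saved value.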

Why are those syscall arguments not implemented?

Steps to reproduce

docker run --runtime=runsc -e "MSSQL_ENABLE_HADR=0" -e "MSSQL_PID=Express" \
  -v /tmp/socket:/tmp/socket -v /home/liron/Downloads/userdata/data:/tmp/data \
  -e "ACCEPT_EULA=Y" -e "SA_PASSWORD=YourStrongPassw0rd" \
  --name sqlserver -d mcr.microsoft.com/mssql/server

runsc version

runsc version release-20241028.0 spec: 1.1.0-rc.1

docker version (if using docker)

Client: Docker Engine - Community
 Version:    24.0.5
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.11.2
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.20.2
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 20
  Running: 0
  Paused: 0
  Stopped: 20
 Images: 18
 Server Version: 24.0.5
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc runsc runsc-unix-debug
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 3dce8eb055cbb6872793272b4f20ed16117344f8
 runc version: v1.1.7-0-g860f061
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.8.0-47-generic
 Operating System: Ubuntu 22.04.2 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 42.93GiB
 Name: liron-ThinkPad
 ID: 5K57:D7LR:BPAE:FCFG:CCKI:UVE3:M7NP:L2I5:ZEYJ:GVYS:P7FR:AGM5
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: eksclustergames
 Experimental: false
 Insecure Registries:
  127.0.0.0/8

uname

Linux liron-ThinkPad 6.8.0-47-generic #47~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Oct 2 16:16:55 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

kubectl (if using Kubernetes)

No response

repo state (if built from source)

No response

runsc debug logs (if available)

D1104 16:38:17.962558  1460517 usertrap_amd64.go:210] [   4:   9] Found the pattern at ip 7ebf6c7269e4:sysno 158
D1104 16:38:17.962585  1460517 usertrap_amd64.go:122] [   4:   9] Allocate a new trap: 0xc000811350 69
D1104 16:38:17.962601  1460517 usertrap_amd64.go:223] [   4:   9] Apply the binary patch addr 7ebf6c7269e4 trap addr 63590 ([184 158 0 0 0 15 5] -> [255 36 37 144 53 6 0])
I1104 16:38:17.962618  1460517 strace.go:564] [   4:   9] sqlservr E arch_prctl(0x1001, 0x3fffba240000)
I1104 16:38:17.962641  1460517 compat.go:120] Unsupported syscall arch_prctl(0x1001,0x3fffba240000,0x0,0x3fffba23aec8,0x8,0x0). It is likely that you can safely ignore this message and that this is not the cause of any error. Please, refer to https://gvisor.dev/c/linux/amd64/arch_prctl for more information.
I1104 16:38:17.962665  1460517 strace.go:602] [   4:   9] sqlservr X arch_prctl(0x1001, 0x3fffba240000) = 0 (0x0) errno=22 (invalid argument) (31.168µs)
D1104 16:38:17.962696  1460517 usertrap_amd64.go:210] [   4:   9] Found the pattern at ip 7ebf6c714fe4:sysno 32
D1104 16:38:17.962710  1460517 usertrap_amd64.go:122] [   4:   9] Allocate a new trap: 0xc000811350 70
D1104 16:38:17.962723  1460517 usertrap_amd64.go:223] [   4:   9] Apply the binary patch addr 7ebf6c714fe4 trap addr 635e0 ([184 32 0 0 0 15 5] -> [255 36 37 224 53 6 0])
I1104 16:38:17.962741  1460517 strace.go:561] [   4:   9] sqlservr E dup(0x2 host:[3])
I1104 16:38:17.962757  1460517 strace.go:599] [   4:   9] sqlservr X dup(0x2 host:[3]) = 20 (0x14) (2.805µs)
I1104 16:38:17.962787  1460517 strace.go:567] [   4:   9] sqlservr E fcntl(0x14 host:[3], 0x3, 0x0)
I1104 16:38:17.962804  1460517 strace.go:605] [   4:   9] sqlservr X fcntl(0x14 host:[3], 0x3, 0x0) = 1025 (0x401) (1.543µs)
I1104 16:38:17.962824  1460517 strace.go:561] [   4:   9] sqlservr E close(0x14 host:[

avagin commented 2 weeks ago

> Why are those syscall arguments not implemented?

liron-l commented 2 weeks ago

Thanks for the update, @avagin. Considering the current architecture, do you think it’s feasible to resolve?

avagin commented 2 weeks ago

@liron-l yes, it is feasible.

liron-l commented 1 week ago

@avagin maybe you can share some guidelines on how to fix that (even only for ptrace)? 🙏

avagin commented 1 week ago

> @avagin maybe you can share some guidelines on how to fix that (even only for ptrace)? 🙏

It should be something like this: https://github.com/avagin/gvisor/commit/0a2587e11a07f4a48b1cba1bd722f8e27e1e9289

liron-l commented 1 week ago

Thanks @avagin 👑, I also tried a similar solution, but SQL Server still crashed:

docker run --runtime=runsc1   -e "ACCEPT_EULA=Y" -e "SA_PASSWORD=YourStrongPassw0rd"  --name sqlserver -d mcr.microsoft.com/mssql/server
2024-11-12 13:07:22.37 Server      SQL Server is terminating because of fatal exception c0000005. This error may be caused by an unhandled Win32 or C++ exception, or by an access violation encountered during exception handling. Check the SQL error log for any related stack dumps or messages. This exception forces SQL Server to shutdown. To recover from this error, restart the server (unless SQLAgent is configured to auto restart).
2024-11-12 13:07:22.39 Server      Using 'dbghelp.dll' version '4.0.5'
2024-11-12 13:07:22.40 Server      **Dump thread - spid = 0, EC = 0x0000000000000000 Connection = 0x0000000000000000
2024-11-12 13:07:22.40 Server      * *******************************************************************************
2024-11-12 13:07:22.41 Server      *
2024-11-12 13:07:22.41 Server      * BEGIN STACK DUMP:
2024-11-12 13:07:22.41 Server      *   11/12/24 13:07:22 spid 532
2024-11-12 13:07:22.41 Server      *
2024-11-12 13:07:22.41 Server      * ex_handle_except encountered exception C0000005 - Server terminating
2024-11-12 13:07:22.42 Server      *
2024-11-12 13:07:22.42 Server      *
* MODULE LISTING:
*
2024-11-12 13:07:22.42 Server      *  MODULE                          BASE      END       SIZE
2024-11-12 13:07:22.42 Server      * sqlservr                       0000000B00210000  0000000B002F2FFF  000e3000
2024-11-12 13:07:22.43 Server      * ntdll                          0000000B00040000  0000000B0020BFFF  001cc000
2024-11-12 13:07:22.43 Server      * KERNEL32                       0000000B015D0000  0000000B01682FFF  000b3000
2024-11-12 13:07:22.44 Server      * KERNELBASE                     0000000B01A60000  0000000B01CFCFFF  0029d000
2024-11-12 13:07:22.44 Server      * NETAPI32                       0000000B01700000  0000000B01718FFF  00019000
2024-11-12 13:07:22.44 Server      * pdh                            0000000B01DD0000  0000000B01E1CFFF  0004d000
2024-11-12 13:07:22.45 Server      * CRYPT32                        0000000B01E20000  0000000B0201DFFF  001fe000
2024-11-12 13:07:22.45 Server      * WS2_32                         0000000B02020000  0000000B0208CFFF  0006d000
2024-11-12 13:07:22.45 Server      * IPHLPAPI                       0000000B00020000  0000000B00029FFF  0000a000
2024-11-12 13:07:22.46 Server      * ADVAPI32                       0000000B01720000  0000000B01733FFF  00014000
2024-11-12 13:07:22.46 Server      * ole32                          0000000B02090000  0000000B021E5FFF  00156000
2024-11-12 13:07:22.47 Server      * SQLOS                          0000000B01740000  0000000B01749FFF  0000a000
2024-11-12 13:07:22.47 Server      * sqlmin                         0000000B021F0000  0000000B0532DFFF  0313e000
2024-11-12 13:07:22.47 Server      * sqllang                        0000000B05330000  0000000B081D7FFF  02ea8000
2024-11-12 13:07:22.48 Server      * sqlTsEs                        0000000B081E0000  0000000B08C23FFF  00a44000
2024-11-12 13:07:22.48 Server      * sqldk                          0000000B08C30000  0000000B09234FFF  00605000
2024-11-12 13:07:22.48 Server      * qds                            0000000B09240000  0000000B09406FFF  001c7000
2024-11-12 13:07:22.49 Server      * svl                            0000000B09410000  0000000B09450FFF  00041000
2024-11-12 13:07:22.49 Server      * MSVCP140                       0000000B09460000  0000000B094EDFFF  0008e000
2024-11-12 13:07:22.50 Server      * VCRUNTIME140                   0000000B094F0000  0000000B0950AFFF  0001b000
2024-11-12 13:07:22.50 Server      * ucrtbase                       0000000B09510000  0000000B09609FFF  000fa000
2024-11-12 13:07:22.50 Server      * msvcrt                         0000000B09610000  0000000B096ADFFF  0009e000
2024-11-12 13:07:22.51 Server      * RPCRT4                         0000000B01750000  0000000B0175EFFF  0000f000
2024-11-12 13:07:22.51 Server      * sechost                        0000000B096B0000  0000000B09751FFF  000a2000
2024-11-12 13:07:22.51 Server      * MSASN1                         0000000B09760000  0000000B09771FFF  00012000
2024-11-12 13:07:22.52 Server      * DkDll                          0000000B09780000  0000000B09785FFF  00006000
2024-11-12 13:07:22.52 Server      * combase                        0000000B09790000  0000000B09ABAFFF  0032b000
2024-11-12 13:07:22.52 Server      * GDI32                          0000000B09AC0000  0000000B09AE8FFF  00029000
2024-11-12 13:07:22.53 Server      * USER32                         0000000B09AF0000  0000000B09C86FFF  00197000
2024-11-12 13:07:22.53 Server      * Secur32                        0000000B09C90000  0000000B09C97FFF  00008000
2024-11-12 13:07:22.53 Server      * USERENV                        0000000B09CA0000  0000000B09CC8FFF  00029000
2024-11-12 13:07:22.54 Server      * WINHTTP                        0000000B09CD0000  0000000B09DCCFFF  000fd000
2024-11-12 13:07:22.54 Server      * XOLEHLP                        0000000B09DD0000  0000000B09DEEFFF  0001f000
2024-11-12 13:07:22.55 Server      * WININET                        0000000B09DF0000  0000000B0A2D1FFF  004e2000
2024-11-12 13:07:22.55 Server      * MPR                            0000000B0A2E0000  0000000B0A2E7FFF  00008000
2024-11-12 13:07:22.55 Server      * OLEAUT32                       0000000B0A2F0000  0000000B0A3B3FFF  000c4000
2024-11-12 13:07:22.56 Server      * ODBC32                         0000000B0A3C0000  0000000B0A47BFFF  000bc000
2024-11-12 13:07:22.56 Server      * secforwarder                   0000000B0A480000  0000000B0A48FFFF  00010000
2024-11-12 13:07:22.56 Server      * MSWSOCK                        0000000B0A490000  0000000B0A4F6FFF  00067000
2024-11-12 13:07:22.57 Server      * SHLWAPI                        0000000B0A500000  0000000B0A551FFF  00052000
2024-11-12 13:07:22.57 Server      * AUTHZ                          0000000B0A560000  0000000B0A5ACFFF  0004d000
2024-11-12 13:07:22.57 Server      * VERSION                        0000000B0A5B0000  0000000B0A5B9FFF  0000a000
2024-11-12 13:07:22.58 Server      * WINTRUST                       0000000B0A5C0000  0000000B0A61FFFF  00060000
2024-11-12 13:07:22.58 Server      * webservices                    0000000B0A620000  0000000B0A771FFF  00152000
2024-11-12 13:07:22.58 Server      * urlmon                         0000000B0A780000  0000000B0A956FFF  001d7000
2024-11-12 13:07:22.59 Server      * SHELL32                        0000000B0A960000  0000000B0BE57FFF  014f8000
2024-11-12 13:07:22.59 Server      * dhcpcsvc                       0000000B0BE60000  0000000B0BE7BFFF  0001c000
2024-11-12 13:07:22.59 Server      * VCRUNTIME140_1                 0000000B0BE80000  0000000B0BE8BFFF  0000c000
2024-11-12 13:07:22.60 Server      * WINMM                          0000000B0BE90000  0000000B0BEB3FFF  00024000
2024-11-12 13:07:22.60 Server      * bcrypt                         0000000B0BEC0000  0000000B0BEE5FFF  00026000
2024-11-12 13:07:22.61 Server      * sqlpal                         00003FFF84C00000  00003FFF85BFFFFF  01000000
2024-11-12 13:07:22.61 Server      * bcryptPrimitives               0000000B0BEF0000  0000000B0BF71FFF  00082000
2024-11-12 13:07:22.61 Server      * gdi32full                      0000000B0BF80000  0000000B0C129FFF  001aa000
2024-11-12 13:07:22.62 Server      * win32u                         0000000B0C130000  0000000B0C152FFF  00023000
2024-11-12 13:07:22.62 Server      * profapi                        0000000B0C160000  0000000B0C182FFF  00023000
2024-11-12 13:07:22.62 Server      * msvcp_win                      0000000B0C190000  0000000B0C22FFFF  000a0000
2024-11-12 13:07:22.63 Server      * iertutil                       0000000B0C230000  0000000B0C4DBFFF  002ac000
2024-11-12 13:07:22.63 Server      * shcore                         0000000B0C4E0000  0000000B0C586FFF  000a7000
2024-11-12 13:07:22.63 Server      * windows.storage                0000000B0C590000  0000000B0CCD6FFF  00747000
2024-11-12 13:07:22.64 Server      * srvcli                         0000000B0CCE0000  0000000B0CD05FFF  00026000
2024-11-12 13:07:22.64 Server      * netutils                       0000000B0CD10000  0000000B0CD1DFFF  0000e000
2024-11-12 13:07:22.65 Server      * cfgmgr32                       0000000B0CD20000  0000000B0CD69FFF  0004a000
2024-11-12 13:07:22.65 Server      * cryptsp                        0000000B0CD70000  0000000B0CD8AFFF  0001b000
2024-11-12 13:07:22.65 Server      * NSI                            0000000B0CD90000  0000000B0CD97FFF  00008000
2024-11-12 13:07:22.66 Server      * WINMMBASE                      0000000B0CDA0000  0000000B0CDCCFFF  0002d000
2024-11-12 13:07:22.66 Server      * rpcrt4_NT                      0000000B0CDD0000  0000000B0CEE7FFF  00118000
2024-11-12 13:07:22.67 Server      * DPAPI                          0000000B0CEF0000  0000000B0CEF9FFF  0000a000
2024-11-12 13:07:22.67 Server      * powrprof                       0000000B0CF00000  0000000B0CF5CFFF  0005d000
2024-11-12 13:07:22.68 Server      * kernel.appcore                 0000000B0CF60000  0000000B0CF70FFF  00011000
2024-11-12 13:07:22.68 Server      * advapi32_NT                    0000000B0CF80000  0000000B0D027FFF  000a8000
2024-11-12 13:07:22.69 Server      * SAMCLI                         0000000B0D030000  0000000B0D047FFF  00018000
2024-11-12 13:07:22.69 Server      * secur32_NT                     0000000B0D050000  0000000B0D05BFFF  0000c000
2024-11-12 13:07:22.70 Server      * securityapi                    0000000B0D060000  0000000B0D0BEFFF  0005f000
2024-11-12 13:07:22.70 Server      * CRYPTBASE                      0000000B0D0C0000  0000000B0D0CBFFF  0000c000
2024-11-12 13:07:22.71 Server      * SSPICLI                        0000000B0D0D0000  0000000B0D0FEFFF  0002f000
2024-11-12 13:07:22.71 Server      * LOGONCLI                       0000000B0D100000  0000000B0D140FFF  00041000
2024-11-12 13:07:22.72 Server      * psapi                          0000000B0D1E0000  0000000B0D1E7FFF  00008000
2024-11-12 13:07:22.72 Server      * ncrypt                         0000000B0D210000  0000000B0D23BFFF  0002c000
2024-11-12 13:07:22.73 Server      * NTASN1                         0000000B0D370000  0000000B0D3ABFFF  0003c000
2024-11-12 13:07:22.73 Server      * instapi160                     0000000B0DB00000  0000000B0DB17FFF  00018000
2024-11-12 13:07:22.73 Server      * pidgenx                        0000000B0DB20000  0000000B0DC2AFFF  0010b000
2024-11-12 13:07:22.74 Server      * rsaenh                         0000000B0DC40000  0000000B0DC72FFF  00033000
2024-11-12 13:07:22.74 Server      * sqlboot                        0000000B0DC80000  0000000B0DCC4FFF  00045000
2024-11-12 13:07:22.75 Server      * imagehlp                       0000000B0E0F0000  0000000B0E10CFFF  0001d000
2024-11-12 13:07:22.75 Server      * gpapi                          0000000B0E220000  0000000B0E241FFF  00022000
2024-11-12 13:07:22.76 Server      * cryptnet                       0000000B0E450000  0000000B0E47EFFF  0002f000
2024-11-12 13:07:22.76 Server      * wkscli                         0000000B0E480000  0000000B0E497FFF  00018000
2024-11-12 13:07:22.77 Server      * cscapi                         0000000B0E4A0000  0000000B0E4B1FFF  00012000
2024-11-12 13:07:22.77 Server      * sqlevn70                       0000000B0E4C0000  0000000B0E854FFF  00395000
2024-11-12 13:07:22.78 Server      * SQLVDI                         000000073D0B0000  000000073D0EFFFF  00040000
2024-11-12 13:07:22.78 Server      * dbghelp                        000000073D530000  000000073D71CFFF  001ed000
2024-11-12 13:07:22.79 Server      *
* PROCESSOR SPECIFIC CONTEXT:
*
2024-11-12 13:07:22.79 Server      *     P1Home: 0000000000000000:
2024-11-12 13:07:22.79 Server      *     P2Home: 0000000B00000000:  0000000000000000  FFFFFFFFFFFFFFFF  0000000B00210000  0000000B0018B220  0000000B017614E0  0000000000000000
2024-11-12 13:07:22.80 Server      *     P3Home: 0000000000000000:
2024-11-12 13:07:22.80 Server      *     P4Home: 0000000B004F1608:  00007F8EBC428620  000000040032E368  00007F8ECD464800  737365636F725000  74616E696D726554  0000000000000000
2024-11-12 13:07:22.81 Server      *     P5Home: 0000000100000010:
2024-11-12 13:07:22.81 Server      *     P6Home: 00003FFF84F7FCCE:
2024-11-12 13:07:22.81 Server      * ContextFlags: 000000000010000F:
2024-11-12 13:07:22.82 Server      *      MxCsr: 0000000000001FA0:
2024-11-12 13:07:22.82 Server      *      SegCs: 0000000000000033:
2024-11-12 13:07:22.82 Server      *      SegDs: 0000000000000000:
2024-11-12 13:07:22.82 Server      *      SegEs: 0000000000000000:
2024-11-12 13:07:22.83 Server      *      SegFs: 0000000000000000:
2024-11-12 13:07:22.83 Server      *      SegGs: 0000000000000000:
2024-11-12 13:07:22.83 Server      *      SegSs: 000000000000002B:
2024-11-12 13:07:22.83 Server      *     EFlags: 0000000000000206:
2024-11-12 13:07:22.84 Server      *        Rax: 00003FFF84F794E8:
2024-11-12 13:07:22.84 Server      *        Rcx: 0000300000001CA0:
2024-11-12 13:07:22.84 Server      *        Rdx: 0000000000000000:
2024-11-12 13:07:22.84 Server      *        Rbx: 0000000000000000:
2024-11-12 13:07:22.84 Server      *        Rsp: 0000000B004F1A40:  0000000B004F1A50  0000000000000000  0000000B004F3660  0000000B004F1A88  00000000000042AC  0000000000000000
2024-11-12 13:07:22.85 Server      *        Rbp: 0000000B004F3660:  0068005F00780065  006C0064006E0061  00780065005F0065  0074007000650063  0063006E00650020  0074006E0075006F
2024-11-12 13:07:22.85 Server      *        Rsi: 0000000B004F3660:  0068005F00780065  006C0064006E0061  00780065005F0065  0074007000650063  0063006E00650020  0074006E0075006F
2024-11-12 13:07:22.86 Server      *        Rdi: 0000000000000000:
2024-11-12 13:07:22.86 Server      *         R8: 00003FFF970C64B0:
2024-11-12 13:07:22.87 Server      *         R9: 0000000000001000:
2024-11-12 13:07:22.87 Server      *        R10: 0000000B01760000:  0000000000000000  0100763564F6FF89  00000002FFEEFFEE  0000000B0E120018  0000000B01760120  0000000B01760000
2024-11-12 13:07:22.87 Server      *        R11: 0000000B004F1BEC:  000000000000003F  0000000000000000  0000000100000000  0000000000000001  0000000000000000  0000002A0000000B
2024-11-12 13:07:22.88 Server      *        R12: 000000000000003F:
2024-11-12 13:07:22.88 Server      *        R13: 0000000000000000:
2024-11-12 13:07:22.88 Server      *        R14: 0000000000000000:
2024-11-12 13:07:22.89 Server      *        R15: 0000000000000000:
2024-11-12 13:07:22.89 Server      *        Rip: 0000000B01A9E3F9:  8C8B480000441F0F  CC3348000000C024  C481480004A432E8  246483C3000000D8  CCCCCCCCD0EB0038  CCCCCCCCCCCCCCCC
2024-11-12 13:07:22.90 Server      * *******************************************************************************
2024-11-12 13:07:22.90 Server      * -------------------------------------------------------------------------------
2024-11-12 13:07:22.90 Server      * Short Stack Dump
2024-11-12 13:07:22.94 Server      0000000B01A9E3F9 Module(KERNELBASE+000000000003E3F9)
2024-11-12 13:07:22.94 Server      0000000B06662B3E Module(sqllang+0000000001332B3E)
2024-11-12 13:07:22.97 Server      0000000B06666B65 Module(sqllang+0000000001336B65)
2024-11-12 13:07:22.99 Server      0000000B0241028C Module(sqlmin+000000000022028C)
2024-11-12 13:07:23.00 Server      0000000B08C839F8 Module(sqldk+00000000000539F8)
2024-11-12 13:07:23.01 Server      0000000B01AE7B6C Module(KERNELBASE+0000000000087B6C)
2024-11-12 13:07:23.02 Server      0000000B0013D930 Module(ntdll+00000000000FD930)
2024-11-12 13:07:23.02 Server      0000000B00126F06 Module(ntdll+00000000000E6F06)
2024-11-12 13:07:23.02 Server      0000000B0013AF8F Module(ntdll+00000000000FAF8F)
2024-11-12 13:07:23.03 Server      0000000B000DB4B6 Module(ntdll+000000000009B4B6)
2024-11-12 13:07:23.03 Server      0000000B00139CDE Module(ntdll+00000000000F9CDE)
2024-11-12 13:07:23.03 Server      0000000B023FE159 Module(sqlmin+000000000020E159)
2024-11-12 13:07:23.03 Server      0000000B039A71BC Module(sqlmin+00000000017B71BC)
2024-11-12 13:07:23.05 Server      0000000B039A75AB Module(sqlmin+00000000017B75AB)
2024-11-12 13:07:23.05 Server      0000000B00221FC0 Module(sqlservr+0000000000011FC0)
2024-11-12 13:07:23.05 Server      0000000B00227D05 Module(sqlservr+0000000000017D05)
2024-11-12 13:07:23.06 Server      0000000B0021B021 Module(sqlservr+000000000000B021)
2024-11-12 13:07:23.07 Server      0000000B002639F4 Module(sqlservr+00000000000539F4)
2024-11-12 13:07:23.08 Server      0000000B015E7AF4 Module(KERNEL32+0000000000017AF4)
2024-11-12 13:07:23.08 Server      0000000B000644A1 Module(ntdll+00000000000244A1)
2024-11-12 13:07:23.08 Server      Stack Signature for the dump is 0x000000006DEC8CB5
2024-11-12 13:07:23.08 Server      Unable to create dump because SQLDUMPER library is not available.

avagin commented 1 week ago

I see this crash too. I'm investigating it, but I don't have any ideas so far.

avagin commented 1 week ago

The issue occurs inside a SIGSEGV signal handler, which takes a different code path under gVisor than on Linux. Here's the relevant assembly snippet:

   0x00005584ade25bbb:  mov    -0x2b0(%rbp),%rax
   0x00005584ade25bc2:  mov    0xc8(%rax),%rax
   0x00005584ade25bc9:  mov    %rax,%rcx
   0x00005584ade25bcc:  sub    $0xd,%rcx
   0x00005584ade25bd0:  je     0x5584ade25be3
   0x00005584ade25bd2:  jmp    0x5584ade25bd4
   0x00005584ade25bd4:  sub    $0xe,%rax               <----------------------------------------
   0x00005584ade25bd8:  je     0x5584ade25d16

On Linux, rax holds 0xe at the marked instruction, while under gVisor it does not. Inspecting the memory that rax is loaded from:

(gdb) p/x *(long *)(0x7f0ba2014900+0xc8)
$31 = 0xe
(gdb) p/x *(long *)(0x7f0ba2014900+0xc0)
$32 = 0x6
(gdb) p/x *(long *)(0x7f0ba2014900+0xb8)
$33 = 0x2b000000000033

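This appears to be the sigcontext structure, which starts at offset 0x28 in the ucontext passed to the handler. The word at +0xb8 packs the selectors cs = 0x33 and ss = 0x2b, and +0xc0 holds the page-fault error code 0x6. Therefore, 0xc8(%rax) likely accesses sigcontext->trapno (0x28 + 0xa0), where 0xe corresponds to X86_TRAP_PF (page fault). The preceding comparison against 0xd checks for X86_TRAP_GP (general protection fault).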
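The offsets line up with glibc's x86_64 `ucontext_t`, whose `uc_mcontext.gregs` array mirrors the kernel's `struct sigcontext`. The sketch below assumes x86_64 and glibc. It computes the same displacements with `offsetof` and unpacks the selector word that gdb printed at +0xb8. `ucontext_greg_offset`, `struct seg_selectors` and `split_csgsfs` are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <stddef.h>        /* offsetof, size_t */
#include <sys/ucontext.h>  /* ucontext_t, mcontext_t, greg_t, REG_* (glibc, x86_64) */

/* Byte offset of uc_mcontext.gregs[reg] from the start of a ucontext_t:
 * the displacement a handler uses when it loads that slot through the
 * ucontext pointer passed as its third argument. */
size_t ucontext_greg_offset(int reg) {
  return offsetof(ucontext_t, uc_mcontext) + offsetof(mcontext_t, gregs) +
         (size_t)reg * sizeof(greg_t);
}

/* Segment selectors packed into the REG_CSGSFS slot: cs in bits 0-15,
 * gs in 16-31, fs in 32-47, and ss in 48-63 on newer kernels. */
struct seg_selectors {
  unsigned cs, gs, fs, ss;
};

struct seg_selectors split_csgsfs(unsigned long long word) {
  struct seg_selectors s;
  s.cs = (unsigned)(word & 0xffff);
  s.gs = (unsigned)((word >> 16) & 0xffff);
  s.fs = (unsigned)((word >> 32) & 0xffff);
  s.ss = (unsigned)((word >> 48) & 0xffff);
  return s;
}
```

With these helpers, the three gdb reads map to three slots. +0xb8 is REG_CSGSFS, and `split_csgsfs(0x2b000000000033)` yields cs 0x33 and ss 0x2b. +0xc0 is REG_ERR, where 0x6 means a user-mode write to a non-present page. +0xc8 is REG_TRAPNO.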

avagin commented 1 week ago

Here is a small reproducer for this issue. On Linux, the handler should print trap number 14 (0xe); under gVisor it does not:

#define _GNU_SOURCE
#include <stdio.h>
#include <signal.h>
#include <sys/ucontext.h>
#include <unistd.h>
#include <stdlib.h>

void segfault_handler(int sig_num, siginfo_t *sig_info, void *unused) {
  ucontext_t *ucontext = (ucontext_t *)unused;
  printf("Signal number: %d\n", sig_num);
  printf("Faulting address: %p\n", sig_info->si_addr);
  printf("Trap number (ucontext->uc_mcontext.gregs[REG_TRAPNO]): %ld\n",
         ucontext->uc_mcontext.gregs[REG_TRAPNO]);
  _exit(0);
}

int main() {
  struct sigaction sig_action;

  // Set up the structure to specify the new action.
  sig_action.sa_sigaction = segfault_handler;
  sigemptyset(&sig_action.sa_mask);
  sig_action.sa_flags = SA_SIGINFO;

  // Install the handler for SIGSEGV.
  if (sigaction(SIGSEGV, &sig_action, NULL) == -1) {
    perror("sigaction failed");
    return 1;
  }

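  // Trigger a segmentation fault by writing through a NULL pointer.
  // volatile keeps the compiler from dropping or rewriting the store,
  // since writing through NULL is undefined behavior.
  volatile int *ptr = NULL;
  *ptr = 10;

  return 0;
}
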
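The disassembly earlier in the thread compares trapno against 0xd (X86_TRAP_GP) before 0xe (X86_TRAP_PF). The C paraphrase of that dispatch below shows why a context whose trapno was never filled in lands on a different path than on Linux. It assumes glibc on x86_64, and `classify_segv` and `enum fault_kind` are hypothetical names, not recovered from sqlservr.

```c
#define _GNU_SOURCE
#include <sys/ucontext.h>  /* ucontext_t, REG_TRAPNO (glibc, x86_64) */

/* x86 exception vectors, matching the kernel's X86_TRAP_GP and X86_TRAP_PF. */
#define TRAPNO_GENERAL_PROTECTION 0xd
#define TRAPNO_PAGE_FAULT 0xe

enum fault_kind { FAULT_OTHER, FAULT_GENERAL_PROTECTION, FAULT_PAGE };

/* Hypothetical paraphrase of the handler's dispatch: read trapno from the
 * saved context and test 0xd first, then 0xe. A context whose trapno is
 * left at 0 falls through to FAULT_OTHER. */
enum fault_kind classify_segv(const ucontext_t *uc) {
  greg_t trapno = uc->uc_mcontext.gregs[REG_TRAPNO];
  if (trapno == TRAPNO_GENERAL_PROTECTION) return FAULT_GENERAL_PROTECTION;
  if (trapno == TRAPNO_PAGE_FAULT) return FAULT_PAGE;
  return FAULT_OTHER;
}
```

In a SA_SIGINFO handler like the reproducer's, this would be called as `classify_segv((ucontext_t *)unused)`.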
avagin commented 1 week ago

SQL Server starts successfully with this patch:

diff --git a/pkg/sentry/arch/signal_amd64.go b/pkg/sentry/arch/signal_amd64.go
index 4dd7e1332..1656ede24 100644
--- a/pkg/sentry/arch/signal_amd64.go
+++ b/pkg/sentry/arch/signal_amd64.go
@@ -148,6 +148,7 @@ func (c *Context64) SignalSetup(st *Stack, act *linux.SigAction, info *linux.Sig
        // SIGBUSes.
        if linux.Signal(info.Signo) == linux.SIGSEGV || linux.Signal(info.Signo) == linux.SIGBUS {
                uc.MContext.Cr2 = info.Addr()
+               uc.MContext.Trapno = 0xe
        }

        // "... the value (%rsp+8) is always a multiple of 16 (...) when
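A self-checking variant of the reproducer could serve as a regression test for this patch. The sketch below assumes x86_64 and glibc, and `probe_segv_trapno` is a hypothetical name. It writes to a PROT_NONE page, recovers with `siglongjmp`, and reports the trapno seen by the handler. It also reports whether cr2 matches `si_addr`, since cr2 and trapno are the two fields the patched block fills in.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/ucontext.h>
#include <unistd.h>

static sigjmp_buf fault_env;
static volatile greg_t seen_trapno = -1, seen_cr2 = 0;
static void *volatile seen_addr = NULL;

/* Record what the sentry (or kernel) put into the signal frame, then jump
 * back out of the faulting store. */
static void on_segv(int sig, siginfo_t *info, void *ctx) {
  ucontext_t *uc = ctx;
  (void)sig;
  seen_trapno = uc->uc_mcontext.gregs[REG_TRAPNO];
  seen_cr2 = uc->uc_mcontext.gregs[REG_CR2];
  seen_addr = info->si_addr;
  siglongjmp(fault_env, 1);
}

/* Returns the trapno reported for a write to a PROT_NONE page, or -1 on
 * setup failure. Sets *cr2_matches to 1 if REG_CR2 and si_addr both equal
 * the faulting address. */
long probe_segv_trapno(int *cr2_matches) {
  long page = sysconf(_SC_PAGESIZE);
  char *p = mmap(NULL, (size_t)page, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return -1;

  struct sigaction sa = {0}, old;
  sa.sa_sigaction = on_segv;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGSEGV, &sa, &old) == -1) {
    munmap(p, (size_t)page);
    return -1;
  }

  seen_trapno = -1;
  if (sigsetjmp(fault_env, 1) == 0) {
    *(volatile char *)p = 1;  /* faults: the page is PROT_NONE */
  }

  sigaction(SIGSEGV, &old, NULL);
  munmap(p, (size_t)page);
  *cr2_matches = seen_addr == (void *)p && (uintptr_t)seen_cr2 == (uintptr_t)p;
  return (long)seen_trapno;
}
```

On a native Linux kernel this is expected to report trapno 14 with cr2 equal to the faulting address. Note that the patch sets Trapno to 0xe for every SIGSEGV and SIGBUS. Linux, by contrast, reports 13 (X86_TRAP_GP) for general-protection faults such as accesses to non-canonical addresses. Since the sqlservr handler tests for 0xd first, a complete fix would presumably need to propagate the real trap number rather than a constant.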
liron-l commented 1 week ago

Thanks for the detailed analysis, @avagin 👑, you're a star! Is this fix planned to be merged into master?