microsoft / mssql-docker

Official Microsoft repository for SQL Server in Docker resources
MIT License

Unable to start container on Linux 6.7 #868

Open quinnjr opened 9 months ago

quinnjr commented 9 months ago

Currently unable to start the container on Arch Linux as the host OS. The dump files for the failing sqlservr process don't really provide any insight as to why:

docker compose logs db
db-1  | SQL Server 2022 will run as non-root by default.
db-1  | This container is running as user mssql.
db-1  | Your master database file is owned by mssql.
db-1  | To learn more visit https://go.microsoft.com/fwlink/?linkid=2099216.
db-1  | This program has encountered a fatal error and cannot continue running at Mon Jan 15 18:19:00 2024
db-1  | The following diagnostic information is available:
db-1  | 
db-1  |          Reason: 0x00000001
db-1  |          Signal: SIGABRT - Aborted (6)
db-1  |           Stack:
db-1  |                  IP               Function
db-1  |                  ---------------- --------------------------------------
db-1  |                  000064eb280a3ce1 std::__1::bad_function_call::~bad_function_call()+0x96661
db-1  |                  000064eb280a36a6 std::__1::bad_function_call::~bad_function_call()+0x96026
db-1  |                  000064eb280a2c2f std::__1::bad_function_call::~bad_function_call()+0x955af
db-1  |                  00007c18f8810520 __sigaction+0x50
db-1  |                  00007c18f88649fc pthread_kill+0x12c
db-1  |                  00007c18f8810476 raise+0x16
db-1  |                  00007c18f87f67f3 abort+0xd3
db-1  |                  000064eb28074d96 std::__1::bad_function_call::~bad_function_call()+0x67716
db-1  |                  000064eb280b15b4 std::__1::bad_function_call::~bad_function_call()+0xa3f34
db-1  |                  000064eb280df318 std::__1::bad_function_call::~bad_function_call()+0xd1c98
db-1  |                  000064eb280df0fa std::__1::bad_function_call::~bad_function_call()+0xd1a7a
db-1  |                  000064eb2807b20a std::__1::bad_function_call::~bad_function_call()+0x6db8a
db-1  |                  000064eb2807ae80 std::__1::bad_function_call::~bad_function_call()+0x6d800
db-1  |         Process: 10 - sqlservr
db-1  |          Thread: 157 (application thread 0x264)
db-1  |     Instance Id: 83ef72ce-1100-44c4-913c-45d0df61ae44
db-1  |        Crash Id: 05e56c63-9bd1-47db-b3d5-c1f58cebd578
db-1  |     Build stamp: a9299dd605c652a3cea4246273441bcfaf48afb4b482ab9dc43771eecaf6600b
db-1  |    Distribution: Ubuntu 22.04.3 LTS
db-1  |      Processors: 32
db-1  |    Total Memory: 67119079424 bytes
db-1  |       Timestamp: Mon Jan 15 18:19:00 2024
db-1  |      Last errno: 2
db-1  | Last errno text: No such file or directory
db-1  | Capturing a dump of 10
db-1  | Successfully captured dump: /var/opt/mssql/log/core.sqlservr.1_15_2024_18_19_0.10
db-1  | Executing: /opt/mssql/bin/handle-crash.sh with parameters
db-1  |      handle-crash.sh
db-1  |      /opt/mssql/bin/sqlservr
db-1  |      10
db-1  |      /opt/mssql/bin
db-1  |      /var/opt/mssql/log/
db-1  |      
db-1  |      83ef72ce-1100-44c4-913c-45d0df61ae44
db-1  |      05e56c63-9bd1-47db-b3d5-c1f58cebd578
db-1  |      
db-1  |      /var/opt/mssql/log/core.sqlservr.1_15_2024_18_19_0.10
db-1  | 
db-1  | Ubuntu 22.04.3 LTS
db-1  | Capturing core dump and information to /var/opt/mssql/log...

Docker-compose file:

version: '3'
services:
  db:
    image: 'mcr.microsoft.com/mssql/server:2022-latest'
    environment:
      - ACCEPT_EULA=Y
      - MSSQL_SA_PASSWORD=<there would be a password here>
      - MSSQL_PID=Developer
    volumes:
      - ./logs:/var/opt/mssql/log
      - ./data:/var/opt/mssql/data
    ports:
      - 1433:1433

The logs and data directories on the host are owned by UID:GID 10001:10001.

erikbozic commented 9 months ago

I have the same issue. Found that it's the 6.7 kernel update. (https://github.com/microsoft/mssql-docker/issues/858#issuecomment-1892216070)

Rolling back to 6.6.10 makes it work again.
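
To check quickly whether a host is on a kernel in the range reported as failing, the version comparison can be scripted. This is only a minimal sketch: the function name kernel_at_least is made up for illustration, and the 6.7 threshold comes from the reports in this thread, not from any official statement.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of any tool mentioned in this thread):
# succeeds when a kernel release string such as "6.7.4-arch1-1" is at or
# above the given major.minor version.
kernel_at_least() {
    local release=$1 want_major=$2 want_minor=$3
    local major minor
    major=${release%%.*}        # text before the first dot
    minor=${release#*.}         # drop the "major." prefix
    minor=${minor%%[!0-9]*}     # keep only the leading digits
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}

if kernel_at_least "$(uname -r)" 6 7; then
    echo "kernel $(uname -r): 6.7 or newer, the crash reported here may apply"
else
    echo "kernel $(uname -r): older than 6.7, reported as working in this thread"
fi
```

The release string is parsed with plain parameter expansion, so distro suffixes such as -arch1-1 or -200.fc39.x86_64 do not need special handling.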

thomasvm commented 9 months ago

I experienced the same behavior today. First my existing container grew in size very quickly. I tried creating other containers but they all failed with the above message.

It took me a while to figure out that downgrading my kernel fixes the issue, but downgrading to 6.6.11 did the trick.

unlogicalcode commented 9 months ago

I can also confirm, I have the same behaviour. It works with kernel 6.6, and with 6.7 I get a similar message as above.

quinnjr commented 9 months ago

I downgraded my kernel and the container now functions.

Is this limited to just this container, or does Docker need to update something to be compatible with the 6.7 kernel?

huestack commented 9 months ago

I have the same problem running the container in Podman, but the Docker container runs without any problem. I simply pulled the image with sudo podman pull mcr.microsoft.com/mssql/server:2022-latest and ran it:

sudo podman run -e "ACCEPT_EULA=Y" -e "MSSQL_SA_PASSWORD=Str0ngPass!" -p 1433:1433 --name sql-test --hostname sql-test -d  mcr.microsoft.com/mssql/server:2022-latest

Attached is a log file. sql-test.log

LJFloor commented 9 months ago

Can confirm on Arch Linux, both the docker images for versions 2017, 2019 and 2022 and the AUR version give the same result.

Last errno text: No such file or directory

After downgrading the kernel to version 6.6.10-arch1-1 it starts successfully.

CodeKJ commented 9 months ago

I can confirm this on Nobara 39 with the 6.7.0 kernel. Exactly the same issue for mssql 2017, 2019 and 2022. 6.6.9 works fine.

erikbozic commented 9 months ago

It seems like this was solved in the aur repo package mssql-server: https://aur.archlinux.org/packages/mssql-server#comment-953063. However I'm still having trouble building the needed dependency to verify...

kshpytsya commented 9 months ago

For what it is worth:

Running Gentoo with a custom 6.7.x kernel. It looks like it fails trying to access cgroup v1 "/sys/fs/cgroup/memory/memory.limit_in_bytes". I suspect that switching cgroups to "hybrid" mode would fix the issue, but I am not up to rebooting my machine now.

$ docker run -it --rm -e ACCEPT_EULA=Y -e MSSQL_PID=Developer mcr.microsoft.com/mssql/server:2022-latest -- /bin/bash
sleep 1000

in another terminal, run

ps fax|less
# find pid of bash which is parent of sleep
sudo strace -o mssql.strace -f -s1000 -p <bash-in-mssql-docker>

Return to the first terminal, Ctrl-C the sleep, run /opt/mssql/bin/sqlservr and wait for it to crash. Then go to the second terminal and interrupt strace.

$ grep -P '"/(proc|sys).*ENOENT' mssql.strace
9999 openat(AT_FDCWD, "/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = -1 ENOENT (No such file or directory)

ibauersachs commented 9 months ago

I think the ENOENT is not the issue, especially not /sys/fs/cgroup/memory/memory.limit_in_bytes since this doesn't exist on Kernel 6.6.13 either, and mssql runs fine there. My crashlogs on 6.7.1 showed Invalid argument / 22 / EINVAL:

This program has encountered a fatal error and cannot continue running at Mon Jan 22 18:09:17 2024
The following diagnostic information is available:

         Reason: 0x00000001
         Signal: SIGABRT - Aborted (6)
          Stack:
                 IP               Function
                 ---------------- --------------------------------------
                 0000613cdff2ace1 std::__1::bad_function_call::~bad_function_call()+0x96661
                 0000613cdff2a6a6 std::__1::bad_function_call::~bad_function_call()+0x96026
                 0000613cdff29c2f std::__1::bad_function_call::~bad_function_call()+0x955af
                 0000753f7ee4d520 __sigaction+0x50
                 0000753f7eea19fc pthread_kill+0x12c
                 0000753f7ee4d476 raise+0x16
                 0000753f7ee337f3 abort+0xd3
                 0000613cdfefbd96 std::__1::bad_function_call::~bad_function_call()+0x67716
        Process: 10 - sqlservr
         Thread: 161 (application thread 0x278)
    Instance Id: ba778b4b-ea20-4f3c-98fa-2002d4c8e68c
       Crash Id: 3674de73-5de7-494e-8530-2520421dd97f
    Build stamp: a9299dd605c652a3cea4246273441bcfaf48afb4b482ab9dc43771eecaf6600b
   Distribution: Ubuntu 22.04.3 LTS
     Processors: 16
   Total Memory: 29180137472 bytes
      Timestamp: Mon Jan 22 18:09:17 2024
     Last errno: 22
Last errno text: Invalid argument

CryptoSiD commented 8 months ago

The problem is still there with kernel 6.7.2

Green0wl commented 8 months ago

same problem on 6.7.1-arch1-1

GieltjE commented 8 months ago

As a bad side effect, the lsof process it spawns starts consuming a full CPU core.

fbrosseau commented 8 months ago

Hello,

The issue has been identified and should be fixed in the next SQL Server 2022 CU, but we cannot commit to a specific CU release or timeline as sometimes plans can unexpectedly change. No further investigation or data points should be needed for this, but thank you all for reporting and for looking for potential causes.

It is unrelated to cgroups, and at first glance it might be a kernel bug (but do not quote me on this) - it appears that as of 6.7, mmap without MAP_FIXED may sometimes ignore the address hint even if the hinted region is in fact available. I have not investigated the kernel side of things further, but I think it might be related to this series of changes and/or its preceding/following changes.

Knowing this, I cannot think of any workaround other than sticking to 6.6 in the meantime.

vermarine commented 8 months ago

Thank you very much for the patch. Are there plans to also backport it to 2019?

jaddie commented 8 months ago

Just wanted to say I am so glad you have all written on here. I didn't even think about the fact that I had just upgraded my Arch system, and I was about to start tearing things apart, so this has saved me a heck of a lot of time. Whilst I am here to say thank you, I can also confirm this is still happening on Arch Linux on 6.7.4.

massouji82 commented 8 months ago

Hi! We are running an mssql-based project on a Mac and use the image mcr.microsoft.com/mssql/server:2019-latest through Podman. Podman will not start a container with this image since the kernel was updated. How can we revert the kernel version of the host, or is there another workaround? Any help would be highly appreciated. Thanks!

johnvanham commented 8 months ago

Same issue with Fedora 39 on 6.7.2 and 6.7.3, but fine on 6.6.x and 6.5.x (in case anyone is searching for this issue and using Fedora). Looking forward to the CU @fbrosseau

zzzeek commented 8 months ago

I think MSFT should strongly consider backporting this at least to SQL Server 2019 if not even 2017 as well. As people continue to upgrade their kernels this is going to be happening on an ever larger scale to existing SQL Server linux / container installations.

kshpytsya commented 8 months ago

Thank you very much for the patch. Are there plans to also backport it to 2019?

Am I missing something? I do not see any updated Docker images for mcr.microsoft.com/mssql/server:2022-latest that would make it run on 6.7.*.

markbeazley commented 8 months ago

Thank you very much for the patch. Are there plans to also backport it to 2019? Am I missing something? I do not see any updated Docker images for mcr.microsoft.com/mssql/server:2022-latest that would make it run on 6.7.*.

It should be included in the next CU, no date estimate

I've been keeping an eye on this page for what will presumably be CU12.

asergios commented 8 months ago

Not working on 6.7.5 either.

brunofin commented 8 months ago

I am glad I ran into this page. This started happening recently on Fedora 39. Kernel 6.7.4. I will test another kernel and report back.

Edit: Works on 6.6.13.

daef commented 8 months ago

mysql, pgsql and sqlite all work no problem. but m$ seems to be able to afford not to give a crap about a regression in the latest kernel. not amused.

Run-c0de commented 7 months ago

$ docker logs 9536fdc556e1
This program has encountered a fatal error and cannot continue running at Tue Feb 27 19:29:45 2024
The following diagnostic information is available:

         Reason: 0x00000001
         Signal: SIGABRT - Aborted (6)
          Stack:
                 IP               Function
                 ---------------- --------------------------------------
                 000056dc072752fc <unknown>
                 000056dc07274d42 <unknown>
                 000056dc07274351 <unknown>
                 00007c8fbb447090 killpg+0x40
                 00007c8fbb44700b gsignal+0xcb
                 00007c8fbb426859 abort+0x12b
                 000056dc071fb3d2 <unknown>
                 000056dc07287304 <unknown>
                 000056dc072bc388 <unknown>
                 000056dc072bc16a <unknown>
                 000056dc0720724a <unknown>
                 000056dc07206e9f <unknown>
        Process: 12 - sqlservr
         Thread: 83 (application thread 0x134)
    Instance Id: 252d75bf-d3a4-4b38-a78f-b83488b53759
       Crash Id: 855b8579-9053-4856-ad38-69e4a54d6ff6
    Build stamp: e149a9e980d9936d4f4a616b06112de0e7b2f4e45c2cd3a0884ae319ad3d13b7
   Distribution: Ubuntu 20.04.6 LTS
     Processors: 12
   Total Memory: 16618233856 bytes
      Timestamp: Tue Feb 27 19:29:45 2024
     Last errno: 2
Last errno text: No such file or directory
Capturing a dump of 12
Successfully captured dump: /var/opt/mssql/log/core.sqlservr.2_27_2024_19_29_45.12
Executing: /opt/mssql/bin/handle-crash.sh with parameters
     handle-crash.sh
     /opt/mssql/bin/sqlservr
     12
     /opt/mssql/bin
     /var/opt/mssql/log/

     252d75bf-d3a4-4b38-a78f-b83488b53759
     855b8579-9053-4856-ad38-69e4a54d6ff6

     /var/opt/mssql/log/core.sqlservr.2_27_2024_19_29_45.12

Ubuntu 20.04.6 LTS
Capturing core dump and information to /var/opt/mssql/log...
/bin/cat: /proc/12/maps: Permission denied
SQL server is unavailable - sleeping

MPavleski commented 7 months ago

Any plans to upgrade the Docker image to resolve this issue?

thomasvm commented 7 months ago

Well, first there needs to be a new CU release. The last one is from January 2024 and the pace seems to be about one release per month, so a new release can be expected soon. But the team is not communicating release dates, so we can only wait at this point.

Keep track of this page to see whether a new CU is released.

brunofin commented 7 months ago

It's a bit infuriating that we need to wait for a critical bug fix to land in a monthly cumulative update without even being certain that it actually will.

It would be much more productive instead to post here instructions on how to migrate the database to postgres and be done with it lol

YusufMavzer commented 7 months ago

What the hell same issue here

CryptoSiD commented 7 months ago

When can we hope for CU12 that will include the fix?

It's been 2 weeks already.

YusufMavzer commented 7 months ago

Tested it too and can confirm kernel 6.7+ doesn't work. When will it be fixed?

@erikbozic

GieltjE commented 7 months ago

When can we hope for CU12 that will include the fix?

It's been 2 weeks already.

See the release site; it usually releases in the second week of the month.

YusufMavzer commented 7 months ago

When can we hope for CU12 that will include the fix?

It's been 2 weeks already.

Here you can see when the latest release was https://hub.docker.com/_/microsoft-mssql-server

rg4github commented 7 months ago

When can we hope for CU12 that will include the fix? It's been 2 weeks already.

Here you can see when the latest release was https://hub.docker.com/_/microsoft-mssql-server

Maybe I shouldn’t be viewing this on my phone, but are there release dates on this page?

alex1712 commented 7 months ago

Yes, if you scroll down you see a table with the list of tags and their latest update dates. That's how I understand it, at least.

rg4github commented 7 months ago

Yes, if you scroll down you see a table with the list of tags and their latest update dates. That's how I understand it, at least.

Thanks for clarifying! I opened the page on my desktop and the dates were clearly visible there. Now that I know where to look I can see the dates on my phone as well; I just have to scroll sideways on the unmarked table :-p (It’s like playing Myst, where you have to look behind opened doors…)

brunofin commented 7 months ago

Now Fedora has updated its kernel to 6.7.7 which unsurprisingly still doesn't work.

The problem now is that there are three 6.7.x kernels installed, meaning the 6.6.x backup kernel was effectively uninstalled automatically. So whoever still needs to use this will need to learn how to manually install a kernel version, which I am sure is not gonna be fun.

Thanks Microsoft.

brunofin commented 7 months ago

For anyone looking to install kernel 6.6 on Fedora:

https://fedoramagazine.org/install-kernel-koji/

Here's the link for the kernel for F39: https://koji.fedoraproject.org/koji/buildinfo?buildID=2386947

Download those files:

kernel-6.6.14-200.fc39.x86_64.rpm                kernel-modules-6.6.14-200.fc39.x86_64.rpm
kernel-core-6.6.14-200.fc39.x86_64.rpm           kernel-modules-core-6.6.14-200.fc39.x86_64.rpm
kernel-devel-6.6.14-200.fc39.x86_64.rpm          kernel-modules-extra-6.6.14-200.fc39.x86_64.rpm
kernel-devel-matched-6.6.14-200.fc39.x86_64.rpm

Then use dnf to install them as the guide describes.

sudo dnf install ./kernel-6.6.14-200.fc39.x86_64.rpm ./kernel-core-6.6.14-200.fc39.x86_64.rpm ./kernel-modules-6.6.14-200.fc39.x86_64.rpm ./kernel-modules-core-6.6.14-200.fc39.x86_64.rpm ./kernel-modules-extra-6.6.14-200.fc39.x86_64.rpm ./kernel-devel-6.6.14-200.fc39.x86_64.rpm ./kernel-devel-matched-6.6.14-200.fc39.x86_64.rpm 

After that run:

sudo akmods --force --kernels 6.6.14-200.fc39.x86_64

After that you should have your kernel installed, modules built (NVIDIA etc) and it will be set as default kernel on reboot.

irrefl commented 7 months ago

Thanks @brunofin

markbeazley commented 7 months ago

Now Fedora has updated its kernel to 6.7.7 which unsurprisingly still doesn't work.

The problem now is that there are three 6.7.x kernels installed, meaning the 6.6.x backup kernel was effectively uninstalled automatically. So whoever still needs to use this will need to learn how to manually install a kernel version, which I am sure is not gonna be fun.

Thanks Microsoft.

The best fix I've found for this on Fedora (not sure what the steps are on other distros):

  1. Have kernel 6.6 installed (not sure how to do this if it's already been removed; I did the following before that happened on my machine).
  2. Get the list of bootable kernel paths: sudo grubby --info=ALL | grep ^kernel
    kernel="/boot/vmlinuz-6.7.6-200.fc39.x86_64"
    kernel="/boot/vmlinuz-6.7.5-200.fc39.x86_64"
    kernel="/boot/vmlinuz-6.7.4-200.fc39.x86_64"
    kernel="/boot/vmlinuz-6.6.14-200.fc39.x86_64"
    kernel="/boot/vmlinuz-0-rescue-b7048036f6894f74977406c4d08c3213"
  3. Set your default kernel to 6.6 in grub using the path returned by the previous command. In my case that would be: sudo grubby --set-default=/boot/vmlinuz-6.6.14-200.fc39.x86_64
  4. Reboot and check that you are running kernel 6.6: uname -a
  5. As root, edit /etc/sysconfig/kernel and change UPDATEDEFAULT=yes to UPDATEDEFAULT=no

This makes sure your default boot kernel is 6.6. The last change stops any kernel update from changing your default boot kernel, and because an update won't remove the currently running kernel, 6.6 won't be removed as long as you are booted into it when the update runs. Once 6.7 is supported you can just undo that last change and set your default back to a 6.7 kernel.
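
Step 5 above can also be done non-interactively. The sketch below assumes GNU sed and uses a hypothetical function name; it runs against a scratch file with sample contents, so nothing under /etc is touched until you point it at the real file yourself (as root).

```shell
#!/usr/bin/env bash
# Hypothetical helper: flip UPDATEDEFAULT from yes to no in a
# sysconfig-style file. The path is a parameter so the edit can be tried
# on a copy first. Assumes GNU sed (in-place editing with a bare -i).
disable_default_kernel_update() {
    local conf=$1
    sed -i 's/^UPDATEDEFAULT=yes$/UPDATEDEFAULT=no/' "$conf"
}

# Demonstration on a scratch file; the contents are sample data only.
scratch=$(mktemp)
printf 'UPDATEDEFAULT=yes\nDEFAULTKERNEL=kernel-core\n' > "$scratch"
disable_default_kernel_update "$scratch"
cat "$scratch"
rm -f "$scratch"
```

On a real system the call would be disable_default_kernel_update /etc/sysconfig/kernel; undoing it later means changing the value back to yes.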

This is unfortunately the kind of issue you are likely to run into when you are running a distro that tries to use the latest versions. The 6.6 kernel is an LTS release; 6.7 was only released in January.

It's for production workloads, but they do list supported configurations for running SQL Server in a Docker container here and here. On that list, RHEL 9 and SUSE are still on the 5.14 kernel and Ubuntu 22.04 is on 6.5.

johnvanham commented 7 months ago

The best fix I've found for this on Fedora (not sure what the steps are on other distros):

  1. Have kernel 6.6 installed (not sure how to do this if it's already been removed; I did the following before that happened on my machine).

What I've done, with 6.6 still installed, is edit /etc/dnf/dnf.conf and change the value of installonly_limit to 0. This means my Fedora machine keeps adding new kernels automatically but never removes the last 6.6 version. I can manually remove older 6.7.x kernels, so I still have a choice in grub to boot into 6.6.x or 6.7.x.

Once this is fixed I'll set that back to 3 again.
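
The installonly_limit change described above can be scripted as well. This is a sketch under stated assumptions: the function name is hypothetical, GNU sed is assumed, and when the key is missing the line is simply appended, which is only correct if [main] is the last (or only) section in the file. It is demonstrated on a scratch file with sample contents, not on the real /etc/dnf/dnf.conf.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set installonly_limit in a dnf.conf-style file.
# Rewrites an existing installonly_limit= line, otherwise appends one
# (assumption: [main] is the last section of the file).
set_installonly_limit() {
    local conf=$1 limit=$2
    if grep -q '^installonly_limit=' "$conf"; then
        sed -i "s/^installonly_limit=.*/installonly_limit=$limit/" "$conf"
    else
        printf 'installonly_limit=%s\n' "$limit" >> "$conf"
    fi
}

# Demonstration on a scratch file; the contents are sample data only.
scratch=$(mktemp)
printf '[main]\ngpgcheck=True\ninstallonly_limit=3\n' > "$scratch"
set_installonly_limit "$scratch" 0
cat "$scratch"
rm -f "$scratch"
```

Setting the value back later is the same call with 3 as the second argument.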

fabiang commented 7 months ago

For those who update their Fedora system through the CLI, you can exclude kernel updates with dnf up -x 'kernel*'

d4r1us-drk commented 7 months ago

In fedora I just followed these instructions using koji to downgrade the kernel to 6.6.14 https://discussion.fedoraproject.org/t/downgrading-to-a-previous-kernel-version/72820

specifically this one command (as root, and make sure to have koji installed using sudo dnf install koji):

cd $(mktemp -d) && koji download-build --arch=x86_64 --arch=noarch kernel-6.6.14-200.fc39 && dnf upgrade *

w-ko commented 7 months ago

The image has been updated:

2022-latest  amd64  No Dockerfile  Ubuntu 22.04  05/31/2022  03/14/2024
latest       amd64  No Dockerfile  Ubuntu 22.04  09/21/2018  03/14/2024

It works for me again on Fedora with kernel 6.7.9-200.fc39.x86_64

fbrosseau commented 7 months ago

Hello,

Yes, sql22cu12 shipped today and it has the fix. The reason fixes are usually blurry on delivery dates and/or CU numbers is that schedules can shift (such as for making room for an urgent security fix, etc). SQL Server Linux follows the release cadence of SQL Server as a whole, and this schedule is usually quite rigid (minus security fixes).

Sql19 should also be fixed for this in its next CU - their release schedules typically alternate one each month. However, sql17 will not be fixed, as sql17 is out of mainstream support and only receives security fixes. No bug fixes qualify for sql17. Customers who must remain on sql17 should keep kernel 6.6 or lower, although as usual we strongly recommend upgrading to a supported version of SQL Server for Linux, for many reasons including continued bugfixing.

lateparty commented 7 months ago

The merge is great news, was tracking that overnight. Any rough eta from merge to release available to pull down through the Bitwarden.sh? I gave it a shot about an hour ago and no luck yet

Drezir commented 7 months ago

Using Fedora 39, only tag 2022-CU12-ubuntu-22.04 works for me, not latest or 2022-latest.

felixSabatie commented 7 months ago

2022-latest works fine for me on Fedora 39 with kernel v 6.7.9-200.fc39.x86_64

Drezir commented 7 months ago

2022-latest works fine for me on Fedora 39 with kernel v 6.7.9-200.fc39.x86_64

I agree, I had to delete the locally cached version :)
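
A stale local copy of a moving tag such as 2022-latest is easy to miss, because Docker keeps using the cached image until it is pulled again. The sketch below re-pulls a tag and reports whether the local image ID changed; the function name refresh_image is hypothetical, and only docker pull and docker image inspect --format are used.

```shell
#!/usr/bin/env bash
# Hypothetical helper: re-pull an image tag and report whether the locally
# cached image changed as a result.
refresh_image() {
    local image=$1 before after
    # Empty if the image is not cached locally yet.
    before=$(docker image inspect --format '{{.Id}}' "$image" 2>/dev/null)
    docker pull "$image" >/dev/null || return 1
    after=$(docker image inspect --format '{{.Id}}' "$image")
    if [ "$before" = "$after" ]; then
        echo "unchanged: $image"
    else
        echo "updated: $image"
    fi
}

# Example call (not executed here):
#   refresh_image mcr.microsoft.com/mssql/server:2022-latest
```

Containers created from the old image keep using it, so they need to be recreated after the pull.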

andros0689 commented 7 months ago

2022-latest works fine for me on Fedora 39 with kernel v 6.7.9-200.fc39.x86_64

Same here. Thanks a lot.