Open quinnjr opened 9 months ago
I have the same issue. Found that it's the 6.7 kernel update. (https://github.com/microsoft/mssql-docker/issues/858#issuecomment-1892216070)
Rolling back to 6.6.10 makes it work again.
I experienced the same behavior today. First my existing container grew in size very quickly. I tried creating other containers but they all failed with the above message.
It took me a while to figure out that downgrading my kernel fixes the issue, but downgrading to 6.6.11 did the trick.
I can also confirm the same behaviour. It works with kernel 6.6, and with 6.7 I get a similar message to the one above.
I downgraded my kernel and the container now functions.
Is this limited to just this container, or does Docker need to update something to be compatible with the 6.7 kernel?
I have the same problem running the container in Podman, but the Docker container runs without any problem. I simply pulled the image:
sudo podman pull mcr.microsoft.com/mssql/server:2022-latest
and ran it:
sudo podman run -e "ACCEPT_EULA=Y" -e "MSSQL_SA_PASSWORD=Str0ngPass!" -p 1433:1433 --name sql-test --hostname sql-test -d mcr.microsoft.com/mssql/server:2022-latest
Attached is a log file. sql-test.log
Can confirm on Arch Linux, both the docker images for versions 2017, 2019 and 2022 and the AUR version give the same result.
Last errno text: No such file or directory
After downgrading the kernel to version 6.6.10-arch1-1 it starts successfully.
I can confirm this on Nobara 39 with 6.7.0 kernel. Exactly same issue for 2017, 2019, 2022 mssql. 6.6.9 works fine.
It seems like this was solved in the AUR package mssql-server: https://aur.archlinux.org/packages/mssql-server#comment-953063.
However I'm still having trouble building the needed dependency to verify...
For what it is worth:
Running Gentoo with a custom 6.7.x kernel. It looks like it fails trying to access the cgroup v1 file "/sys/fs/cgroup/memory/memory.limit_in_bytes". I suspect that switching cgroups to "hybrid" would fix the issue, but I am not up to rebooting my machine now.
$ docker run -it --rm -e ACCEPT_EULA=Y -e MSSQL_PID=Developer mcr.microsoft.com/mssql/server:2022-latest -- /bin/bash
sleep 1000
in another terminal, run
ps fax|less
# find pid of bash which is parent of sleep
sudo strace -o mssql.strace -f -s1000 -p <bash-in-mssql-docker>
Return to the first terminal, press Ctrl-C to stop the sleep, then run /opt/mssql/bin/sqlservr and wait for it to crash. Go to the second terminal and interrupt strace.
$ grep -P '"/(proc|sys).*ENOENT' mssql.strace
9999 openat(AT_FDCWD, "/sys/fs/cgroup/memory/memory.limit_in_bytes", O_RDONLY) = -1 ENOENT (No such file or directory)
I think the ENOENT is not the issue, especially not for /sys/fs/cgroup/memory/memory.limit_in_bytes, since this doesn't exist on kernel 6.6.13 either, and mssql runs fine there.
My crashlogs on 6.7.1 showed Invalid argument / 22 / EINVAL:
This program has encountered a fatal error and cannot continue running at Mon Jan 22 18:09:17 2024
The following diagnostic information is available:
Reason: 0x00000001
Signal: SIGABRT - Aborted (6)
Stack:
IP Function
---------------- --------------------------------------
0000613cdff2ace1 std::__1::bad_function_call::~bad_function_call()+0x96661
0000613cdff2a6a6 std::__1::bad_function_call::~bad_function_call()+0x96026
0000613cdff29c2f std::__1::bad_function_call::~bad_function_call()+0x955af
0000753f7ee4d520 __sigaction+0x50
0000753f7eea19fc pthread_kill+0x12c
0000753f7ee4d476 raise+0x16
0000753f7ee337f3 abort+0xd3
0000613cdfefbd96 std::__1::bad_function_call::~bad_function_call()+0x67716
Process: 10 - sqlservr
Thread: 161 (application thread 0x278)
Instance Id: ba778b4b-ea20-4f3c-98fa-2002d4c8e68c
Crash Id: 3674de73-5de7-494e-8530-2520421dd97f
Build stamp: a9299dd605c652a3cea4246273441bcfaf48afb4b482ab9dc43771eecaf6600b
Distribution: Ubuntu 22.04.3 LTS
Processors: 16
Total Memory: 29180137472 bytes
Timestamp: Mon Jan 22 18:09:17 2024
Last errno: 22
Last errno text: Invalid argument
The problem is still there with kernel 6.7.2
same problem on 6.7.1-arch1-1
As a bad side effect the lsof process it spawns starts eating a core
Hello,
The issue has been identified and should be fixed in the next SQL Server 2022 CU, but we cannot commit to a specific CU release or timeline as sometimes plans can unexpectedly change. No further investigation or data points should be needed for this, but thank you all for reporting and for looking for potential causes.
It is unrelated to cgroups, and at first glance it might be a kernel bug (but do not quote me on this) - it appears that as of 6.7, mmap without MAP_FIXED may sometimes ignore the address hint even if the hinted region is in fact available. I have not investigated the kernel side of things further, but I think it might be related to this series of changes and/or its preceding/following changes.
Knowing this, I cannot think of any workaround other than sticking to 6.6 in the meantime.
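For anyone who wants to check quickly whether a host is in the range reported as failing in this thread (6.7 and newer, with images that predate the fix), here is a minimal sketch. The function name kernel_is_6_7_or_newer is made up for illustration, and it only parses the release string; it does not test SQL Server itself.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): exits 0 when the given kernel
# release is 6.7 or newer, the range reported as failing above.
kernel_is_6_7_or_newer() {
    release=${1:-$(uname -r)}   # e.g. "6.7.4-200.fc39.x86_64"
    major=${release%%.*}        # text before the first dot, e.g. "6"
    rest=${release#*.}          # text after the first dot, e.g. "7.4-200.fc39.x86_64"
    minor=${rest%%[!0-9]*}      # leading digits of the rest, e.g. "7"
    [ "$major" -gt 6 ] || { [ "$major" -eq 6 ] && [ "${minor:-0}" -ge 7 ]; }
}

# Report on the running kernel.
if kernel_is_6_7_or_newer; then
    echo "kernel $(uname -r): in the range reported as failing"
else
    echo "kernel $(uname -r): older than 6.7"
fi
```

Passing a release string as the first argument checks that string instead of the running kernel, which is handy before picking a kernel to downgrade to.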
Thank you very much for the patch. Are there plans to also backport it to 2019?
Just wanted to say I am so glad you have all written on here. I didn't even think about the fact that I had just upgraded my Arch system, and I was about to start tearing things apart, so this has saved me a heck of a lot of time. While I am here to say thank you, I can also confirm this is still happening on Arch Linux on 6.7.4.
Hi! We are running an MSSQL-based project on a Mac and use the image mcr.microsoft.com/mssql/server:2019-latest through Podman. Podman will not start a container with this image since the kernel was updated. How can we revert the kernel version of the host, or is there another workaround? Any help would be highly appreciated. Thanks!
Same issue with Fedora 39 on 6.7.2 and 6.7.3, but fine on 6.6.x and 6.5.x (in case anyone is searching for this issue and using Fedora). Looking forward to the CU @fbrosseau
I think MSFT should strongly consider backporting this at least to SQL Server 2019 if not even 2017 as well. As people continue to upgrade their kernels this is going to be happening on an ever larger scale to existing SQL Server linux / container installations.
Am I missing something? I do not see any updated Docker images for mcr.microsoft.com/mssql/server:2022-latest that would make it run on 6.7.*.
It should be included in the next CU, no date estimate
I've been keeping an eye on this page for what will presumably be CU12 to be released.
Not working on 6.7.5 either.
I am glad I ran into this page. This started happening recently on Fedora 39. Kernel 6.7.4. I will test another kernel and report back.
Edit: Works on 6.6.13.
mysql, pgsql and sqlite all work no problem. but m$ seems to be able to afford not to give a crap about a regression in the latest kernel. not amused.
$ docker logs 9536fdc556e1
This program has encountered a fatal error and cannot continue running at Tue Feb 27 19:29:45 2024
The following diagnostic information is available:
Reason: 0x00000001
Signal: SIGABRT - Aborted (6)
Stack:
IP Function
---------------- --------------------------------------
000056dc072752fc <unknown>
000056dc07274d42 <unknown>
000056dc07274351 <unknown>
00007c8fbb447090 killpg+0x40
00007c8fbb44700b gsignal+0xcb
00007c8fbb426859 abort+0x12b
000056dc071fb3d2 <unknown>
000056dc07287304 <unknown>
000056dc072bc388 <unknown>
000056dc072bc16a <unknown>
000056dc0720724a <unknown>
000056dc07206e9f <unknown>
Process: 12 - sqlservr
Thread: 83 (application thread 0x134)
Instance Id: 252d75bf-d3a4-4b38-a78f-b83488b53759
Crash Id: 855b8579-9053-4856-ad38-69e4a54d6ff6
Build stamp: e149a9e980d9936d4f4a616b06112de0e7b2f4e45c2cd3a0884ae319ad3d13b7
Distribution: Ubuntu 20.04.6 LTS
Processors: 12
Total Memory: 16618233856 bytes
Timestamp: Tue Feb 27 19:29:45 2024
Last errno: 2
Last errno text: No such file or directory
Capturing a dump of 12
Successfully captured dump: /var/opt/mssql/log/core.sqlservr.2_27_2024_19_29_45.12
Executing: /opt/mssql/bin/handle-crash.sh with parameters
handle-crash.sh
/opt/mssql/bin/sqlservr
12
/opt/mssql/bin
/var/opt/mssql/log/
252d75bf-d3a4-4b38-a78f-b83488b53759
855b8579-9053-4856-ad38-69e4a54d6ff6
/var/opt/mssql/log/core.sqlservr.2_27_2024_19_29_45.12
Ubuntu 20.04.6 LTS
Capturing core dump and information to /var/opt/mssql/log...
/bin/cat: /proc/12/maps: Permission denied
SQL server is unavailable - sleeping
Any plans to upgrade the Docker image to resolve this issue?
Well, first there needs to be a new CU release. The last one is from January 2024 and the pace seems to be about one release per month, so a new release can be expected soon. But the team is not communicating release dates, so we can only wait at this point in time.
Keep track of this page to see whether a new CU is released.
It's a bit infuriating that we need to wait for a critical bug fix to land on a monthly cumulative update without being even certain whether it actually will.
It would be much more productive instead to post here instructions on how to migrate the database to postgres and be done with it lol
What the hell same issue here
When can we hope for CU12 that will include the fix?
It's been 2 weeks already.
Tested it too and can confirm kernel 6.7+ doesn't work. When will it be fixed?
@erikbozic
See the release site; releases usually land in the second week of the month.
Here you can see when the latest release was https://hub.docker.com/_/microsoft-mssql-server
Maybe I shouldn’t be viewing this on my phone, but are there release dates on this page?
Yes, if you scroll down you see a table with the list of tags and their latest update dates. That's how I understand it, at least.
Thanks for clarifying! I opened the page on my desktop and the dates were clearly visible there. Now that I know where to look I can see the dates on my phone as well; I just have to scroll sideways on the unmarked table :-p (It’s like playing Myst, where you have to look behind opened doors…)
Now Fedora has updated its kernel to 6.7.7 which unsurprisingly still doesn't work.
The problem now is that there are three 6.7.x kernels installed, meaning the 6.6.x backup kernel was effectively uninstalled automatically. So whoever still needs to use this will need to learn how to manually install a kernel version, which I am sure is not going to be fun.
Thanks Microsoft.
For anyone looking to install kernel 6.6 on Fedora:
https://fedoramagazine.org/install-kernel-koji/
Here's the link for the kernel for F39: https://koji.fedoraproject.org/koji/buildinfo?buildID=2386947
Download these files:
kernel-6.6.14-200.fc39.x86_64.rpm
kernel-core-6.6.14-200.fc39.x86_64.rpm
kernel-devel-6.6.14-200.fc39.x86_64.rpm
kernel-devel-matched-6.6.14-200.fc39.x86_64.rpm
kernel-modules-6.6.14-200.fc39.x86_64.rpm
kernel-modules-core-6.6.14-200.fc39.x86_64.rpm
kernel-modules-extra-6.6.14-200.fc39.x86_64.rpm
Then use dnf to install them as the guide describes.
sudo dnf install ./kernel-6.6.14-200.fc39.x86_64.rpm ./kernel-core-6.6.14-200.fc39.x86_64.rpm ./kernel-modules-6.6.14-200.fc39.x86_64.rpm ./kernel-modules-core-6.6.14-200.fc39.x86_64.rpm ./kernel-modules-extra-6.6.14-200.fc39.x86_64.rpm ./kernel-devel-6.6.14-200.fc39.x86_64.rpm ./kernel-devel-matched-6.6.14-200.fc39.x86_64.rpm
After that run:
sudo akmods --force --kernels 6.6.14-200.fc39.x86_64
After that you should have your kernel installed, modules built (NVIDIA etc) and it will be set as default kernel on reboot.
Thanks @brunofin
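The best fix I've found for this on Fedora (not sure what the steps are on other distros):
- Have kernel 6.6 installed (not sure how to do this if it's already been removed; I did the following before that happened on my machine)
- List the installed kernels with sudo grubby --info=ALL | grep ^kernel:
kernel="/boot/vmlinuz-6.7.6-200.fc39.x86_64"
kernel="/boot/vmlinuz-6.7.5-200.fc39.x86_64"
kernel="/boot/vmlinuz-6.7.4-200.fc39.x86_64"
kernel="/boot/vmlinuz-6.6.14-200.fc39.x86_64"
kernel="/boot/vmlinuz-0-rescue-b7048036f6894f74977406c4d08c3213"
- Set the 6.6 kernel as the default: sudo grubby --set-default=/boot/vmlinuz-6.6.14-200.fc39.x86_64
- Reboot and confirm the running kernel with uname -a
- Edit /etc/sysconfig/kernel and change UPDATEDEFAULT=yes to UPDATEDEFAULT=no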
What this does is make sure your default boot kernel is 6.6; the last change then stops any kernel updates from changing your default boot kernel. Because an update won't remove the currently running kernel, 6.6 won't be removed as long as you are booted into it when the update runs. Once 6.7 is supported you can just undo that last change and set your default back to a 6.7 kernel.
This is unfortunately the kind of issue you are likely to run into when running a distro that tries to use the latest versions. The 6.6 kernel is an LTS release; 6.7 was only released in January.
It's for production workloads, but they do list supported configurations for running SQL Server in a Docker container here and here. On that list, RHEL 9 and SUSE are still on the 5.14 kernel and Ubuntu 22.04 is on 6.5.
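The UPDATEDEFAULT change described above can also be scripted instead of edited by hand. A rough sketch, assuming the file contains a plain UPDATEDEFAULT=yes line; the function name pin_default_kernel is made up for illustration, and the real file needs root.

```shell
#!/bin/sh
# Hypothetical helper: change UPDATEDEFAULT=yes to UPDATEDEFAULT=no in the
# given file (defaults to /etc/sysconfig/kernel, which needs root).
pin_default_kernel() {
    conf=${1:-/etc/sysconfig/kernel}
    # Keep a backup next to the file, then rewrite only the UPDATEDEFAULT line.
    cp "$conf" "$conf.bak" &&
    sed 's/^UPDATEDEFAULT=yes$/UPDATEDEFAULT=no/' "$conf.bak" > "$conf"
}
```

To undo it once a fixed image is available, copy the .bak file back over the original.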
What I've done, with 6.6 still installed, is edit /etc/dnf/dnf.conf and change the value of installonly_limit to 0. This means my Fedora machine keeps adding kernels automatically but does not remove the last 6.6 version. I can manually remove older 6.7.x kernels, so I still have a choice in GRUB to boot into 6.6.x or 6.7.x.
Once this is fixed I'll set that back to 3 again.
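The dnf.conf edit above, as a small sketch for those who prefer to script it. It assumes the entry is written as installonly_limit=N without spaces, and that [main] is the only section if the entry has to be appended; set_installonly_limit is a made-up name. According to the dnf.conf documentation, 0 means an unlimited number of installonly packages.

```shell
#!/bin/sh
# Hypothetical helper: set installonly_limit in a dnf.conf-style file.
# Usage: set_installonly_limit LIMIT [FILE]   (FILE defaults to /etc/dnf/dnf.conf)
set_installonly_limit() {
    limit=$1
    conf=${2:-/etc/dnf/dnf.conf}
    cp "$conf" "$conf.bak" || return 1
    if grep -q '^installonly_limit=' "$conf.bak"; then
        # Replace the existing entry in place.
        sed "s/^installonly_limit=.*/installonly_limit=$limit/" "$conf.bak" > "$conf"
    else
        # No entry yet: append one (assumes [main] is the only section).
        { cat "$conf.bak"; echo "installonly_limit=$limit"; } > "$conf"
    fi
}
```

Run as root, set_installonly_limit 0 keeps every installed kernel, and set_installonly_limit 3 restores the value mentioned above once the fix is out.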
For those who update their Fedora system through cli, you can exclude updating the kernel with dnf up -x 'kernel*'
In Fedora I just followed these instructions using koji to downgrade the kernel to 6.6.14:
https://discussion.fedoraproject.org/t/downgrading-to-a-previous-kernel-version/72820
specifically this one command (as root, and make sure to have koji installed using sudo dnf install koji):
cd $(mktemp -d) && koji download-build --arch=x86_64 --arch=noarch kernel-6.6.14-200.fc39 && dnf upgrade *
The image has been updated:
2022-latest  amd64  No Dockerfile  Ubuntu 22.04  05/31/2022  03/14/2024
latest  amd64  No Dockerfile  Ubuntu 22.04  09/21/2018  03/14/2024
It works for me again on Fedora 6.7.9-200.fc39.x86_64
Hello,
Yes, sql22cu12 shipped today and it has the fix. The reason fixes are usually blurry on delivery dates and/or CU numbers is that schedules can shift (such as for making room for an urgent security fix, etc). SQL Server Linux follows the release cadence of SQL Server as a whole, and this schedule is usually quite rigid (minus security fixes).
Sql19 should also be fixed for this in its next CU - their release schedules typically alternate one each month. However, sql17 will not be fixed, as sql17 is out of mainstream support and only receives security fixes. No bug fixes qualify for sql17. Customers who must remain on sql17 should keep kernel 6.6 or lower, although as usual we strongly recommend upgrading to a supported version of SQL Server for Linux, for many reasons including continued bugfixing.
The merge is great news, was tracking that overnight. Any rough eta from merge to release available to pull down through the Bitwarden.sh? I gave it a shot about an hour ago and no luck yet
Using Fedora 39, only tag 2022-CU12-ubuntu-22.04 works for me, not latest or 2022-latest.
2022-latest works fine for me on Fedora 39 with kernel v 6.7.9-200.fc39.x86_64
I agree, I had to delete the locally cached version :)
Same here. Thanks a lot.
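If 2022-latest still fails after the update, the locally cached copy of the tag mentioned above is a likely suspect. A minimal sketch for refreshing it; refresh_image and the CONTAINER_CLI variable are made up for illustration, so Podman users can substitute their CLI.

```shell
#!/bin/sh
# Hypothetical helper: re-pull a tag so the local copy matches the registry,
# then print the creation timestamp of the image now behind that tag.
refresh_image() {
    cli=${CONTAINER_CLI:-docker}   # set CONTAINER_CLI=podman to use Podman
    "$cli" pull "$1" &&
    "$cli" image inspect --format '{{.Created}}' "$1"
}

# Example (not run here):
#   refresh_image mcr.microsoft.com/mssql/server:2022-latest
```

Containers created from the old image keep using it, so remove and recreate them after pulling.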
Currently unable to start the container on Arch Linux as the host OS. The dump files for the failing sqlservr process don't really provide any insight as to why:
Docker-compose file:
Docker logs and data directory are set as UID:GID 10001:10001.