microsoft / mssql-docker

Official Microsoft repository for SQL Server in Docker resources
MIT License
1.72k stars 758 forks source link

Optimizing SQL server performance in Docker container #644

Open vernjan opened 4 years ago

vernjan commented 4 years ago

We are having an issue with a very slow start (and generally poor performance) of dockerized SQL server. Sometimes it takes 5+ minutes for a fresh container to start.

We are using SQL server in docker for integration tests during our CI build and everything is running in a Kubernetes cluster.

I think the root cause is that SQL server detects the "bare metal" CPU and does not respect the fact that it's running in a container with restricted resources:

2020-09-22 13:41:39.90 Server      SQL Server detected 12 sockets with 2 cores per socket and 2 logical processors per socket, 24 total logical processors; using 24 logical processors based on SQL Server licensing. This is an informational message; no user action is required

However, this is not correct. We allow only a small portion of CPU (currently 2.5 CPU). I read that number of logical processors has impact on number of worker threads.

I didn't find any param how to change the number of worker threads or the number of logical processors dedicated for the SQL server. The only param I found is for memory (MSSQL_MEMORY_LIMIT_MB). I'd like to do something similar for the CPU.

Does anyone have an idea how to run a SQL server in Docker with limited resources? Let's say 2 Gi of RAM and 2 CPUs at max (we need to scale our builds)? Is it even possible?

Thanks for any ideas!

Hooolm commented 3 years ago

Bump for notifications.

sidowsk1 commented 3 years ago

Bump, also for notifications

nocentino commented 3 years ago

When running with CPU limits, any process running inside a container will see all of the CPUs but the underlying Linux kernel will ensure that the process is only consuming up to the number of CPUs limited. You will see this workload across all CPUs on the system...but usage will not exceed the limit. Even with CPU Sets, the process in the container will see all the CPUs, the workload will be scheduled to the CPUs defined in the CPU set. Yes, SQL Server will see all the CPUs.

In my experience with the SQL Server container, the primary culprit of a slow startup time of that magnitude is having larger database as part of the container image. Is that the case here?

vernjan commented 3 years ago

Hi, we have discovered that the slow startup issue was in our case caused by poor disk IO performance (shared storage, time to time it was overloaded).

Detecting incorrect number of CPUs might still be an issue but not in this case.

ipg-informatik commented 3 years ago

Same here. We run a mssql linux 2019-cu9 container in aks and a simple sqlquery with 12'000 inserts takes 60-80 seconds to run. On an old onprem machine with standard 2019 sql installation it takes 2-3 seconds.

amvin87-zz commented 3 years ago

I am in sync with the comments from @nocentino. Using options like --cpuset-cpus , SQL still is able to see all the CPUs but will not be able to use them as Linux kernel will ensure that container is able to use only the number CPUs that is limited by the --cpuset-cpus option. For the performance issue, can you collect data and troubleshoot this like any other performance issue to understand why a insert query is taking long, one obvious reason I can think of is the storage that is being used for containers. But before you go down that path, I would suggest troubleshooting the performance issue like you would for any SQL issue and identify the waits or any other bottleneck and then troubleshoot accordingly.

ipg-informatik commented 3 years ago

You can follow microsofts own sql aks tutorial and you will see the performance issues [1]. You will see the same performance issues when you run local docker desktop with kubernetes and use the same YAML [1]. But a windows vm on the same machine with standard sql 2019 installation ist much faster to execute the query.

[1] https://docs.microsoft.com/en-us/sql/linux/tutorial-sql-server-containers-kubernetes?view=sql-server-ver15

This simple query in a container needs 60-80 seconds to execute.

CREATE DATABASE performance_test

USE performance_test

CREATE Table tblAuthors
(
   Id int identity primary key,
   Author_name nvarchar(50),
   country nvarchar(50)
)
CREATE Table tblBooks
(
   Id int identity primary key,
   Auhthor_id int foreign key references tblAuthors(Id),
   Price int,
   Edition int
)

Declare @Id int
Set @Id = 1

While @Id <= 12000
Begin 
   Insert Into tblAuthors values ('Author - ' + CAST(@Id as nvarchar(10)),
              'Country - ' + CAST(@Id as nvarchar(10)) + ' name')
   Print @Id
   Set @Id = @Id + 1
End

select * from tblAuthors
amvin87-zz commented 3 years ago

have you tried following the applicable performance guidelines for SQL on Linux as documented here: For SQL containers you can try a tuned profile of throughput and see if you see any improvement.

ipg-informatik commented 3 years ago

Not yet, but we tried the following, without any performance improvements.

/opt/mssql/bin/mssql-conf traceflag 3979 on
/opt/mssql/bin/mssql-conf set control.alternatewritethrough 0
/opt/mssql/bin/mssql-conf set control.writethrough 1
amvin87-zz commented 3 years ago

To use those settings, you need to ensure that you are not using ephemeral storage for SQL containers, and also the below conditions should be true :

  1. Using SQL Server 2017 CU6 or newer, or SQL Server 2019
  2. Using a Linux distribution and version that supports FUA capability (Red Hat Enterprise Linux 8.0 or higher, or SUSE Linux Enterprise Server 12 SP5)
  3. On storage subsystem and/or hardware that supports and is configured for FUA capability

Else, the default option might be better. If you still do face issues with performance, please reach out to Microsoft support for further help as SQL containers and SQL on Linux is a support feature.

Nefcanto commented 2 years ago

It's a shame that Microsoft can't deliver a good quality database.

Here's an image of resource usage of an instance of SQL Server inside docker:

Screenshot from 2021-10-14 20-52-19

And what it has? Nothing. Just newly created databases. No data at all. I mean 0 records. ZERO RECORDS and up to 80% of CPU usage? Are you kidding me?

Fix this Microsoft guys. Just fix this. If you can't simply create a new database engine from scratch.

And here's the docker-compose.yml behind this container:

version: "3.9"
services:
    database:
        image: mcr.microsoft.com/mssql/server
        user: root
        restart: always
        container_name: ContainerNameHere
        volumes:
            - /Organization/Databases:/var/opt/mssql/data
        environment:
            - ACCEPT_EULA=Y
            - SA_PASSWORD=password_here
            - MSSQL_PID=Express
        ports:
            - 1433:1433
        logging:
            driver: none
        networks:
            - NetworkNameHere
networks:
    NetworkNameHere:
        name: NetworkNameHere
        driver: bridge
amvin87-zz commented 2 years ago

are you seeing any specific issues, from the snapshot above I see the load for the last 1, 5 and 15 min is not even 1 on avg. Overall the CPU is idle at about 87%. Are you seeing any specific slowness in executing queries or any other performance issue. If so, please be specific so we could help

Nefcanto commented 2 years ago

How about now? I just created some empty databases (only schema, no data) and I deployed a couple of .NET APIs inside docker.

Screenshot from 2021-10-20 09-56-10

See how the load average is now at almost 6.

What specific information can I give you? Does it suffice to say that server crashed? A Debian 10 server only running docker and nginx crashed because of an empty SQL Server docker, with 4 GB of RAM (yes it's a small amount of RAM, but we're talking about choosing a technology for microservices to deploy SQL Server over many small VPS instances), with 2 CPU cores.

Please tell me what information do you need, and I will provide it for you.

mvorisek commented 1 year ago

I am writing SQL server CI testing for https://github.com/php/php-src. where I install SQL server using choco it takes about 5 minutes to startup as well.

Github Actions repro code:

    steps:
      - name: Setup SQL server
        run: |
          choco install sql-server-express -y --install-arguments="/SECURITYMODE=SQL /SAPWD=mssql_Pass11"
          sqlcmd -S "(local)\SQLEXPRESS" -U sa -P "mssql_Pass11" -Q "SELECT 1;"

(the 2nd command is there only to test the install, almost all time is spent in the install command, download is fast as well)

I would expect the startup time to be no more than 30 seconds.

Nefcanto commented 1 year ago

@mvorisek I'm glad SQL Server was slow. I was forced to migrate to another RDBMS, and now I'm completely happy. I chose MariaDB.

It was hard at first after 16 years of development with SQL Server. But it was worth it.

mvorisek commented 1 year ago

😆 PHP is language interpreter like python and every major DB vendor is supported, I personally use MySQL as well, but this issue is to make SQL Server startup faster as PHP CI pipeline will be then faster 🚀

tracker1 commented 1 year ago

There's no mention of the host OS from the original poster. the path for the volume would indicate a *nix environment... Would point out that I/O to/from windows and mac host environments are sluggish to say the least... best to use an actual volume container if you aren't running from a Linux (or WSL) host environment.

As to limiting CPU resources, that would/should be part of your docker/kubernetes configuration in practice iirc.

Not a big fan of mySQL/MariaDB myself... I'd go for PostgreSQL as a first choice, it's popular, sane, portable and every cloud provider supports it in a hosted model (even most second tier clouds). Every time I've used mysql, it pisses me off in one way or another, from utf8 not being utf8 for starters. For even being able to have case-insensitive collation on an index for a BINARY field. It just irks me.

p10tyr commented 3 months ago

I've had docker on Linux running on test for 18 months on Windows server 2016 and HyperV Ubuntu with Docker - Several layers of virtualisation and it ran just fine box standard on a 10 year old server.

So I got a deal on a nice modern server in OVH, installed Debian (because I wanted ZFS)

Migrated everything I had in my lab over to OVH and for literally 76 days everything was running smoothly. Until now. Where queries that took 10 seconds max now take 6 minutes ?!?!

Don't have a lot in the SQL to be honest - 2 databases of which 1 gets daily usage. This is htop with nobody doing anything on MSSQL

image

image