Closed luomiao closed 6 years ago
Check the commit history here https://github.com/dperson/samba/commits/master On Oct.21, some change is made to use Alpine as the base image (before that change, the base image is based on debian). I also did some test and verified that this change causes the samba server take longer time (1 minutes) to start.
I manually started a samba service with image based on alpline and debian.
image based on debian
root@sc-rdops-vm18-dhcp-57-89:~# docker service ps testvol3_service
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
ma48oovvwbgf testvol3_service.1 lipingxue/samba-debian:latest sc-rdops-vm18-dhcp-57-89 Running Preparing 19 seconds ago
root@sc-rdops-vm18-dhcp-57-89:~#
root@sc-rdops-vm18-dhcp-57-89:~# docker service ps testvol3_service
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
ma48oovvwbgf testvol3_service.1 lipingxue/samba-debian:latest sc-rdops-vm18-dhcp-57-89 Running Preparing 27 seconds ago
root@sc-rdops-vm18-dhcp-57-89:~# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
gizbw0jq64fn testvol1 replicated 1/1 dperson/samba:latest *:10445->445/tcp
bulughgw2nup testvol2_service replicated 1/1 lipingxue/samba:latest *:10446->446/tcp
i0kig1efb9e2 testvol3_service replicated 0/1 lipingxue/samba-debian:latest *:10447->447/tcp
root@sc-rdops-vm18-dhcp-57-89:~# docker service ps testvol3_service
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
ma48oovvwbgf testvol3_service.1 lipingxue/samba-debian:latest sc-rdops-vm18-dhcp-57-89 Running Starting 1 second ago
root@sc-rdops-vm18-dhcp-57-89:~#
root@sc-rdops-vm18-dhcp-57-89:~# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
gizbw0jq64fn testvol1 replicated 1/1 dperson/samba:latest *:10445->445/tcp
bulughgw2nup testvol2_service replicated 1/1 lipingxue/samba:latest *:10446->446/tcp
i0kig1efb9e2 testvol3_service replicated 1/1 lipingxue/samba-debian:latest *:10447->447/tcp
We can see that the service is in "preparing state" for 27 second and then in "starting state" for 1 second and then go to the "running state".
image based on alpine
root@sc-rdops-vm18-dhcp-57-89:~# docker service ps testvol4_service
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
il7oooauxz7h testvol4_service.1 lipingxue/samba:latest sc-rdops-vm18-dhcp-57-89 Running Starting 17 seconds ago
root@sc-rdops-vm18-dhcp-57-89:~#
root@sc-rdops-vm18-dhcp-57-89:~# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
gizbw0jq64fn testvol1 replicated 1/1 dperson/samba:latest *:10445->445/tcp
bulughgw2nup testvol2_service replicated 1/1 lipingxue/samba:latest *:10446->446/tcp
i0kig1efb9e2 testvol3_service replicated 1/1 lipingxue/samba-debian:latest *:10447->447/tcp
szpco6zxlct9 testvol4_service replicated 0/1 lipingxue/samba:latest *:10448->448/tcp
root@sc-rdops-vm18-dhcp-57-89:~#
root@sc-rdops-vm18-dhcp-57-89:~# docker service ps testvol4_service
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
il7oooauxz7h testvol4_service.1 lipingxue/samba:latest sc-rdops-vm18-dhcp-57-89 Running Starting 42 seconds ago
root@sc-rdops-vm18-dhcp-57-89:~#
root@sc-rdops-vm18-dhcp-57-89:~#
root@sc-rdops-vm18-dhcp-57-89:~# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
gizbw0jq64fn testvol1 replicated 1/1 dperson/samba:latest *:10445->445/tcp
bulughgw2nup testvol2_service replicated 1/1 lipingxue/samba:latest *:10446->446/tcp
i0kig1efb9e2 testvol3_service replicated 1/1 lipingxue/samba-debian:latest *:10447->447/tcp
szpco6zxlct9 testvol4_service replicated 0/1 lipingxue/samba:latest *:10448->448/tcp
root@sc-rdops-vm18-dhcp-57-89:~# docker service ps testvol4_service
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
il7oooauxz7h testvol4_service.1 lipingxue/samba:latest sc-rdops-vm18-dhcp-57-89 Running Starting 49 seconds ago
root@sc-rdops-vm18-dhcp-57-89:~#
root@sc-rdops-vm18-dhcp-57-89:~# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
gizbw0jq64fn testvol1 replicated 1/1 dperson/samba:latest *:10445->445/tcp
bulughgw2nup testvol2_service replicated 1/1 lipingxue/samba:latest *:10446->446/tcp
i0kig1efb9e2 testvol3_service replicated 1/1 lipingxue/samba-debian:latest *:10447->447/tcp
szpco6zxlct9 testvol4_service replicated 0/1 lipingxue/samba:latest *:10448->448/tcp
root@sc-rdops-vm18-dhcp-57-89:~#
root@sc-rdops-vm18-dhcp-57-89:~# docker service ps testvol4_service
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
il7oooauxz7h testvol4_service.1 lipingxue/samba:latest sc-rdops-vm18-dhcp-57-89 Running Starting 56 seconds ago
root@sc-rdops-vm18-dhcp-57-89:~# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
gizbw0jq64fn testvol1 replicated 1/1 dperson/samba:latest *:10445->445/tcp
bulughgw2nup testvol2_service replicated 1/1 lipingxue/samba:latest *:10446->446/tcp
i0kig1efb9e2 testvol3_service replicated 1/1 lipingxue/samba-debian:latest *:10447->447/tcp
szpco6zxlct9 testvol4_service replicated 0/1 lipingxue/samba:latest *:10448->448/tcp
root@sc-rdops-vm18-dhcp-57-89:~# docker service ps testvol4_service
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
il7oooauxz7h testvol4_service.1 lipingxue/samba:latest sc-rdops-vm18-dhcp-57-89 Running Running less than a second ago
root@sc-rdops-vm18-dhcp-57-89:~#
root@sc-rdops-vm18-dhcp-57-89:~# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
gizbw0jq64fn testvol1 replicated 1/1 dperson/samba:latest *:10445->445/tcp
bulughgw2nup testvol2_service replicated 1/1 lipingxue/samba:latest *:10446->446/tcp
i0kig1efb9e2 testvol3_service replicated 1/1 lipingxue/samba-debian:latest *:10447->447/tcp
szpco6zxlct9 testvol4_service replicated 1/1 lipingxue/samba:latest *:10448->448/tcp
However, for alpine based image, it is in "starting state" for 1 minutes and then go to "running state".
Filed an issue https://github.com/dperson/samba/issues/86 in dperson/samba to track this.
Filed an issue to request a build based on debian image. https://github.com/dperson/samba/issues/89
@lipingxue looks like dperson/samba#89 is closed. What are the next steps for us?
Hello!
I've been performing tests with the vfile plugin and found out that the Alpine-based dperson/samba image suffers a severe performance hit when running as a service with public ports forwarded to it. I'll detail my test scenario and results below, but as a quick TLDR, using an older dperson/samba image, based on debian, gets us rid of all issues. In my case I built an image from commit https://github.com/dperson/samba/commit/c72d17eee35f0016c495d316f13ad3db7848837b and pushed it to a private repository as dperson/samba:latest, forcing my servers to use that instead of the Alpine-based image. Later commits should work as well as long as they're from before the change to Alpine.
For my test case: on a swarm-enabled server with both vfile and vsphere plugins installed and configured I created a new volume and tried to run a container mounting it, which failed:
[root@lab-dock05 admin]# docker volume create --driver vfile lab-dock-timeout -o size=1gb
lab-dock-timeout
[root@lab-dock05 admin]# docker run -ti -v lab-dock-timeout:/mnt ubuntu bash
docker: Error response from daemon: VolumeDriver.Mount: Failed to blocking wait for Mounted state. Error: Timeout reached; BlockingWait is not complete.
See 'docker run --help'.
[root@lab-dock05 admin]# docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
plk74i10fm8e vFileServerlab-dock-timeout replicated 0/1 dperson/samba *:0->445/tcp
As you can see there are no healthy replicas of the Samba service. Please note that setting VFILE_TIMEOUT_IN_SECOND on the plugin didn't solve the problem, as all it seems to do is to wait longer for the Samba service to start.
If you take a look at https://github.com/dperson/samba/blob/master/Dockerfile the healthcheck is a smbclient command with a timeout of 15 seconds, which isn't nearly enough to complete on the alpine-based image:
[root@lab-dock05 admin]# docker service ps vFileServerlab-dock-timeout
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
pev3c03ra6tg vFileServerlab-dock-timeout.1 dperson/samba lab-dock05.domain.org.br Running Starting 10 seconds ago
[root@lab-dock05 admin]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
155755f0145f dperson/samba:latest "samba.sh -s share..." 20 seconds ago Up 13 seconds (health: starting) 139/tcp, 137-138/udp, 445/tcp vFileServerlab-dock-timeout.1.pev3c03ra6tge8ltvv1ssj8n4
[root@lab-dock05 admin]# docker exec -ti 155755f0145f bash
bash-4.3# time smbclient -L '\\localhost\' -U 'guest%' -m SMB3
Sharename Type Comment
--------- ---- -------
share1 Disk
IPC$ IPC IPC Service (Samba Server)
Server Comment
--------- -------
Workgroup Master
--------- -------
real 1m0.421s
user 0m0.023s
sys 0m0.007s
Since it took 1 minute for the smbclient to run (!) the healthcheck will always fail and the service will never become healthy. Manually increasing the healthcheck timeout to something like 120s works too, but I don't believe that waiting 1 minute for a connection to get established is acceptable.
After building the image from https://github.com/dperson/samba/commit/c72d17eee35f0016c495d316f13ad3db7848837b and launching it as a service using the same parameters as vfile does, I got a very different result:
docker service create --health-timeout 90s --name samba --mount type=bind,source=/tmp,target=/mount -p445:445 -d docker-registry.domain.org.br/dperson/samba -s "share1;/mount;yes;no;no;all;root;root" -u "root;badpass"
[root@lab-dock05 admin]# docker service ps samba
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
hirg6c8k4ac1 samba.1 docker-registry.domain.org.br/dperson/samba:latest lab-dock05.domain.org.br Running Running 28 seconds ago
[root@lab-dock05 admin]# docker exec -ti b0f535d6b833 bash
root@b0f535d6b833:/# apt-get update && apt-get install -y smbclient
(ommited for brevity)
root@b0f535d6b833:/# time smbclient -L '\\localhost\' -U 'guest%' -m SMB3
WARNING: The "syslog" option is deprecated
Sharename Type Comment
--------- ---- -------
share1 Disk
IPC$ IPC IPC Service (Samba 4.5.12-Debian)
Server Comment
--------- -------
Workgroup Master
--------- -------
real 0m0.204s
user 0m0.035s
sys 0m0.010s
As you can see the healthcheck itself isn't the problem, as the service becomes healthy despite the healthcheck. The problem is that the new Alpine-based image seems to suffer from performance problems when running on swarm (I didn't really dig deep enough as to why this is happening, but it isn't really my focus right now).
IMHO it might be beneficial to fork dperson/samba and maintain a separate, debian-based Samba image until the current problems are sorted.
@diegolima Thank you for your detailed evaluation! Yes we also observed that the health check in some case just cannot pass and will make the SMB server never up running. Thank you for locating the reason! We are currently looking for solutions such as replace with other SMB server image or forking a separate image as you suggested. will have an update soon!
Has been resolved with https://github.com/vmware/vsphere-storage-for-docker/pull/2001
After file server service is created, the time for the service state from Starting to Running is approximate 1 minute. This was not the case in previous testing. Some new changes (might be on the file server side) has introduced this long startup time. Need to root the cause and find a solution if it's possible.