toughIQ / docker-mariadb-cluster

Dockerized MariaDB Galera Cluster
GNU General Public License v2.0

Persistence #2

Open BarnumD opened 7 years ago

BarnumD commented 7 years ago

I realize the description in the README states that this is supposed to work with no persistent data volumes, but I really like what you've done with this - I just wanted the extra peace of mind of having data persistence. I'd like to see it added as an option.

I made a version that, when it detects a volume at /data, keeps the mysql datadir in that location, except that each container gets its own subfolder. That way you can have multiple containers on each host, or all the containers can safely share a network (NFS) location. There is also a cleanup function as containers come and go. I'll put in a pull request.
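
Roughly, the idea is the following (a minimal sketch of the entrypoint logic only, not the actual code from the pull request; the PR keys the subfolder on the container IP, so hostname -i stands in for that here):

# Sketch: pick a per-container datadir under /data if that volume is mounted.
if [ -d /data ]; then
    DATADIR="/data/$(hostname -i)"
    mkdir -p "$DATADIR"
    chown -R mysql:mysql "$DATADIR"
else
    DATADIR="/var/lib/mysql"
fi
exec mysqld --datadir="$DATADIR" "$@"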

Franselbaer commented 7 years ago

This is not really needed. As long as one container exists, the data is held safely... Just use an xtrabackup job to save the data externally.
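
For example, something roughly like this could run on a schedule (just a sketch; the container name, the root password variable and the use of xtrabackup's streaming mode are assumptions, nothing this image ships with):

$ docker exec some-galera-container \
    xtrabackup --backup --stream=xbstream --target-dir=/tmp \
    --user=root --password="$MYSQL_ROOT_PASSWORD" \
    > /backups/galera-$(date +%F).xbstream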

Or just install a swarm with 3 instances on AWS, 3 instances on Azure and 3 instances in-house and let them build a Galera cluster of nine. =) They cannot all die at once.

dottgonzo commented 7 years ago

In my scenario I can't use external resources, and I have to prevent everything from stopping if all nodes fail.

For example, if all nodes die and I have a backup of the data, the infrastructure stays down until I get one cluster node working again, wait for it to come up, then scale it out and restore the data.

For this reason, in other services I've used GlusterFS, which syncs the volume data to every local node of the cluster. Can I do something similar here?

If it is impossible to use external folders for persistent data, maybe it would be useful if the boot process allowed restoring the DB from a backup (if one exists). That would not solve the problem that the cluster does not scale automatically at boot, but it could save my life.

dottgonzo commented 7 years ago

After some days in a swarm environment, I can say that the choice not to use persistent volumes is the right one. Trying to synchronize persistent volumes with Gluster causes slow performance, and having many replicas is simply enough.

monotykamary commented 7 years ago

The pull request is very clever in how it organizes by IP. However, it did give me issues when booting the cluster above 1 replica, along with quite a few performance problems; but that could be because I'm using docker 17.07.0-ce.

Since the only problem with persistence for this image is that each container has different data in its respective /var/lib/mysql beyond the bootstrapped replica, you can simply create volumes based on their .Task.Name placeholder:

$ docker service create --name mariadb-cluster \
    --network default \
    --replicas=1 \
    --mount type=volume,src="{{.Task.Name}}",dst=/var/lib/mysql \
    --env DB_SERVICE_NAME=mariadb-cluster \
    toughiq/mariadb-cluster:10.2

$ docker service scale mariadb-cluster=5
$ docker volume ls

DRIVER              VOLUME NAME
local               mariadb-cluster.1.oinldzhhlr64bah5pkm6vbpgl
local               mariadb-cluster.2.7d2b0nwjcj34h28rkw39763la
local               mariadb-cluster.3.o3lgpvozvu72zwvm764ink18c
local               mariadb-cluster.4.ka37en4xle69fter32qq8y54c
local               mariadb-cluster.5.gcni8msw43i9phvv09yo6ybpd

You can then use any docker volume plugin to move and back up your persistent data, like Convoy or Nimble.
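
Even without a plugin, a throwaway container is enough for an ad-hoc copy of one of these volumes (sketch; the volume name is taken from the listing above):

$ docker run --rm \
    -v mariadb-cluster.1.oinldzhhlr64bah5pkm6vbpgl:/source:ro \
    -v "$(pwd)":/backup \
    busybox tar czf /backup/mariadb-cluster.1.tgz -C /source .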

toughIQ commented 7 years ago

If you just want to have a place where the data is stored for backup reasons, this is a way to go. But if one task dies, it is restarted by swarm using a different task ID, so it would get a new volume and could not benefit from the previously stored data.

monotykamary commented 7 years ago

Very fair point. What about .Task.Slot?

$ docker service create --name mariadb-cluster \
    --network default \
    --replicas=1 \
    --mount type=volume,src=mariadb-cluster."{{.Task.Slot}}",dst=/var/lib/mysql \
    --env DB_SERVICE_NAME=mariadb-cluster \
    toughiq/mariadb-cluster:10.2

$ docker service scale mariadb-cluster=5
$ docker volume ls

DRIVER              VOLUME NAME
local               mariadb-cluster.1
local               mariadb-cluster.2
local               mariadb-cluster.3
local               mariadb-cluster.4
local               mariadb-cluster.5

EDIT: It looks like it doesn't recover well from a docker kill.

EDIT2: Sometimes it does, other times it requires manual intervention to remove files like tc.log to avoid errors:

docker run --rm -it -v mariadb-cluster.3:/mariadb busybox rm /mariadb/tc.log

toughIQ commented 7 years ago

I am not quite sure if this isn't default behavior, meaning that if you hard-killed a classic cluster node, it would result in the same error/problem on startup. Another point I am not sure about is IP address assignment by Swarm when a dead task is restarted. Does it get the same address assigned? This is crucial, since the IP addresses of the cluster members are written to the config file, which gets persisted in the volume as well.

/etc/mysql/conf.d/galera.cnf:
wsrep-cluster-address = gcomm://10.0.0.3,10.0.0.4,10.0.0.5,?pc.wait_prim=no

monotykamary commented 7 years ago

I am not quite sure if this isn't default behavior, meaning that if you hard-killed a classic cluster node, it would result in the same error/problem on startup.

I haven't run into other bugs while reviving the cluster nodes, but I have seen the tc.log init issue with cluster-less setups on both bare-metal and container installations of MariaDB. Deleting it usually solves the issue, but it isn't really nice.

2017-09-07 11:00:22 139794375907200 [ERROR] Recovery failed! You must enable all engines that were enabled at the moment of the crash
2017-09-07 11:00:22 139794375907200 [ERROR] Crash recovery failed. Either correct the problem (if it's, for example, out of memory error) and restart, or delete tc log and start mysqld with --tc-heuristic-recover={commit|rollback}
2017-09-07 11:00:22 139794375907200 [ERROR] Can't init tc log
2017-09-07 11:00:22 139794375907200 [ERROR] Aborting

Does it get the same address assigned?

I believe so. I've killed Nodes 2 and 4 quite a few times, with addresses 10.0.1.4 and 10.0.1.16 respectively, and they revive with the same addresses. There is no complete guarantee it does this every single time, though.
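
For what it's worth, the assigned addresses can be checked directly against the service's network (sketch; "default" is just the network name from the commands above, and docker network inspect only shows containers attached on the local node):

$ docker network inspect default \
    --format '{{range .Containers}}{{.Name}} {{.IPv4Address}}{{"\n"}}{{end}}'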

The logs from docker service logs mariadb-cluster just show InnoDB and mysqld initializing, with nothing really interesting or relevant there:

mariadb-cluster.2.xd3l7ekcah2u@Sherry    | 2017-09-07 11:10:21 139948792653696 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
mariadb-cluster.2.xd3l7ekcah2u@Sherry    | 2017-09-07 11:10:21 139948792653696 [Note] InnoDB: Uses event mutexes
mariadb-cluster.2.xd3l7ekcah2u@Sherry    | 2017-09-07 11:10:21 139948792653696 [Note] InnoDB: Compressed tables use zlib 1.2.8
mariadb-cluster.2.xd3l7ekcah2u@Sherry    | 2017-09-07 11:10:21 139948792653696 [Note] InnoDB: Using Linux native AIO
mariadb-cluster.2.xd3l7ekcah2u@Sherry    | 2017-09-07 11:10:21 139948792653696 [Note] InnoDB: Number of pools: 1
mariadb-cluster.2.xd3l7ekcah2u@Sherry    | 2017-09-07 11:10:21 139948792653696 [Note] InnoDB: Using SSE2 crc32 instructions
mariadb-cluster.2.xd3l7ekcah2u@Sherry    | 2017-09-07 11:10:21 139948792653696 [Note] InnoDB: Initializing buffer pool, total size = 256M, instances = 1, chunk size = 128M
mariadb-cluster.2.xd3l7ekcah2u@Sherry    | 2017-09-07 11:10:21 139948792653696 [Note] InnoDB: Completed initialization of buffer pool
mariadb-cluster.2.xd3l7ekcah2u@Sherry    | 2017-09-07 11:10:21 139948054353664 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
mariadb-cluster.2.xd3l7ekcah2u@Sherry    | 2017-09-07 11:10:21 139948792653696 [Note] InnoDB: Highest supported file format is Barracuda.
mariadb-cluster.2.xd3l7ekcah2u@Sherry    | 2017-09-07 11:10:21 139948792653696 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1620586
mariadb-cluster.2.xd3l7ekcah2u@Sherry    | 2017-09-07 11:10:23 139948792653696 [Note] InnoDB: 128 out of 128 rollback segments are active.
mariadb-cluster.2.xd3l7ekcah2u@Sherry    | 2017-09-07 11:10:23 139948792653696 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
mariadb-cluster.2.xd3l7ekcah2u@Sherry    | 2017-09-07 11:10:23 139948792653696 [Note] InnoDB: Creating shared tablespace for temporary tables
mariadb-cluster.2.xd3l7ekcah2u@Sherry    | 2017-09-07 11:10:23 139948792653696 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
mariadb-cluster.2.xd3l7ekcah2u@Sherry    | 2017-09-07 11:10:23 139948792653696 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
mariadb-cluster.2.xd3l7ekcah2u@Sherry    | 2017-09-07 11:10:23 139948792653696 [Note] InnoDB: Waiting for purge to start
mariadb-cluster.2.xd3l7ekcah2u@Sherry    | 2017-09-07 11:10:23 139948792653696 [Note] InnoDB: 5.7.19 started; log sequence number 1620595
mariadb-cluster.2.xd3l7ekcah2u@Sherry    | 2017-09-07 11:10:23 139947608631040 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
mariadb-cluster.2.xd3l7ekcah2u@Sherry    | 2017-09-07 11:10:24 139947608631040 [Note] InnoDB: Buffer pool(s) load completed at 170907 11:10:24
mariadb-cluster.2.xd3l7ekcah2u@Sherry    | 2017-09-07 11:10:24 139948792653696 [Note] Plugin 'FEEDBACK' is disabled.
mariadb-cluster.2.xd3l7ekcah2u@Sherry    | 2017-09-07 11:10:24 139948792653696 [Note] Recovering after a crash using tc.log
mariadb-cluster.2.xd3l7ekcah2u@Sherry    | 2017-09-07 11:10:24 139948792653696 [Note] Starting crash recovery...
mariadb-cluster.2.xd3l7ekcah2u@Sherry    | 2017-09-07 11:10:24 139948792653696 [Note] Crash recovery finished.
mariadb-cluster.2.xd3l7ekcah2u@Sherry    | 2017-09-07 11:10:24 139948792653696 [Note] Server socket created on IP: '::'.
mariadb-cluster.2.xd3l7ekcah2u@Sherry    | 2017-09-07 11:10:24 139948792653696 [Warning] 'proxies_priv' entry '@% root@e4f35d83cd77' ignored in --skip-name-resolve mode.
mariadb-cluster.2.xd3l7ekcah2u@Sherry    | 2017-09-07 11:10:24 139948792653696 [Note] Reading of all Master_info entries succeded
mariadb-cluster.2.xd3l7ekcah2u@Sherry    | 2017-09-07 11:10:24 139948792653696 [Note] Added new Master_info '' to hash table
mariadb-cluster.2.xd3l7ekcah2u@Sherry    | 2017-09-07 11:10:24 139948792653696 [Note] mysqld: ready for connections.
mariadb-cluster.2.xd3l7ekcah2u@Sherry    | Version: '10.2.8-MariaDB-10.2.8+maria~jessie'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution

However, there was one irregular case where a wsrep entry for address 10.0.1.16 was deleted, but everything was fine (with no manual intervention) soon after:

mariadb-cluster.1.kx3b38npw11u@Sherry    | 2017-09-07 10:59:53 140510299666176 [Note] WSREP: save pc into disk
mariadb-cluster.1.kx3b38npw11u@Sherry    | 2017-09-07 10:59:53 140510299666176 [Note] WSREP: forgetting f99a02e9 (tcp://10.0.1.16:4567)
mariadb-cluster.1.kx3b38npw11u@Sherry    | 2017-09-07 10:59:53 140510291273472 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
mariadb-cluster.1.kx3b38npw11u@Sherry    | 2017-09-07 10:59:53 140510299666176 [Note] WSREP: deleting entry tcp://10.0.1.16:4567
mariadb-cluster.1.kx3b38npw11u@Sherry    | 2017-09-07 10:59:53 140510299666176 [Note] WSREP: (d4eb66b1, 'tcp://0.0.0.0:4567') turning message relay requesting off

Killing Node 1 (10.0.1.15) caused issues in Node 5 (10.0.1.18). Node 1 did revive successfully (after occasionally removing tc.log), but Node 5 got an error reconnecting:

mariadb-cluster.5.8tml35a9dy8d@Sherry    | 2017-09-07 11:33:51 140354355451648 [Note] WSREP: (f91a72f2, 'tcp://0.0.0.0:4567') reconnecting to d4eb66b1 (tcp://10.0.1.15:4567), attempt 30
mariadb-cluster.5.8tml35a9dy8d@Sherry    | 2017-09-07 11:33:54 140354355451648 [Note] WSREP: (f91a72f2, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://10.0.1.15:4567 timed out, no messages seen in PT3S

Killing Node 5 solved this issue. Like above, killing either or both nodes didn't change their addresses.

toughIQ commented 7 years ago

I got the stuff with the configs wrong in my last posting. In a non-persistent setup you don't have any problems, since /etc/mysql and /var/lib/mysql live inside the container and only exist during the container's runtime. The MariaDB image is designed to initialize the database only if the datadir (typically /var/lib/mysql) is NOT present, and it is during this init run that the cluster setup scripts get invoked. In the current persistence setup only the datadir gets stored on a volume; the config does not. So a restarted container would have the data, but no special config in /etc/mysql/conf.d/galera.cnf, since that file never gets written when the datadir already exists.
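
One way around this would be to (re)write the Galera config on every start, regardless of whether the datadir already exists. Roughly (a sketch only, not how the image currently behaves, relying on swarm's tasks.<service> DNS name and the DB_SERVICE_NAME variable from the service definition):

# Rebuild the peer list from swarm DNS and rewrite galera.cnf on every start,
# so a container restarted with a persisted datadir still knows its peers.
PEERS=$(getent hosts "tasks.${DB_SERVICE_NAME}" | awk '{print $1}' | paste -sd, -)
cat > /etc/mysql/conf.d/galera.cnf <<EOF
[galera]
wsrep-cluster-address = gcomm://${PEERS}?pc.wait_prim=no
EOF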

eleaner commented 5 years ago

@Franselbaer

I know it was a while ago, but how would I execute an xtrabackup job?

I understand the whole point is that the database folder is not exposed to the host, so it looks like running xtrabackup in another container is not an option. The mariadb image does contain xtrabackup, but I would have to schedule a docker exec against an ever-changing container name (unless it can be looked up at backup time, as in the sketch below?). Any ideas? Thanks, Marcin
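
A rough sketch of that lookup (untested; the service name and password variable are examples only):

# Find a local container belonging to the service via its swarm label,
# then stream a backup out of it to the host.
CID=$(docker ps -q --filter label=com.docker.swarm.service.name=mariadb-cluster | head -n1)
docker exec "$CID" xtrabackup --backup --stream=xbstream --target-dir=/tmp \
    --user=root --password="$MYSQL_ROOT_PASSWORD" > /backups/galera-$(date +%F).xbstream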