Hey thanks tafkam,
If you are talking about the config variables of replication-manager I need to dig, but we already have a password encryption feature. The key can be extracted from a K8s secret map and put into the repman pod (docker, podman) to enable password decryption. In OpenSVC we can store a map value to a file via shm; I guess this is possible in K8s? If not I need to look into all of them:
./replication-manager-pro --config=etc/local/features/backup/config.toml.sample monitor --help | wc -l
382
ENV variables :(
If you are referencing the aes encryption described in https://docs.signal18.io/configuration/security , this won't help with kubernetes integration. Kubernetes secrets are base64 encoded, and can be injected into the runtime environment of the docker image, or mounted as a file into the docker image, both as plaintext. For replication-manager to use existing kubernetes mariadb password secrets they need to be read from the environment (or a defined file location where the secret is mounted). Using the environment is preferred since the secret files generated by different helm charts, operators, boilerplate manifests etc. differ in key:value formats. 'user: password' is seldom used here.
The number of options replication-manager has is unrelated to replacing defined environment variables in the configuration file with actual values from the environment. If you see {env.root_password} in the config file, it just needs to be replaced with the value returned by getEnv('root_password').
Yes, I understand your point. I'll keep it in mind as a feature request.
Can we plan a conf call next week on skype (svaroqui) or zoom to better define what can be done here ?
If I get you correctly: Feature 1: on startup, repman replaces the default config values of every password-related option with the equivalent env variables defined in the container.
Feature 2: repman provisions a K8s secret map for all service password variables, and every service provisioned later refers to the key map instead of a plain password.
I don't know what use case feature 2 would be good for. I didn't ask for something like that ;-)
What I would like to have is Feature 1 for any configuration option, not just passwords. For example you could set FAILOVER_MODE="automatic" as an environment variable and use it in failover-mode={env.FAILOVER_MODE}. Or you set REPLICATION_USER and REPLICATION_PASSWORD as environment variables and can do something like replication-credential = "{env.REPLICATION_USER}:{env.REPLICATION_PASSWORD}"
Anyway, I bumped into another issue which is kind of a dealbreaker for using replication-manager properly on kubernetes. It seems replication-manager caches the DNS lookup for a server indefinitely and stops looking up IPs if it encounters a resolve error. When a MariaDB statefulset with a headless service is used and a db instance crashes/is restarted/updated, the new pod will also have a new IP address. replication-manager will never see the up-and-running db instance, since it's not doing any more DNS lookups.
There are probably a few more issues like those in the kubernetes context, which would be a bunch of work to figure out in every possible supported replication-manager configuration. Maybe the replication-manager core developers should first figure out if they want to go down the kubernetes route. Looking at the whole lot of features replication-manager provides in the classical server world, there would be lots of changes and additions to provide the same functionality in kubernetes. I guess at that point a fork and rewrite would be easier than having a "can do everything everywhere" code-base ;-) If you want to open the can of worms that kubernetes is, here is a good starting point https://github.com/operator-framework/operator-sdk
OK, I can do this. I can additionally do it per cluster, e.g. CLUSTERNAME_FAILOVER_MODE.
Regarding the DNS caching issue: where did you observe this? I have been using it for more than 4 years on OpenSVC and never got similar issues, using CNI networks provided by the orchestrator that also recycle IPs when restarting any services.
I'm using db-servers-hosts = "sts-mariadb-0.sts-mariadb:3306,sts-mariadb-1.sts-mariadb:3306" as the cluster configuration. When I start both statefulset pods and then deploy replication-manager, it sees both instances as running (no replication started yet, "out-of-the-box" mariadb docker images with a server configuration for master/slave replication).
Then I started replication-manager when the sts-mariadb-1 instance wasn't up yet, and got the error:
2020/04/17 18:41:51 | STATE | RESOLV ERR00062 : DNS resolution for host sts-mariadb-1.sts-mariadb error lookup sts-mariadb-1.sts-mariadb on 169.254.20.10:53: server misbehaving
2020/04/17 18:41:58 | INFO | Declaring slave db sts-mariadb-1.sts-mariadb:3306 as failed
2020/04/17 18:41:58 | ALERT | Server sts-mariadb-1.sts-mariadb:3306 state changed from Suspect to Failed
since the pod IP can't be resolved by the cluster DNS yet. When the sts-mariadb-1 instance finished loading and the domain resolves, the instance status in replication-manager doesn't change.
sts-mariadb-0 is up and running at that time, so I'm killing that pod for tests:
2020/04/17 18:42:24 | INFO | Declaring slave db sts-mariadb-0.sts-mariadb:3306 as failed
2020/04/17 18:42:24 | ALERT | Server sts-mariadb-0.sts-mariadb:3306 state changed from Suspect to Failed
sts-mariadb-0 is getting a new pod IP and the headless service domain sts-mariadb-0.sts-mariadb changed accordingly.
Both server pods were restarted, are running in kubernetes and can be connected to via the headless service domain. Both are still marked as failed in replication-manager. The DNS resolve error for sts-mariadb-1 is just an indication for me that the DNS lookup isn't repeated. Maybe the DNS resolver works differently for the OpenSVC functionality?
What release are you using, the 2.1 docker image?
Anyway, you should not use the docker container hostname, that will never work like this. You should use the DNS of the orchestrator. This is 99% sure a K8s config issue regarding DNS that does not propagate DNS changes to container hostnames. Your service name should look like my-svc.my-namespace.svc.cluster.local
When repman deploys databases and proxies you have the possibility to set the cluster part, but the namespace is hardcoded to the cluster name.
I'm using the latest replication-manager (2.1) release from dockerhub. There is no container "hostname" in Kubernetes... I'm using the headless service DNS name, which expands to sts-mariadb-0.sts-mariadb.namespace.svc.cluster.local.
Also I can assure you there is no dns issue in my kubernetes cluster. I'm running dozens of different services all depending on working dns.
Hmm, interesting. I'll investigate that, but apparently quick googling for DNS cache resolution on the go-mysql driver does not turn up any issues or configuration options. I would agree with you if the code of replication-manager itself did reverse DNS and stored the result in a local variable; indeed we stopped doing that a long time ago.
Can you try the docker pro release, the one I use for testing orchestrators? I think there are different net implementations at compile time.
Besides some errors about OpenSVC, the pro release is resolving new pod IPs and the instance leaves the failed state. I have run into another issue not related to replication-manager, so I can't test the replication bootstrap and failover functionalities yet. I will get back to you when I can test more of replication-manager. I'm changing the issue title since this is getting a bit out of scope here ;-)
Please feel free to ping us with any feature update or request; maybe we can get in contact for a talk to explain how we are moving with the product and why!
The majority of our sponsors use replication-manager on premise for backups, monitoring and HA, but others mostly use the API for bootstrapping replication, switchover or triggering rolling restarts on config changes (SlapOS orchestrator); others use it fully integrated with OpenSVC for cluster deployment. Are you already using an init container for the database and proxy containers?
Basically I would just like to set up several Mariadb Master/Slave replication clusters on kubernetes with Maxscale (or whatever solution) for read/write split and master auto failover and rejoins. For Maxscale to work the failover magic I need to have gtid replication, which none of the existing kubernetes operators or docker images I've found supports, and I don't want to initially set up gtid replication manually for new Mariadb clusters. I would like to not use Galera since I've had bad experiences with cluster-wide locking, and recovery is exhausting, more so in the containerized world. And then I found replication-manager ;-) So I'm mainly interested in (semi-)automatic replication bootstrapping, failure detection, failover and cluster recovery functionalities. Rolling restarts and updates are handled by kubernetes quite well, and I already have a backup solution (appscode stash). Monitoring is done with a prometheus exporter sidecar. Since replication-manager's autofailover and recovery features could replace Maxscale in that regard, I plan to evaluate proxysql later if everything works on the replication side. That said, I try to use most of the standard functionalities of the available docker containers and try not to build my own docker image. Bitnami's Mariadb init scripts were interfering with external replication changes, so I'm back to the official Mariadb docker image, for which I now have to find an elegant solution to change the server-id for each statefulset instance in the my.cnf config file, since I don't want to create several deployments and configmaps for master/slaves in kubernetes.
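For the server-id part, what I have in mind is roughly the following init step (just a sketch of my own workaround idea, nothing replication-manager provides; the conf.d path and the ordinal-to-id offset are assumptions):

package main

import (
	"fmt"
	"log"
	"os"
	"strconv"
	"strings"
)

// Derive server_id from the StatefulSet pod ordinal (the numeric suffix of
// the hostname, e.g. sts-mariadb-1 -> 1) and write a conf.d fragment that
// the MariaDB image reads alongside my.cnf.
func main() {
	hostname, err := os.Hostname() // e.g. "sts-mariadb-1"
	if err != nil {
		log.Fatal(err)
	}
	idx := strings.LastIndex(hostname, "-")
	if idx < 0 {
		log.Fatalf("hostname %q has no StatefulSet ordinal suffix", hostname)
	}
	ordinal, err := strconv.Atoi(hostname[idx+1:])
	if err != nil {
		log.Fatalf("hostname %q has no StatefulSet ordinal suffix: %v", hostname, err)
	}
	// Offset so that no instance ends up with the invalid server_id 0.
	cnf := fmt.Sprintf("[mysqld]\nserver_id = %d\n", 100+ordinal)
	// Assumed drop-in directory of the official MariaDB image.
	if err := os.WriteFile("/etc/mysql/conf.d/server-id.cnf", []byte(cnf), 0o644); err != nil {
		log.Fatal(err)
	}
}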
Hi,
the Pro version uses the DNS resolver from the operating system (libc binding on Linux); the Std version uses a pure Go resolver.
If you find that you have resolution issues with the Std version, try this before running it:
export GODEBUG=netdns=cgo
It will force it to run with the C binding. I'm interested to know if that solves any issues, in which case we can change our compile-time settings.
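For a quick check, here is a small standalone resolver loop (just a sketch, not part of replication-manager) you can run inside the cluster with and without GODEBUG=netdns=cgo, to see whether a restarted pod's new IP shows up; the default hostname below is simply the one used in this thread:

package main

import (
	"context"
	"fmt"
	"net"
	"os"
	"time"
)

// Resolve the given host every few seconds and print the result, so a
// changed pod IP (or a transient resolve error) becomes visible.
func main() {
	host := "sts-mariadb-1.sts-mariadb"
	if len(os.Args) > 1 {
		host = os.Args[1]
	}
	for {
		ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
		addrs, err := net.DefaultResolver.LookupHost(ctx, host)
		cancel()
		if err != nil {
			fmt.Println(time.Now().Format(time.RFC3339), "lookup error:", err)
		} else {
			fmt.Println(time.Now().Format(time.RFC3339), "resolved:", addrs)
		}
		time.Sleep(5 * time.Second)
	}
}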
On 18 April 2020 at 23:32, tafkam (notifications@github.com) wrote:
Basically I would just like to setup a Mariadb Master/Slave replication cluster on kubernetes with Maxscale (or whatever solution) for read/write split and master auto failover and rejoins. For Maxscale to work the failover magic I need to have gtid replication, which none of the existing kubernetes operators or docker images I've found is supporting, and I don't want to initially setup gtid replication manually for new Mariadb clusters. I would like to not use Galera since I've had bad experiences with cluster wide locking, and recovery is exhausting, more so in the containerized world.
replication-manager helps here as it can drive failover for multiple routes, protect the database with FTWRL, check long-running transactions, and apply constraints like preferred slave candidates, delay, etc. It can be one or many Maxscale, ProxySQL or VIP setups, because while ProxySQL is excellent it does not support LOAD DATA for example, and a multi-solution setup may be needed.
And then I found replication-manager ;-) So I'm mainly interested in (semi-)automatic replication bootstrapping, failure detection, failover and cluster recovery functionalities.
In a perfect world, to bootstrap replication using the JWT API you can:
TOKEN=$(curl -s -X POST -H 'Accept: application/json' -H 'Content-Type: application/json' --data '{"username":"admin","password":"repman"}' https://repman..sts-mariadb:10005/api/login | jq -r '.id_token')
then
curl -H 'Accept: application/json' -H "Authorization: Bearer ${TOKEN}" https://repman..sts-mariadb:10005/api/clusters/cluster1/actions/replication/bootstrap/master-slave
But the world is not perfect, and you may find these API calls useful:
Wait until all db services can be connected: https://repman..sts-mariadb:10005/api/clusters/cluster1/actions/waitdatabases
Reset replication info and binlogs: https://repman..sts-mariadb:10005/api/clusters/cluster1/actions/replication/cleanup
For service configuration, replication-manager offers a downloadable tar.gz config:
prov-db-tags = "smallredolog,semisync,innodb,noquerycache,threadpool,logslow,docker"
prov-db-memory = "256"
prov-db-memory-shared-pct = "threads:16,innodb:60,myisam:10,aria:10,rocksdb:1,tokudb:1,s3:1,archive:1,querycache:0"
prov-db-disk-size = "1"
prov-db-cpu-cores = "1"
prov-db-disk-iops = "300"
You can later change those parameters via an HTTP client of the API.
For persistent storage across replication-manager restarts:
monitoring-save-config = true
The config.json config file is stored under /var/lib/replication-manager/cluster1/
The API can also add meta-configuration clusters based on service plans.
Once done, the trick is to add a busybox init container to your database services that simply extracts the config by calling the insecure HTTP API:
Command: "sh -c 'wget -qO- http:/repman..sts-mariadb:10001/api/clusters/cluster1/serverssts-mariadb-0.sts-mariadb/3306/config|tar xzvf - -C /data’
Use this volume to map the mariadb container's /etc/mysql, /var/lib/mysql and /init, where some scripts for backups and for sending information back to repman can be found; these should be scheduled by a K8s task.
The container will get not only a correct server_id but also a clean setup where all information like binlogs, logs and redo logs is split into subdirectories under /var/lib/mysql/.system
It looks like this in OpenSVC and needs to be translated to K8s:
[container#0001]
detach = false
type = docker
image = busybox
netns = container#0001
rm = true
optional = true
start_timeout = 30s
volume_mounts = /etc/localtime:/etc/localtime:ro {env.base_dir}/pod01:/data
command = sh -c 'wget -qO- http://{env.mrm_api_addr}/api/clusters/{env.mrm_cluster_name}/servers/{env.ip_pod01}/{env.port_pod01}/config|tar xzvf - -C /data'
[container#0002]
tags = pod01
type = docker
rm = true
netns = container#0001
run_image = {env.db_img}
run_args = -e MYSQL_ROOT_PASSWORD={env.mysql_root_password} -e MYSQL_INITDB_SKIP_TZINFO=yes -v /etc/localtime:/etc/localtime:ro -v {env.base_dir}/pod01/data:/var/lib/mysql:rw -v {env.base_dir}/pod01/etc/mysql:/etc/mysql:rw -v {env.base_dir}/pod01/init:/docker-entrypoint-initdb.d:rw
Voila, hope it helps
Rolling restarts and updates are handled by kubernetes quite well, and I already have a backup solution(appscode stash).
Backup is fully part of old-master recovery: with async and semi-sync replication, the old master can be ahead of the newly elected one.
This is because the binlog has to move before the network part of event replication; it is always possible, after a crash, to find phantom writes in the binlog that the application never saw acknowledged.
So replication-manager tracks this and has multiple ways of fixing the old master, putting it back in time to before the election; restoring a backup is the ultimate way to repair.
Replication-manager can help schedule backups and delta archiving of the local backup directory via streaming or to S3, in conjunction with restic and minio in my case.
Monitoring is done with a prometheus exporter sidecar.
Repman exposes the Prometheus metrics via the API as well.
It can also store those metrics in Carbon and serve them via the API to Grafana; I may use that storage to compute some performance alerting.
Since replication-manager's autofailover and recovery features could replace Maxscale in that regard, I plan to evaluate proxysql later if everything should work on the replication side. That said I try to use most of the standard functionalities of the available docker containers and try to not build my own docker image.
We are also using regular vendor images, except for MySQL and Percona, where our clones just add socat and xtrabackup.
Bitnamis Mariadb init scripts were interfering with external replication changes, so I'm back to the official Mariadb docker image for which I now have to find an elegant solution to change the server-id for each statefulset instance in the my.cnf config file since I dont want to create several deployments and configmaps for master/slaves in kubernetes.
Now all this is good, but such manual API deployment is limited. The quest I started is to directly drive K8s cluster deployment, which I have started doing with orchestrator="kube".
While we are very confident with OpenSVC, already getting direct clients, users and feedback from developers, we are less so with K8s and are looking for contributors or sponsors in this area.
OVH/Microsoft, please help if you read this thread :)
Thanks
Sadly I can't reproduce the error situation anymore. The standard replication-manager and -pro replication manager are now behaving the same way in that regard. Have you changed your compile-time settings already for signal18/replication-manager:2.1? I'm always pulling the latest image for tests.
After a bit more fiddling the managed replication works now! There are some irritating logs though:
time="2020-04-21T20:17:41Z" level=info msg="Enforce GTID replication on slave sts-mariadb-1.sts-mariadb:3306" cluster=sts
time="2020-04-21T20:17:44Z" level=warning msg="Cluster state down" cluster=sts code=ERR00021 status=RESOLV type=state
time="2020-04-21T20:17:44Z" level=warning msg="Could not find a slave in topology" cluster=sts code=ERR00010 status=RESOLV type=state
time="2020-04-21T20:17:44Z" level=warning msg="Could not find a master in topology" cluster=sts code=ERR00012 status=RESOLV type=state
time="2020-04-21T20:17:44Z" level=warning msg="Monitor freeze while running critical section" cluster=sts code=ERR00001 status=RESOLV type=state
Still working on the maxscale/proxysql part to get the full cluster working, but the replication-manager part works so far. Thanks for all the help and hints!
Here are some general suggestions/feature requests which I would've found useful on my journey ;-):
more advanced features which would be possible for kubernetes:
Thanks again for the kind help!
I pushed the env variables setting.
Can you try it if you get time?
Sure, what is the expected variable format? $VAR?
Yes, the upper-case equivalent of the config variable, with s/-/_/g.
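In Go terms, roughly (only to illustrate the naming rule, not the actual implementation):

package main

import (
	"fmt"
	"strings"
)

func main() {
	// config key -> expected environment variable name
	key := "failover-mode"
	fmt.Println(strings.ToUpper(strings.ReplaceAll(key, "-", "_"))) // FAILOVER_MODE
}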
Tried db-servers-hosts = "$SERVER1,$SERVER2" first, but replication manager used the literal string value. Then looking at your commit changes, the environment loading seems only to work in the default section (SetEnvPrefix("DEFAULT"))? Setting the environment variable FAILOVER=automatic and failover-mode = "$FAILOVER" in the config did indeed work.
Yes, only literal values. Maybe I can do something for db-servers-hosts; I already have a wrapper for IPv6 on this variable.
Wouldn't it be easier and more flexible if you replaced all ${arbitrary_env_var} occurrences in the config file with os.Getenv(arbitrary_env_var) on load, like in https://github.com/signal18/replication-manager/blob/2.1/utils/misc/env_vars.go ?
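Something along these lines is what I mean (a minimal sketch, not the code in the linked file):

package main

import (
	"fmt"
	"os"
	"regexp"
)

var envPattern = regexp.MustCompile(`\$\{([A-Za-z_][A-Za-z0-9_]*)\}`)

// expandEnv replaces every ${NAME} placeholder in the raw config text with
// the value of the corresponding environment variable, leaving unset
// placeholders untouched.
func expandEnv(config string) string {
	return envPattern.ReplaceAllStringFunc(config, func(match string) string {
		name := envPattern.FindStringSubmatch(match)[1]
		if val, ok := os.LookupEnv(name); ok {
			return val
		}
		return match
	})
}

func main() {
	os.Setenv("REPLICATION_USER", "repl")
	os.Setenv("REPLICATION_PASSWORD", "secret")
	raw := `replication-credential = "${REPLICATION_USER}:${REPLICATION_PASSWORD}"`
	fmt.Println(expandEnv(raw)) // replication-credential = "repl:secret"
}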
Yes, possibly. I'm going to do this for templating.
I'm testing a Master-Master Replication setup. When both Mariadb instances are started, replication-manager acknowledges the multi-master setup of the cluster. But there is no gtid replication set up yet and both DBs are stand-alone. In the Dashboard under Cluster/Replication Bootstrap is no Multi Master replication option. To get the gtid replication running I have to bootstrap Master-Slave Positional. The cluster is of the type master-slave then, and the slave has gtid replication configured. After doing a switchover, the former master also gets gtid replication configured. After that I'm doing a Multi Master bootstrap but nothing happens and the cluster stays in master-slave mode. When I restart the replication-manager pod after all that, the multi-master setup is recognized by replication-manager, and the multi-master gtid replication is working properly
How should multi-master setups be bootstrapped properly? This seems unintuitive/buggy to me.
Here are the trailing URL segments of the API to bootstrap replication; I agree that the GUI could expose more bootstrapping options.
case "master-slave":
mycluster.SetMultiTierSlave(false)
mycluster.SetForceSlaveNoGtid(false)
mycluster.SetMultiMaster(false)
mycluster.SetBinlogServer(false)
mycluster.SetMultiMasterWsrep(false)
case "master-slave-no-gtid":
mycluster.SetMultiTierSlave(false)
mycluster.SetForceSlaveNoGtid(true)
mycluster.SetMultiMaster(false)
mycluster.SetBinlogServer(false)
mycluster.SetMultiMasterWsrep(false)
case "multi-master":
mycluster.SetMultiTierSlave(false)
mycluster.SetForceSlaveNoGtid(false)
mycluster.SetMultiMaster(true)
mycluster.SetBinlogServer(false)
mycluster.SetMultiMasterWsrep(false)
case "multi-tier-slave":
mycluster.SetMultiTierSlave(true)
mycluster.SetForceSlaveNoGtid(false)
mycluster.SetMultiMaster(false)
mycluster.SetBinlogServer(false)
mycluster.SetMultiMasterWsrep(false)
case "maxscale-binlog":
mycluster.SetMultiTierSlave(false)
mycluster.SetForceSlaveNoGtid(false)
mycluster.SetMultiMaster(false)
mycluster.SetBinlogServer(true)
mycluster.SetMultiMasterWsrep(false)
case "multi-master-ring":
mycluster.SetMultiTierSlave(false)
mycluster.SetForceSlaveNoGtid(false)
mycluster.SetMultiMaster(false)
mycluster.SetBinlogServer(false)
mycluster.SetMultiMasterRing(true)
mycluster.SetMultiMasterWsrep(false)
case "multi-master-wsrep":
mycluster.SetMultiTierSlave(false)
mycluster.SetForceSlaveNoGtid(false)
mycluster.SetMultiMaster(false)
mycluster.SetBinlogServer(false)
mycluster.SetMultiMasterRing(false)
mycluster.SetMultiMasterWsrep(true)
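For example (just a sketch, assuming the topology name is the final path segment, as in the master-slave curl call earlier in this thread), bootstrapping multi-master over the API from Go would look roughly like this:

package main

import (
	"crypto/tls"
	"fmt"
	"log"
	"net/http"
)

// bootstrap calls the replication bootstrap endpoint for a given topology,
// reusing a JWT obtained from /api/login as in the curl example above.
// The insecure TLS setting mirrors a self-signed test setup (assumption).
func bootstrap(baseURL, cluster, topology, token string) error {
	url := fmt.Sprintf("%s/api/clusters/%s/actions/replication/bootstrap/%s", baseURL, cluster, topology)
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return err
	}
	req.Header.Set("Accept", "application/json")
	req.Header.Set("Authorization", "Bearer "+token)
	client := &http.Client{Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}}}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	fmt.Println("bootstrap", topology, "->", resp.Status)
	return nil
}

func main() {
	// Address as used earlier in this thread; replace the token with one from /api/login.
	if err := bootstrap("https://repman..sts-mariadb:10005", "cluster1", "multi-master", "REPLACE_WITH_JWT"); err != nil {
		log.Fatal(err)
	}
}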
The gui is exposing the option to bootstrap multi-master, but not when the cluster is detected as multi-master ;-) so that's probably the bug
Thanks for reporting, it was a wrong spelling; I patched it. You need to clear the browser cache for it to work, as that's in the HTML code.
Regards, /svar
Is it possible to use environment variables in the replication-manager main configuration? I have found references for such a feature for the provisioning agent like {env.nodes} in the documentation. Using environment variables would help with using replication-manager in kubernetes by referencing mariadb passwords from managed databases with "env.valueFrom.secretKeyRef".
On a side note, since replication-manager has already lots of the needed functionality for kubernetes operators(like provisioning, managing, monitoring, failover, using golang, etc.), and waaayy more features and options for high-availability databases than the mariadb/mysql operators I've tried, you should consider going forward with a kubernetes operator edition for replication-manager.