sklasing opened this issue 5 years ago
In this video by Stephane, https://docs.signal18.io/architecture/topologies/sharding, I saw the ability to view the proxy nodes via the web browser, which we have not configured yet since the POC is cloud-based; we will resolve that ASAP. I am hoping that will provide proxy node failover capabilities.
The demo, by the way, gave excellent insight into how powerful rep-man is when integrated with shardproxy. I will be chasing down the code examples, in particular the resharding.
The ability to rebalance data after adding additional shards is a high priority for proving Spider is industrial grade, so all assistance is welcome.
Finally, I was thrilled to hear Stephane's reasoning for why Spider versus other shard technologies, since I have been using a very similar argument with management as to why Spider's sharding was designed correctly: in summary, the sharding sits below the partition handler's optimiser.
I would still like the shard proxy failover capabilities via the CLI console. I am assuming, since it is a CLI, that the same might be command-line driven for scripting a failover.
Noting that neither the browser console nor the terminal console supports failover/switchover control of the Spider proxy nodes, which are MariaDB nodes. It would be ideal to support this from one replication-manager config.toml, since failover of the proxy affects the shards and most likely requires coordinated tweaking.
I am open to ideas as to what other settings can be placed in the same config to get rep-man to show the proxy nodes as another cluster.
Hi Sak, yes, they are internally monitored like MariaDB nodes. Why would you fail over the Spider nodes? Each table points to each shard master via the server system table; if one master fails over, repman changes the server URI and reloads the system table. Does that help?
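To make that mechanism concrete, here is a rough sketch of what the repointing amounts to on a Spider proxy. The server name shard1_srv, the table app.t1 and the promoted host are illustrative; these are not the exact statements replication-manager issues.

-- A Spider table reaches its backend only through a named server definition.
CREATE TABLE app.t1 (
  id BIGINT NOT NULL PRIMARY KEY,
  payload VARCHAR(255)
) ENGINE=SPIDER
  COMMENT='wrapper "mysql", srv "shard1_srv", table "t1"';

-- After a backend failover only the server definition has to change:
-- repoint shard1_srv at the newly promoted master ...
CREATE OR REPLACE SERVER shard1_srv
  FOREIGN DATA WRAPPER mysql
  OPTIONS (HOST '99.0.3.1', PORT 3306, DATABASE 'app',
           USER 'spidman', PASSWORD '99999999');

-- ... and have the proxy pick up the new definition from the
-- server system table (one common way is a table flush).
FLUSH TABLES;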
I would fail over Spider proxy nodes because they are MariaDB databases; if the master proxy node fails, it needs to be managed. In particular, the ever-changing CREATE OR REPLACE SERVER and CREATE OR REPLACE (Spider) TABLE definitions simply must replicate to the proxy slaves. Proxy nodes also require downtime and maintenance, so the need is real.
Ideally they are managed identically to the shard clusters, since they are databases with replication requirements, in the same simple, encapsulated fashion that rep-man uses to separate cluster definitions (treat the proxy nodes as their own failover cluster). The main difference is that we are failing over proxy function as opposed to backend function.
For example, for a proxy failover the Spider proxy tables would not require adjustment, but the load-balance mechanism sending the traffic would be adjusted via a rep-man DBA custom script.
So the question is, what am I missing?
My thought is that I need to configure the proxy nodes to also show in rep-man as their own cluster for failover management.
I suspect an enhancement might be required to treat the proxy cluster differently from the backend clusters as to what needs adjusting; that script should be supplied by the DBA, since it will be custom to how the DB requests originate. Example: proxy-fail-over-script="".
Current proxy node configuration:

[Default]
title = "Spider Proxy"
shardproxy = true
shardproxy-servers = " 99.0.1.1:3306, 99.0.1.2:3306, 99.0.1.3:3306"
shardproxy-user = "spiderman:99999999"
mdbshardproxy = true
mdbshardproxy-hosts = " 99.0.1.1:3306, 99.0.1.2:3306, 99.0.1.3:3306"
mdbshardproxy-user = "spiderman:99999999"
Note that with the above configuration, the only references to the two proxy slaves in the error logs are shown below.
NOTE: THE TWO PROXY SLAVE NODES do not even exist at this point in time, and rep-man is not indicating that they are unavailable. We have taken them down until we can get the nodes to display in the browser and terminal consoles. In this case the slaves should be showing as unavailable.
replication-manager-01 spiderman]# cat /var/log/replication-manager.log | grep -ai "99.0.1.3"
2018/12/20 17:25:31 [cluster_mdbshardproxy_shard3] DEBUG - New MdbShardProxy proxy created: 99.0.1.3 3306
2018/12/20 17:25:31 [cluster_mdbshardproxy_shard1] DEBUG - New MdbShardProxy proxy created: 99.0.1.3 3306
2018/12/20 17:25:31 [cluster_mdbshardproxy_shard2] DEBUG - New MdbShardProxy proxy created: 99.0.1.3 3306
2018/12/20 17:25:31 [cluster_mdbshardproxy_shard2] INFO - Init Proxy Type: shardproxy Host: 99.0.1.3 Port: 3306
2018/12/20 17:25:31 [cluster_mdbshardproxy_shard1] INFO - Init Proxy Type: shardproxy Host: 99.0.1.3 Port: 3306
2018/12/20 17:25:31 [cluster_mdbshardproxy_shard3] INFO - Init Proxy Type: shardproxy Host: 99.0.1.3 Port: 3306
replication-manager-01 spiderman]# cat /var/log/replication-manager.log | grep -ai "99.0.1.2"
2018/12/20 17:25:31 [cluster_mdbshardproxy_shard3] DEBUG - New MdbShardProxy proxy created: 99.0.1.2 3306
2018/12/20 17:25:31 [cluster_mdbshardproxy_shard1] DEBUG - New MdbShardProxy proxy created: 99.0.1.2 3306
2018/12/20 17:25:31 [cluster_mdbshardproxy_shard2] DEBUG - New MdbShardProxy proxy created: 99.0.1.2 3306
2018/12/20 17:25:31 [cluster_mdbshardproxy_shard2] INFO - Init Proxy Type: shardproxy Host: 99.0.1.2 Port: 3306
2018/12/20 17:25:31 [cluster_mdbshardproxy_shard1] INFO - Init Proxy Type: shardproxy Host: 99.0.1.2 Port: 3306
2018/12/20 17:25:31 [cluster_mdbshardproxy_shard3] INFO - Init Proxy Type: shardproxy Host: 99.0.1.2 Port: 3306
replication-manager-01 spiderman]# cat /var/log/replication-manager.log | grep -ai "99.0.1.1"
2018/12/20 17:25:31 [cluster_mdbshardproxy_shard3] DEBUG - New MdbShardProxy proxy created: 99.0.1.1 3306
2018/12/20 17:25:31 [cluster_mdbshardproxy_shard1] DEBUG - New MdbShardProxy proxy created: 99.0.1.1 3306
2018/12/20 17:25:31 [cluster_mdbshardproxy_shard2] DEBUG - New MdbShardProxy proxy created: 99.0.1.1 3306
2018/12/20 17:25:31 [cluster_mdbshardproxy_shard1] INFO - Init Proxy Type: shardproxy Host: 99.0.1.1 Port: 3306
2018/12/20 17:25:31 [cluster_mdbshardproxy_shard2] INFO - Init Proxy Type: shardproxy Host: 99.0.1.1 Port: 3306
2018/12/20 17:25:31 [cluster_mdbshardproxy_shard3] INFO - Init Proxy Type: shardproxy Host: 99.0.1.1 Port: 3306
[root@vpn-replication-manager-01 spiderman]#
But Spider nodes are not slaves; they are just proxies, with no data and no replication, no? replication-manager tracks the tables on the shards and builds those links for you.
I just noted in the browser console, in the top-left menu below the dashboard link, that there is a NEW MONITOR option, and one of the monitor types is shard proxy. Is this the path to my objective?
I added 3 shardproxy monitors via that mechanism and they show nowhere in the browser or terminal UI, so perhaps it is unrelated.
I don't think this will help; that is a way to build the config when dynamic config is enabled.
They should in any case show under the proxy tab. If not, that's a bug.
Spider proxy nodes have slaves; they are MariaDB databases.
On the proxy tab I see the 3 nodes I defined listed, but no means to manage them for switchover or failover.
This architecture is not supported. What are those slaves supposed to do? Maybe it's an architecture I have never faced.
Understood. Since it is a MariaDB instance, I as a DBA must accommodate its failover. Hoping this evolves into an enhancement. In the meantime, what would you suggest? A separate rep-man config that points to the 3 proxy nodes as if they were just a simple master-slave setup?
I could use the pre-failover script to adjust the traffic inflow to the new master proxy node.
Oooooh, I see what you want to do now! You would like only one active proxy. A VIP with keepalived in between would match this case; give repman the VIP as the Spider node.
The Spider proxy nodes do have critical data, i.e. the shard definitions themselves and the IPs adjusted at shard-cluster failover level; that data replicates to the slaves, so I have the means to fail over.
Not just rep-man managing the VIP, since the VIP does not spray the data; the custom script would be used to adjust the sprayer.
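For reference, the shard definitions in question live in the server system table (mysql.servers) on each Spider proxy; a quick, purely illustrative way to inspect the shard map a proxy currently carries:

-- List the backend server definitions this proxy knows about.
SELECT Server_name, Host, Port, Db, Wrapper
FROM mysql.servers
ORDER BY Server_name;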
rep-man is ideal for this since you folks have clearly contained its purpose to failover/maintenance switchover, backups, and presumably restores. I really like its design, so I hope I am coming across okay. We also have MaxScale being tested by devops at the moment for managing a website DB failover and for read/write splits. I was hoping to avoid its multi-faceted use since it causes issues with sharding, so I have been focused on rep-man.
The plan was that the shard definitions are managed by replication-manager and pushed down to the Spider nodes. Shard definitions today are:
My plan was maybe to extend this one day to customize the global dictionary with user input, like a table blacklist or custom partition definitions.
What I would like to introduce first is a list of schema.tables that are universal, i.e. federated across multiple shards.
OK, I would love to hear more about that and possibly participate at some point, once I have more knowledge of the design intent/direction.
I am not interested in data federation unless it is only low-volume stuff, since it causes difficult scaling issues unless kept simple. It is hard to keep a shop from reaching all over the place once federation is introduced.
But I am interested in sharding by hash of PK across multiple shards. I have truly EXTREME high-volume performance/storage scaling requirements. I have done this many times with compute and data grids, back before sharding became more formalized. I know this will work assuming I prove the proxy nodes scale, as in they delegate/distribute/aggregate the data efficiently. But for me to get to the next POC steps, i.e. performance, I have to prove industrial-grade failover, procedures, and tested failover plans first.
I have proven with Spider that I can take my most extreme HV requirements and insert parent/child data all guaranteed to land on one node. The hash-PK design would spread different parents' data across the other shard nodes. The main goal there is to prove I can replicate my current performance, which today occurs on only one node for my most extreme I/O, and then work towards hourly or more aggregate reporting across all shard nodes.
In summary, I am convinced on sharding; I just need to choose the technology with management's acceptance.
As to the need to fail over the proxy nodes: if they were not databases, I would be pursuing a different failover technique. Since the proxy nodes are also MariaDB nodes, they should be managed from the same implementation as their related shard cluster nodes, due to the need for them to be coordinated.
OK, I have already pushed Spider to aggregate 1 billion rows per second. To shard by hash of PK, just create the same schema.table in each shard master and point the data injector to the Spider nodes.
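A minimal sketch of that layout, assuming three server definitions named s1, s2 and s3 (one per shard master) and an illustrative app.orders table; the names are mine, not from this thread:

-- On each shard master: the same physical table.
CREATE TABLE app.orders (
  id BIGINT NOT NULL,
  payload VARCHAR(255),
  PRIMARY KEY (id)
) ENGINE=InnoDB;

-- On the Spider proxy node(s): one Spider table hashed on the PK,
-- each partition pointing at one shard master via its server definition.
CREATE TABLE app.orders (
  id BIGINT NOT NULL,
  payload VARCHAR(255),
  PRIMARY KEY (id)
) ENGINE=SPIDER
  COMMENT='wrapper "mysql", table "orders"'
  PARTITION BY HASH (id) (
    PARTITION pt1 COMMENT = 'srv "s1"',
    PARTITION pt2 COMMENT = 'srv "s2"',
    PARTITION pt3 COMMENT = 'srv "s3"'
  );

The injector then only ever targets app.orders on the proxy; the hash on id decides which shard master each row lands on.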
They are coordinated in terms of DML, that's correct. Resharding requires a sync of all Spider nodes; the way it's done is by re-injecting into a new table definition with more shards.
T1 (S1-S2) is copied to T2 (S1-S2-S3), and then RENAME TABLE swaps T1 and T2. Only in this case do we need to do it under locking on all proxies, correct?
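For the record, a hand-rolled version of that copy-and-swap might look like the following on a proxy; table and server names are illustrative, and this is just the manual shape of the procedure, not the exact statements repman runs:

-- T2: same columns as T1, but hashed over three shards instead of two
-- (each shard master also needs its own physical t2 created beforehand).
CREATE TABLE app.t2 (
  id BIGINT NOT NULL,
  payload VARCHAR(255),
  PRIMARY KEY (id)
) ENGINE=SPIDER
  COMMENT='wrapper "mysql", table "t2"'
  PARTITION BY HASH (id) (
    PARTITION pt1 COMMENT = 'srv "s1"',
    PARTITION pt2 COMMENT = 'srv "s2"',
    PARTITION pt3 COMMENT = 'srv "s3"'
  );

-- Re-inject so every row re-hashes across the three shards.
INSERT INTO app.t2 SELECT * FROM app.t1;

-- Atomic swap; writes to T1 have to be held off for the final catch-up,
-- and the swap must be applied on every Spider proxy node.
RENAME TABLE app.t1 TO app.t1_old, app.t2 TO app.t1;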
Exactly, and the fact that it is transparent to legacy SQL goes miles.
How many shard nodes produced the 1 billion rows aggregated per second? And were they VM- or metal-based?
Some of our HIGHEST VOLUME multi-tenant DB schemas have individual tables upwards of 4.5 TB. Clearly that is why we are sharding: to horizontally spread that design so it will performance/storage scale, and so I can administrate in a timely fashion, as in fix tables quickly, which requires them to be split into multiple smaller tables across shards. It would also allow a transition to much smaller commodity-based nodes sized for timely restores/alters.
I need to clarify that my last comment was written prior to your last comment.
Are you referring to resharding, i.e. the T1 ... T2 discussion above, or to what rep-man is doing with its table server definition modifications?
It was 24 bare-metal data nodes with 256 GB RAM and 64 cores each, and two levels of proxying. I think single-core VMs or Docker-based nodes make more sense, as they simplify the Spider deployment to one table per database instance, whereas in our deployment we get Tbl00 to Tbl63.
My plan is to over-provision the shards for storage growth to last at least a year, for the following reason: Spider resharding does not appear to be industrially proofed, i.e. done for the DBA, meaning it currently seems the DBA has to perform the data rebalancing.
So by buying a year, perhaps it has a chance to come to market. In regards to locking the proxy definitions, yes; the key is how one updates the proxy nodes so they understand that data has been moved to a new node. This has to occur while the business is live.
Yes, that's what repman does with resharding. It's done online and indeed encapsulates that complex procedure inside an API call.
As for DB definitions and table definitions, we are currently multi-tenant, where every customer's data is 100% security-, performance-, and lock-segregated in individual DB schemas. Some question this design, suggesting we combine the customers into one, but I disagree: for one, it has to be done for contract reasons, and in my view it is needed in order to performance scale and to administrate smaller objects. For now I am only focusing on one high-volume DB instance, which plummets the number of tables to a max of 33.
I am searching for any rep-man resharding scripts to better explain the technique. I understand: pull all the existing data, table by table, into another table that is redistributed across all the shards. I am presuming the subsequent rename is where the I/O locking transition happens, the same as pt-online-schema-change renames. The above presumes you have enough space to redistribute while maintaining the existing live tables. All good, hoping to find more info.
It would be ideal if the rebalance were done on the existing table, live, in small sets of logically related data.
The proxy's knowledge of what data was just pulled and redistributed would have to be updated as each logical unit of work transitions to another shard, but I don't think Spider's design is there yet.
At this point I suspect my shop will pass on MariaDB Replication Manager for two reasons. One is the lack of failover support for the Spider MariaDB proxy nodes, i.e. this particular GitHub issue.
Two is rep-man's continuous updates of the Spider proxy DDL, which is unnecessary exposure to failure, addressed in another issue: https://github.com/signal18/replication-manager/issues/264
It should be very easy to add the proxy node failover capabilities, since the code already exists for failing over the backend shard clusters. Likewise it should be very easy to place the Spider DDL modifications under the DBA's control so that only one server definition is updated at the point of failover, instead of modifying all Spider definitions, as addressed in the second issue.
I will check back to see how Replication Manager is progressing, since I still very much believe in its single-binary, central focus on failover concerns, as opposed to a product like MaxScale that not only handles failover but also serves other DB purposes, such as proxying/routing/filtering queries and data. MaxScale also does not blend well with Spider sharding, since both are proxy nodes.
I am all for Spider. Unfortunately this shop is leaning towards Vitess, since the majority of my peers are site reliability engineers and it appears more infrastructure/industrially proofed than Spider's approach. I was hoping/counting on Replication Manager to handle industrial-proofing the failover, buying time for Spider internals to be modified to handle rebalancing data internally for the DBA.
My reasoning for the aforementioned approach: MariaDB databases already perform very well, and since the Spider proxy node is also a MariaDB database, it already has everything a shard proxy needs.
With Spider there is very little need to modify the application, which is huge, but evidently not enough reason for this shop to stop considering Vitess. Vitess has its own issues, but it is catching up in the realm where Spider excels, as they are slowly implementing true SQL support and secondary composite indexes. One could argue that Vitess's VTGate/VTTablet is itself slowly becoming a real database, much like the Spider proxy already is.
The beauty of Spider's proxy design is that it is already time-proven as a database for parsing/optimising SQL across physical partitions, now in the form of logical partitions on separate nodes. It also lets a shop focus on what MariaDB/MySQL teams already know how to do, which is manage SQL databases. Vitess requires relearning its components rather than just focusing on existing SQL components.
Hi,
All your points make sense and need to be fixed. As you say, most of the code is there; it just needs to be reworked to not recreate tables if they already exist, or to create the servers on refresh. Unfortunately I have very little time to improve it; I've started and will post you updates on the progress.
I am interested in having replication-manager also monitor the Spider proxy nodes, since they are also MariaDB instances, ideally from one configuration and one rep-man instance.
Based on the following configuration, only the backend shard nodes display in the replication-manager-cli console. Any advice on how best to monitor/switch over/fail over the Spider proxy nodes within the same configuration? I imagine I could run a separate rep-man instance configured just for the Spider proxy nodes, but I would prefer to manage everything from one source. If so, I am wondering which topology to choose.
In the instance below I have 3 shards, with 1 master and 1 slave per shard, soon to be changed to 2 slaves per shard.
I also have a master Spider proxy node and two slave proxy nodes. The rep-man 2.1 configuration below has been tested to handle both switchover and failover for all backend shard nodes, from master to slave and back to master.
[Cluster_Mdbshardproxy_Shard1]
title = "Shard1"
db-servers-hosts = "99.0.2.1:3306,99.0.3.1:3306"
db-servers-prefered-master = "99.0.2.1:3306"
db-servers-credential = "spidman:99999999"
db-servers-connect-timeout = 1
replication-credential = "repman:99999999"

[Cluster_Mdbshardproxy_Shard2]
title = "Shard2"
db-servers-hosts = "99.0.2.2:3306,99.0.3.2:3306"
db-servers-prefered-master = "99.0.2.2:3306"
db-servers-credential = "spidman:99999999"
db-servers-connect-timeout = 1
replication-credential = "repman:99999999"

[Cluster_Mdbshardproxy_Shard3]
title = "Shard3"
db-servers-hosts = "99.0.2.3:3306,99.0.3.3:3306"
db-servers-prefered-master = "99.0.2.3:3306"
db-servers-credential = "spidman:99999999"
db-servers-connect-timeout = 1
replication-credential = "repman:99999999"

[Default]
shardproxy = true
shardproxy-servers = "99.0.1.1:3306,99.0.1.2:3306,99.0.1.3:3306"
shardproxy-user = "spidman:99999999"