mvayngrib opened this issue 6 years ago
Actually we scale it slightly differently.
We have multiple instances of this stack deployed. Each instance has its own parameters committed to the repo. The stacks are named like MainnetParity-1, MainnetParity-2, etc.
On top of this we run a serverless stack which keeps the list of node URLs in DynamoDB. For each stack it calls eth_blockNumber once a minute, then compares all of these to the block number reported by an Infura node.
This way it decides whether each node is healthy and up to date. The list of healthy nodes is then compiled into an nginx config that is uploaded to an S3 bucket. Uploading the config to S3 triggers an update of the ECS task which runs nginx with this config and load-balances requests between the nodes.
This way the state of DynamoDB is translated into the state of the nginx proxy, and nodes that have fallen out of sync stay detached.
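The per-node check itself boils down to roughly the following (shell sketch with placeholder URLs; the real version runs as a Lambda over the node list in DynamoDB):

```bash
# Ask a node for its latest block and convert the hex result to decimal.
block_number() {
  local hex
  hex=$(curl -s -X POST -H 'Content-Type: application/json' \
    --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
    "$1" | jq -r '.result')
  echo $(( hex ))
}

# Placeholder Infura endpoint used as the reference height.
reference=$(block_number "https://mainnet.infura.io/<project-id>")

# Nodes lagging more than a few blocks behind the reference are marked unhealthy.
for node in "$@"; do
  current=$(block_number "$node")
  if [ $(( reference - current )) -le 5 ]; then
    echo "healthy: $node"
  else
    echo "out of sync: $node ($current vs $reference)"
  fi
done
```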
I've been planning for a while to write this up as an article and open-source that serverless component, so it will probably come soon.
@kowalski thanks for sharing that, I'm looking forward to the article and the serverless component.
i'm curious, is there a reason you didn't use/extend the aws blockchain template for ethereum?
also, i'm thinking about how I can automate the whole sync-then-switch-instances workflow in cloudformation. I have a prototype (based on this repo), but it still requires a manual step: it syncs with c5.large, creates a snapshot, notifies me via email with the snapshot id, but then i still need to run UpdateStack manually to switch it to the smaller instance.
I want the only manual thing to be creating the stack. Do you have an idea of how to accomplish that? Maybe have 2 instances as ECS tasks: the syncing one exits after it's done (while the other waits), and the post-sync one mounts the volume when it detects the other task has finished and takes over?
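For context, the manual step I'd like to eliminate is basically an update-stack call like this (parameter names are approximate, taken from my prototype):

```bash
# Switch the already-synced stack to a smaller instance type,
# reusing the snapshot produced by the sync phase.
aws cloudformation update-stack \
  --stack-name MainnetParity-1 \
  --use-previous-template \
  --parameters \
      ParameterKey=InstanceType,ParameterValue=t2.medium \
      ParameterKey=ChainSnapshotId,ParameterValue=snap-0123456789abcdef0 \
  --capabilities CAPABILITY_IAM
```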
i'm curious, is there a reason you didn't use/extend the aws blockchain template for ethereum?
I don't think there is a possibility to refer to an external template and extend it. Please correct me if I'm wrong, but I don't think CloudFormation supports it. As for using these templates directly, there are a few things missing, like the ability to bootstrap from an EBS snapshot and access from a VPC security group.
As for "sync-then-switch-instances workflow" there has actually been some changes to how we work comparing to when I wrote down that article.
First, we run on c5.large all the time, so the downgrading step is gone. This is because we point the web3 HttpProvider used by the web browser of our dapp at our cluster instead of Infura. We currently have a steady load of about 50 requests/s on the json-rpc proxy, so we needed some extra horsepower.
Also, somewhere around release 1.11 Parity made some fixes around warp sync. As a result we don't update our snapshots that much nowadays. When we need to spin up a new machine we do it using an old snapshot (a month old, say) and let warp sync kick in. We can see that about 1-2 hours is enough to get a node like this in sync. Also, we have the machinery on top of it that monitors all the nodes, notices when a sync is complete and plugs the new instance into the load-balancing proxy.
Summing up, we don't do many manual actions around this anymore.
I don't think there is a posibility to refer to external template and extend it. Please correct me if I'm wrong, but I don't think Cloud Formation supports it.
i wasn't being clear, i just meant take that template and adapt it
thanks for explaining. For our scenario, we'll be giving the template to our customers to run under their own accounts, and would want to give them the option of using a smaller node post-sync, as they'll be the only ones using it. Most of them won't be going anywhere near 50 requests/s on json-rpc.
In this case you could consider giving them the snapshot as well. It's easy to share a snapshot between AWS accounts, so for clients you can skip the initial sync altogether.
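Sharing a snapshot with a client account is a one-liner (snapshot and account ids below are placeholders):

```bash
# Grant another AWS account permission to create volumes from the snapshot.
aws ec2 modify-snapshot-attribute \
  --snapshot-id snap-0123456789abcdef0 \
  --attribute createVolumePermission \
  --operation-type add \
  --user-ids 111122223333
```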
i wasn't being clear, i just meant take that template and adapt it
Now that I've taken a second look, I see that AWS used nested stacks, and this is what prevented us from using it. Nested stacks are a great idea, but sadly you need to keep all your nested stacks in an S3 bucket and cannot refer to a template with file://./... For me that's a deal-breaker. I don't want to have to sync files to S3 just to be able to refer to them - it seems inelegant.
i agree that the nested stacks make things more annoying. However, their stacks are only nested because they try to provide two deployment options: ECS vs "docker-local", the second being to run everything manually on one EC2 machine with Docker. Once docker-local is removed, the nesting becomes unnecessary and things can easily be collapsed into one stack.
Unfortunately, we can't give a snapshot to our customers to use, as it would be a significant step down in security compared to them syncing themselves. We'll keep working towards an automatic switch-over solution :)
@mvayngrib you can find a WIP version of the article about scaling Parity nodes in the README of this repo: https://github.com/rumblefishdev/jsonrpc-proxy
Please let me know what you think
@kowalski cool, thanks! Sorry to bug you with more questions:
when i update my MainnetParity service, e.g. after pushing a new image, i often get an error like this: "The closest matching container-instance ...xxx-xxx... is already using a port required by your task."
Do you experience this? Based on my reading this is avoided by using a load balancer to do dynamic port mapping.
also, maybe i'm missing something, but with the use of EBS, if AWS at any point places two parity tasks on one machine, will it cause a problem with them both using the same volume?
@mvayngrib yes, we run into the same issue. The solution is to switch the AWS::ECS::Service resource to the DAEMON scheduling strategy. Sadly, at the moment this cannot be done at the CloudFormation level. It's actually very strange for AWS to introduce a feature but not allow it to be used from CF.
You can find a question about this feature here: https://forums.aws.amazon.com/thread.jspa?threadID=284010
In the meantime, since we can only run 1 instance of the task per cluster, we simply do the deployment by first changing DesiredCount in ECS to 0 and then reverting it back to 1. It's not ideal but good enough until we can use the DAEMON scheduling strategy.
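In practice the deployment is roughly these three calls (cluster and service names are placeholders):

```bash
# Stop the running task, wait for the service to drain, then start the new one.
aws ecs update-service --cluster MainnetParity-1 --service parity --desired-count 0
aws ecs wait services-stable --cluster MainnetParity-1 --services parity
aws ecs update-service --cluster MainnetParity-1 --service parity --desired-count 1
```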
@kowalski i'm trying out a slightly different approach, a combination of yours and AWS's :) Happy to discuss/share approaches if you're interested. My goals:

- auto-snapshot and notify user after sync to update stack with ChainSnapshotId
- have it be network-agnostic - NetworkName as a parameter
- have a custom indexer run alongside it in another container
- nginx container "sidecar" (as AWS calls it), with Authorization header based auth into my custom indexer's rest api

At the moment, I'm not trying to manage a swarm of these stacks yet, like you and Infura.
I have a working prototype, now trying to see how stable it is. Happy to share if you want to try it out
A few comments:
auto-snapshot and notify user after sync to update stack with ChainSnapshotId
Ok, so I'm not sure this is a good approach. The problem is that when you change the ChainSnapshotId parameter, it goes into the BlockDeviceMapping property of the AWS::AutoScaling::LaunchConfiguration resource. A change there requires replacement, so it will kill the EC2 machine and create a new one from the new snapshot. When you update this stack the node will have downtime, then restart but be out of sync, and only get up to speed after some time has passed. IMO it's not the way to go.
Now that I think about it, I think it's better to completely ignore the BlockDeviceMapping property of the LaunchConfiguration, create an AWS::EC2::Volume resource in the stack directly, and mount this specific volume on startup using aws ec2 attach-volume --region us-east-1 --volume-id ${Volume} --instance-id $(curl -s http://169.254.169.254/latest/meta-data/instance-id) --device /dev/sdh from AWS::CloudFormation::Init.
This allows you to completely drop the notion of snapshots. If you restart the node it just picks up where the previous node left off. The volume is not deleted when the instance goes down - it just gets detached.
The downside of the above is that if the data gets corrupted for any reason you have to start from scratch. You can mitigate this by adding a periodically run Lambda which creates a snapshot from your volume. Then, in case of data corruption, you would delete the whole stack and create a new one, specifying that the new volume is to be created from the last snapshot id.
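The Lambda doesn't need to do much more than the equivalent of this call (volume id is a placeholder):

```bash
# Take a point-in-time snapshot of the chain-data volume.
aws ec2 create-snapshot \
  --volume-id vol-0123456789abcdef0 \
  --description "parity chaindata backup $(date +%Y-%m-%d)"
```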
Anyway, this is the approach we use for other services which are less likely to corrupt their data (like GitLab or Verdaccio).
have it be network-agnostic - NetworkName as a parameter.
sure, we use it for kovan nodes too; that's a trivial change
have a custom indexer run alongside it in another container
We also have indexers, but we run them on different ECS clusters and have them use the cluster of Parity nodes, not a single instance. But yeah, I don't think there is one single right choice of architecture.
nginx container "sidecar" (as AWS calls it), with Authorization header based auth into my custom indexer's rest api
I wasn't aware the web3 HttpProvider can cope with basic auth. Good to know. We limit access to json-rpc to just the VPC, so we didn't need authorization. But I can see how this could be useful.
Change in here requires replacement. So it will kill the EC2 machine and create a new one from new snapshot.
That's true, though if i increase the timeout for signaling a healthy instance, and only signal after catching up from ChainSnapshotId, ASG will first wait for the new one to catch up, and only then remove the old one, so there should be no down time. Does that make sense?
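Concretely, I was picturing something like this at the end of the instance's user data (rough sketch; it assumes the ASG's update policy waits on resource signals, the stack/resource names are placeholders, and eth_syncing is only a rough proxy for "caught up"):

```bash
# Wait until parity reports it is no longer syncing, then signal the ASG
# so the old instance can be retired.
until curl -s -X POST -H 'Content-Type: application/json' \
      --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' \
      http://localhost:8545 | grep -q '"result":false'; do
  sleep 60
done
cfn-signal -e 0 --stack MainnetParity-1 --resource AutoScalingGroup --region us-east-1
```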
This allows you to completely drop the notion of snapshots.
true, but I like the comfort of having a snapshot :) Some bug in a new version of parity might corrupt the database one day and I don't want to resync from scratch.
Now that I think about it, I think it's better to completely ignore BlockDeviceMapping property of LaunchConfiguration and create AWS::EC2::Volume resource in stack directly...
right, i'm already doing this, though I use 2 volumes (one for each of 2 AZs) and have to do a bit more work to map to the volume in the right AZ.
I wasn't aware web3 HttpProvider can cope with basic auth. Good to know.
oh, that's not what i meant. I don't expose the json-rpc interface at all. I probably have a slightly different goal for my project. My use case is like this:
internet / aws lambda (not in vpc) -> nginx (check auth header) -> my indexer container -> json-rpc to parity node. The nginx container is exposed to the outside (lambda in my case).
My indexer container exposes only the methods I need in my application. Some of those methods are proxies to json-rpc, and some are custom.
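So from the lambda's side a call looks roughly like this (hostname, path and token scheme are made up for illustration):

```bash
# nginx checks the Authorization header before proxying to the indexer.
curl -s -H "Authorization: Bearer $INDEXER_TOKEN" \
  "https://indexer.example.com/api/tx/0xabc123"
```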
We also have indexers but run them on different ECS clusters
makes perfect sense, but might be out of my scope at the moment
That's true, though if i increase the timeout for signaling a healthy instance, and only signal after catching up from ChainSnapshotId, ASG will first wait for the new one to catch up, and only then remove the old one, so there should be no down time. Does that make sense?
That's an interesting idea. I haven't tried something like this before, so it's hard to tell if it will work the way you've described. The reason I'm unsure about this approach is that during the update of the CF stack you're not replacing the AWS::EC2::Instance but the AWS::AutoScaling::LaunchConfiguration which configures the autoscaling group. I don't think this type of resource has the notion of a rolling deployment, so I think it will actually terminate the old instance without waiting for the signal from the new one. But I may be wrong on this one, please let me know after you test it :)
awesome template guys, thanks for taking the time to develop it!
i'm curious how this scales. Is it as simple as incrementing DesiredTaskCount? Is there a way to have one master node which does all the syncing and additional nodes that only perform json-rpc duties?