status-im / infra-nimbus

Infrastructure for Nimbus cluster
https://nimbus.team

Research Hetzner as a hosting provider #45

Closed arthurk closed 3 years ago

arthurk commented 3 years ago

Our current testnet servers for pyrmont (and soon prater) run on AWS, where the cost is too high: it's going to be close to 10k USD for both testnets. For the new prater testnet we can look into a cheaper hosting provider.

For the prater testnet we're using the same instance type as for the large pyrmont instances. These are AWS z1d.large instances with 2 CPUs, 16GB RAM and a 150GB NVMe drive.

In #41 we've already discussed this and Hetzner came up as a possible provider.

Monitoring

To estimate which server resources we need, we can look at the infra metrics we've collected from the pyrmont testnet. They can be seen in Grafana. The data I'm looking at is for z1d.large instances running the stable branch.

The instances are maxing out the CPU (near 100% CPU usage) and the nimbus_beacon_node process only uses a single core. The cores run at 4 GHz. The RAM usage is only 1.7GB:

Screen Shot 2021-03-19 at 1 40 15 PM

The disk space used is around 25GB. Disk operations are around 60 IOPS for writes and below 10 IOPS for reads (the spikes are probably due to log rotation, since debug logging is enabled and produces a lot of data):

Screen Shot 2021-03-19 at 4 44 26 PM

(-66 is the write IOPS, 1 is the read IOPS)

Overall I think that a machine with 2 CPU (but with a high core frequency), 4GB RAM and an SSD drive with 50GB is enough.

Hetzner Instances

There's Hetzner Cloud: https://www.hetzner.com/cloud which has interesting machines such as CCX11 with 2 (dedicated) cores, 8GB RAM and an 80GB drive for €24.88/mo. But the CPU is an Intel Xeon Skylake at 2.1 GHz.

Taking a look at geekbench scores:

The cheapest dedicated machine I could find is AX41 with AMD Ryzen 5 3600 6-core, 64 GB RAM and 4 TB HDD for 40.46 € monthly + 46.41 € setup fee. Can be customized on https://www.hetzner.com/dedicated-rootserver/ax41/configurator. With ECC RAM and SSD it's around 50 EUR (60 USD).

We pay around 380 USD/month for an AWS instance. We could get better performance using a Ryzen dedicated machine on Hetzner for 60 USD/month. Since the nimbus process is single-core we'd probably pay for 6 cores and not use 5 of them. The same goes for RAM and disk: we'd have 60GB of unused RAM and nearly 4TB of empty HDD space, but maybe it can be used for something else.
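As a rough illustration of the arithmetic above, here is a minimal Python sketch. The two prices are the approximate figures quoted in this comment. The function names and the `nodes_per_server` parameter are hypothetical, and the one-time setup fee is ignored.

```python
# Approximate monthly prices quoted in this issue (USD).
AWS_USD_PER_MONTH = 380      # one AWS z1d.large instance
HETZNER_USD_PER_MONTH = 60   # one AX41 with ECC RAM and SSD

def monthly_cost(nodes: int, price_per_server: float, nodes_per_server: int = 1) -> float:
    """Monthly cost of hosting `nodes` beacon nodes.

    `nodes_per_server` is a hypothetical packing factor: how many
    beacon nodes share one machine.
    """
    servers = -(-nodes // nodes_per_server)  # ceiling division
    return servers * price_per_server

def cost_ratio(nodes: int, nodes_per_server: int = 1) -> float:
    """How many times more expensive AWS is, with one node per AWS instance."""
    aws = monthly_cost(nodes, AWS_USD_PER_MONTH)
    hetzner = monthly_cost(nodes, HETZNER_USD_PER_MONTH, nodes_per_server)
    return aws / hetzner
```

With one node per machine, `cost_ratio(1)` is 380 / 60, so AWS costs a bit more than six times as much per node. The gap would widen if several nodes could share one dedicated server.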

mratsim commented 3 years ago

IO is more often the limiting factor in our case though we improved the situation greatly in the past month. The small instances in particular require 10x more IO at the moment https://metrics.status.im/d/pgeNfj2Wz23/nimbus-fleet-testnets?orgId=1&var-instance=stable-small-01.aws-eu-central-1a.nimbus.pyrmont

image

We will add multithreading in select cases but will likely optimize for 2 cores (1 thread for networking, 1 thread for the rest) and/or 4 cores for compute-intensive cryptographic tasks (Raspberry Pi 4 has 4 cores).

Regarding Hetzner, the German website has lower prices (10-15%) when you use 6-month contracts.

arnetheduck commented 3 years ago

The alternative is to run multiple beacon_node instances on the same server - the limiting factor on the servers with many validators is that it takes time to sign all attestations.

zah commented 3 years ago

I've tried to run a node on a 2.1 GHz Xeon, but the results were poor. Something that we can consider on the servers with larger core counts is to run multiple instances of the beacon node. My expectation is that this will work quite well if the machine has good I/O performance.
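To get a feel for how many instances might fit on one machine, here is a minimal Python sketch. The per-node usage comes from the pyrmont metrics earlier in this issue (1 core, 1.7GB RAM, 25GB disk) and the AX41 specs from the Hetzner listing. The 20% headroom and all names are assumptions, and disk I/O, which may well be the real limit, is not modelled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Resources:
    cores: int
    ram_gb: float
    disk_gb: float

# Per-node usage observed on pyrmont z1d.large instances (from this issue).
NODE_USAGE = Resources(cores=1, ram_gb=1.7, disk_gb=25)

# AX41 as listed in this issue: 6 cores, 64 GB RAM, 4 TB HDD.
AX41 = Resources(cores=6, ram_gb=64, disk_gb=4000)

def max_instances(server: Resources,
                  node: Resources = NODE_USAGE,
                  headroom: float = 0.2) -> int:
    """Upper bound on beacon node instances per server.

    `headroom` is an assumed fraction of every resource kept free
    for the OS and for load spikes.
    """
    usable = 1.0 - headroom
    limits = [
        server.cores * usable / node.cores,
        server.ram_gb * usable / node.ram_gb,
        server.disk_gb * usable / node.disk_gb,
    ]
    # The scarcest resource decides how many instances fit.
    return int(min(limits))
```

Under these assumptions `max_instances(AX41)` is bounded by the core count rather than by RAM or disk, which matches the observation that the node is CPU-bound on a single core.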

mratsim commented 3 years ago

It seems like Hetzner updated their offer and the 40€/month server now comes with 1TB of NVMe

image

One extra factor in favor of Ryzen: these CPUs support the SHA-NI extension https://en.wikipedia.org/wiki/Intel_SHA_extensions which speeds up hashing, one of our top 3 bottlenecks after BLS cryptography and IO.
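Once a server is available, SHA-NI support can be checked from the CPU flags. Below is a minimal Python sketch, assuming an x86 Linux host where `/proc/cpuinfo` lists the extension as `sha_ni` in the `flags` line. The helper names are hypothetical.

```python
from typing import Set

def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Return the CPU feature flags from the first `flags` line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()

def has_sha_ni(cpuinfo_text: str) -> bool:
    """True if the CPU advertises the SHA extensions (flag `sha_ni` on x86 Linux)."""
    return "sha_ni" in cpu_flags(cpuinfo_text)
```

On the server itself this could be called as `has_sha_ni(open("/proc/cpuinfo").read())`.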

Note: without VAT this ultimately comes down to 34€/month.

arthurk commented 3 years ago

I've requested the budget for a Hetzner dedicated server https://www.hetzner.com/dedicated-rootserver/ax41-nvme

For now it's a single server that we can develop against to make sure everything works. For infra we need to write a new module for Hetzner since we haven't used this provider before. The idea of running multiple beacon_node instances on one machine came up. We can test that on this machine.

If everything works nicely we can make a migration plan to switch other testnet servers from AWS to Hetzner.

arthurk commented 3 years ago

The budget has been approved and I've been trying to register an account with Hetzner, but it fails during the payment step with a JS redirect error. I've contacted their support and am waiting for an answer from a human:

 Hi there

We received your email. Thanks for writing to us.

You have probably realized by now that this is an automated message.

We need some time to process each email that we receive, but we will write back to you personally as soon as we can. Thank you for your
understanding!

Kind regards

Your Hetzner Online Team

Hetzner Online GmbH
Industriestr. 25
91710 Gunzenhausen / Germany
Tel.: +49 9831 505-0
Fax:  +49 9831 505-3
www.hetzner.com

Register Court: Registergericht Ansbach, HRB 6089
CEO: Martin Hetzner, Stephan Konvickova, Günther Müller

For the purposes of this communication, we may save some
of your personal data. For information on our data privacy
policy, please see: www.hetzner.com/datenschutzhinweis

Part of my request:
> Hi,
>
> I'm trying to buy a dedicated server and register an account but nothing
> happens after I approve the payment.
>
> In the attached screenshots I've entered the card details and confirmed the
> payment with my credit card company. The website recognizes this but there
> is no redirect. I can see that there's an error in the JS console related
> to a "redirect".
>
> What should I do?

Screenshots of what happened:

1) Request payment

hetzner-payment-request

2) Payment approved

payment-confirmed

3) After payment is confirmed in the app, the website doesn't redirect and hangs:

hetzner-reg-fails

4) Inspector shows js error

hetzner-js-error

arthurk commented 3 years ago

The account is registered. Turns out they don't support Firefox.

The server is ordered and I'm waiting for them to process it:

> Dear Sir or Madam
>
> Thank you for choosing Hetzner Online as your web hosting partner. We have received your order and shall inform you once we have activated your request.

According to https://docs.hetzner.com/general/others/order-processing/ this might take "up to about 12-14 workdays"

arthurk commented 3 years ago

The server has arrived, but to integrate it into our current infrastructure we need to set up additional servers, since this server is from a new provider and in a new data center. This means setting up 3 servers for Consul, 1 server for Prometheus and 1 server for Logstash. We'll be using Hetzner Cloud for that and the Terraform module will be available in https://github.com/status-im/infra-tf-hetzner-cloud.

When the supporting infra is set up we can use the Hetzner dedicated server to run the validators.

I'm closing this issue since the "research" part is done, and will open a new issue for the implementation part.