utterances-bot opened 2 years ago
I'm wondering if you would mind providing any insights into potential drawbacks, bottlenecks, security flaws, etc., in the Silk platform? Thanks in advance!
The whole platform worked (surprisingly?) well. I guess I shouldn't say "surprisingly", as from the architecture doc I already saw a lot of good decisions that made sense. Clearly the Silk folks have the right experience.
One thing that I did notice (and spent some additional time on) was I/O latency outliers when putting the minimum configuration (2 c.nodes) under a heavy mixed workload: lots of small random 8kB I/Os and, at the same time, parallel queries scanning large tables with large 512kB-1MB I/Os. While the average read latency of an 8kB block was still under a millisecond, there were occasional latency outliers of over 100ms for the small I/Os.
I wasn't too worried about this, as I was putting the setup under a lot of concurrent load and it was the smallest recommended configuration. You would start seeing latency outliers on any storage platform when you push it close to its max I/O capacity (and cloud hosting/networking inevitably brings some variability to anything you run there). But the cool thing was that once I scaled the cluster up from 2 c.nodes to 3 c.nodes, throughput (of course) increased and the I/O latency outliers were pretty much gone. So perhaps the smallest recommended configuration just "hits the wall" harder when you push it close to its max I/O capacity, while bigger configurations handle it more gracefully.
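To illustrate why the average latency can stay under a millisecond while the tail still blows past 100ms, here's a tiny sketch with synthetic numbers (purely made up for illustration - these are not measurements from the Silk tests):

```python
import random

random.seed(42)

# Synthetic 8kB read latencies (ms): most under 1 ms, with rare 100ms+
# outliers, mimicking the mixed-workload behavior described above.
samples = [random.uniform(0.2, 0.9) for _ in range(100_000)]
samples += [random.uniform(100, 150) for _ in range(150)]  # rare outliers

samples.sort()
mean = sum(samples) / len(samples)
# Approximate percentiles by index into the sorted samples.
p50 = samples[int(0.50 * len(samples))]
p999 = samples[int(0.999 * len(samples))]

print(f"mean  = {mean:.2f} ms")    # still under a millisecond
print(f"p50   = {p50:.2f} ms")     # median also well under a millisecond
print(f"p99.9 = {p999:.2f} ms")    # but the tail is 100ms+
```

This is why it's worth looking at latency percentiles or histograms rather than averages alone when stress-testing storage.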
From a manageability, security, and governance point of view, all the design decisions and functionality seemed right - but I spent almost all of my hands-on time testing performance & scalability.
Tanel, Thank you for the great article. This may be a bit out of scope, but I'll ask anyway -- what risk do you see in the use of ephemeral volumes? The cost and performance advantages are clear but can users be confident that Silk's HA methods are sufficient to protect against the permanent loss of ephemeral devices?
This was pretty much the most important question that I had: how does Silk take care of redundancy and resiliency while taking advantage of the awesome performance of local (ephemeral) NVMe disks? I reviewed their architecture guide before even accepting this job. It showed that Silk has some serious inventions and architecture there: a two-tier design where the storage tier uses nodes with the most available NVMe, and a compute tier that accesses these "dumb" data nodes, which may fail or crash at any time, with Silk's software compensating for everything.
And it's not just mirroring (where you end up paying 2-3x the cost due to duplicating data) - their K-RAID software, handling striping and parity over many disks, has only 12.5% space overhead, not 200%!
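To make the space-overhead comparison concrete, here's the arithmetic (the 8+1 stripe geometry below is just one illustrative layout that yields 12.5%, not Silk's published K-RAID layout):

```python
def space_overhead(data_units: int, redundancy_units: int) -> float:
    """Extra raw capacity needed, as a fraction of usable capacity."""
    return redundancy_units / data_units

# Two-way mirroring duplicates every data unit -> 100% overhead;
# triple mirroring -> 200% overhead.
print(f"2x mirror : {space_overhead(1, 1):.0%}")   # 100%
print(f"3x mirror : {space_overhead(1, 2):.0%}")   # 200%

# A parity stripe of 8 data units + 1 parity unit (illustrative geometry)
# gives the 12.5% figure mentioned above.
print(f"8+1 parity: {space_overhead(8, 1):.1%}")   # 12.5%
```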
Note that I tested only Silk performance and some scalability (as mentioned in the blog) - but their platform is architecturally sound when thinking about data resiliency too. All this in a public cloud - without using some secret internal cloud vendor APIs, I find it pretty cool!
Edit: Oh, to answer your specific question - ephemeral volumes can and do fail (and the instances serving them crash or disappear). That's why Silk has the 2-tier architecture: if one instance with ephemeral volumes disappears, you still have 23 instances (out of 24) available, so the compute tier can quickly start up a fresh new data node and resync/recompute the data as needed (and K-RAID can tolerate some concurrent failures too, as described in the blog). So you can have local NVMe speed and not worry about inevitable hardware/OS failures, thanks to the Silk software layer in the compute nodes overseeing everything.
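A rough back-of-envelope on what a single data-node loss means, assuming data is spread evenly across the 24 nodes mentioned above (the even spread is my assumption, for illustration only):

```python
nodes = 24
surviving = nodes - 1

# If data (plus parity) is spread evenly across all nodes, losing one node
# leaves this fraction of the cluster's raw capacity online...
available = surviving / nodes
# ...and roughly this fraction of the data must be rebuilt onto the
# replacement node that the compute tier spins up.
rebuild_fraction = 1 / nodes

print(f"capacity still online : {available:.1%}")       # ~95.8%
print(f"data to resync/rebuild: {rebuild_fraction:.1%}") # ~4.2%
```

The small rebuild fraction is part of why recovery can be quick: only the lost node's slice needs recomputing, not the whole dataset.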
Note that all this doesn't mean you can stop taking backups, and you may still want to replicate some of your data to different global regions (Silk can do that at the volume level) to handle disasters, etc.
Testing The Silk Platform - Hands-On Technical Analysis of High-Performance I/O in the Cloud | Tanel Poder Consulting
https://tanelpoder.com/posts/testing-the-silk-platform/