gramineproject / gramine

A library OS for Linux multi-process applications, with Intel SGX support
GNU Lesser General Public License v3.0
603 stars 201 forks

Performance latency with redis benchmark tests #313

Closed · salsal97 closed this issue 2 years ago

salsal97 commented 2 years ago

Description of the problem

On comparing the performance of the Gramine Redis server against native using the redis-benchmark tool, Gramine turns out to be almost 2.75 times slower than its native counterpart. This test was run on an Ubuntu 20.04 Azure DC-series VM.

Steps to reproduce

Followed the quickstart instructions to install Gramine using apt:

sudo curl -fsSLo /usr/share/keyrings/gramine-keyring.gpg https://packages.gramineproject.io/gramine-keyring.gpg
echo 'deb [arch=amd64 signed-by=/usr/share/keyrings/gramine-keyring.gpg] https://packages.gramineproject.io/ stable main' | sudo tee /etc/apt/sources.list.d/gramine.list
sudo apt-get update
sudo apt-get install gramine  
mkdir -p "$HOME"/.config/gramine
openssl genrsa -3 -out "$HOME"/.config/gramine/enclave-key.pem 3072
git clone --depth 1 https://github.com/gramineproject/gramine.git
cd gramine/CI-Examples/redis
sudo apt-get install -y libprotobuf-c-dev
edit gramine/CI-Examples/redis/redis-server.manifest.template and set debug = false
make SGX=1 SGX_SIGNER_KEY="$HOME"/.config/gramine/enclave-key.pem
make start-gramine-server       # run to test the server starts (v6.0.5)
sudo apt install redis-tools

+ ./redis-server --save '' --protected-mode no &
+ redis-benchmark -q -h 127.0.0.1 -p 6379 -c -n
PING_INLINE: 33467.20 requests per second
PING_BULK: 33444.82 requests per second
SET: 32883.92 requests per second
GET: 33366.70 requests per second
INCR: 33025.10 requests per second
LPUSH: 33500.84 requests per second
RPUSH: 33112.58 requests per second
LPOP: 32185.39 requests per second
RPOP: 33014.20 requests per second
SADD: 33079.72 requests per second
HSET: 33636.06 requests per second
SPOP: 33456.00 requests per second
LPUSH (needed to benchmark LRANGE): 32862.31 requests per second
LRANGE_100 (first 100 elements): 26652.45 requests per second
LRANGE_300 (first 300 elements): 17809.44 requests per second
LRANGE_500 (first 450 elements): 13987.97 requests per second
LRANGE_600 (first 600 elements): 11993.28 requests per second
MSET (10 keys): 31026.99 requests per second

+ gramine-sgx redis-server --save '' --protected-mode no &
+ redis-benchmark -q -h 127.0.0.1 -p 6379 -c -n
PING_INLINE: 12661.43 requests per second
PING_BULK: 12838.62 requests per second
SET: 12616.70 requests per second
GET: 12761.61 requests per second
INCR: 12748.60 requests per second
LPUSH: 12591.29 requests per second
RPUSH: 12500.00 requests per second
LPOP: 12564.39 requests per second
RPOP: 12603.98 requests per second
SADD: 12768.13 requests per second
HSET: 12613.52 requests per second
SPOP: 12865.05 requests per second
LPUSH (needed to benchmark LRANGE): 12597.63 requests per second
LRANGE_100 (first 100 elements): 11466.58 requests per second
LRANGE_300 (first 300 elements): 9390.55 requests per second
LRANGE_500 (first 450 elements): 8177.95 requests per second
LRANGE_600 (first 600 elements): 7374.09 requests per second
MSET (10 keys): 12122.68 requests per second
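As a quick sanity check on the quoted ~2.75x figure, the PING_INLINE numbers from the two runs above give roughly this ratio (the exact factor varies a bit by operation):

```shell
# Ratio of native to Gramine-SGX PING_INLINE throughput:
awk 'BEGIN { printf "%.2f\n", 33467.20 / 12661.43 }'
# prints 2.64
```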

Expected results

Results are as expected; however, I am wondering if this is the most efficient way to run the Redis server on Gramine. Are there any other configurations I might benefit from tweaking to make this benchmark run faster? Interested in any input.

Actual results

NA

dimakuv commented 2 years ago

This test was run on an ubuntu 20.04 azure DC-series VMs.

I'm assuming these are the machines that have only 128MB of EPC (Enclave Page Cache)? You can easily check it via the is-sgx-available utility that comes with Gramine.

The size of the EPC is very important for performance. You can read through these docs, for example: https://gramine.readthedocs.io/en/latest/devel/performance.html#choice-of-sgx-machine

There are also several other subtle issues with SGX machines that can contribute to poor performance. For example, some older SGX machines do not have the RDTSC instruction enabled inside SGX enclaves -- this also degrades enclave performance.

I am wondering if this is the most efficient way to run the redis server on gramine?

A typical way to test Redis is to run the client (redis-benchmark) on another machine, connected to your main SGX machine (where redis-server runs) via a 1Gbps or 10Gbps Ethernet link. This setup emulates real-world environments, where applications from the Internet or from other machines in the cluster connect to the Redis server.

Because you run both the client and the server on the same machine, there is no network latency to mask Gramine's per-request overhead, so the relative slowdown looks worse than it would in a realistic deployment.
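A minimal sketch of that client/server split, assuming the SGX machine is reachable at 10.0.0.5 (a hypothetical address) and using illustrative client/request counts (not values from this thread):

```shell
# On the SGX machine: start the server under Gramine.
gramine-sgx redis-server --save '' --protected-mode no &

# On a separate client machine on the same network:
redis-benchmark -q -h 10.0.0.5 -p 6379 -c 50 -n 100000
```

This is an invocation sketch, not a runnable self-test; it requires the Gramine-built server from the reproduction steps above.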

Are there any other configurations I might benefit from tweaking, to make this benchmark test run faster?

redis-benchmark by default updates a single key-value pair (if I remember correctly). In reality, Redis servers hold thousands or millions of key-value pairs. Also, the typical access pattern for Redis is 80% reads and 20% writes (or 95% reads and 5% writes) -- it depends on what kind of workload you want to simulate with your tests. There are many resources on the Internet on how to properly benchmark databases, including Redis. For example, the classic one: https://redis.io/topics/benchmarks

If you want to get the best performance of Redis in Gramine-SGX, probably the best configuration will be something like this:

The above configuration will probably give you something like 5% overhead for Gramine-SGX in comparison to native (this is just my projection; I have no real numbers to back up my hypothesis). But please note that my proposed setup may not be entirely realistic (especially the 1Gbps network link -- in an enterprise cluster, the network link would be at least 10Gbps, I guess).

salsal97 commented 2 years ago

Hi,

Thank you for your response. I'm running this on an Ice Lake machine; the output of is-sgx-available is below:

$ is-sgx-available 
SGX supported by CPU: true
SGX1 (ECREATE, EENTER, ...): true
SGX2 (EAUG, EACCEPT, EMODPR, ...): true
Flexible Launch Control (IA32_SGXPUBKEYHASH{0..3} MSRs): true
SGX extensions for virtualizers (EINCVIRTCHILD, EDECVIRTCHILD, ESETCONTEXT): false
Extensions for concurrent memory management (ETRACKC, ELDBC, ELDUC, ERDINFO): false
CET enclave attributes support (See Table 37-5 in the SDM): false
Key separation and sharing (KSS) support (CONFIGID, CONFIGSVN, ISVEXTPRODID, ISVFAMILYID report fields): true
Max enclave size (32-bit): 0x80000000
Max enclave size (64-bit): 0x100000000000000
EPC size: 0x200000000
SGX driver loaded: true
AESMD installed: true
SGX PSW/libsgx installed: true
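The EPC size reported above can be decoded directly: 0x200000000 bytes works out to 8 GiB, far above the 128MB EPC that dimakuv asked about.

```shell
# Convert the reported EPC size to GiB:
printf '%d GiB\n' $(( 0x200000000 / 1024 / 1024 / 1024 ))
# prints 8 GiB
```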

Going through the factors for best performance here,

dimakuv commented 2 years ago

@salsal97 There is a memtier benchmark that I used when doing some Redis benchmarking -- this benchmark supports different ratios of reads/writes, and many other interesting knobs. See https://github.com/RedisLabs/memtier_benchmark.
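For reference, a memtier_benchmark invocation along the lines described above (20% writes / 80% reads over a larger key space) might look like this; the specific flag values are illustrative assumptions, not taken from this thread:

```shell
# Hypothetical example: 1 SET per 4 GETs (20% writes), 1M-key space.
# Flags per `memtier_benchmark --help`; tune threads/clients for your machine.
memtier_benchmark -s 127.0.0.1 -p 6379 \
    --ratio=1:4 --key-maximum=1000000 \
    -t 4 -c 50 -n 100000
```

This is an invocation sketch; it needs a running Redis server to benchmark against.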

The overhead of 2.75x is too much for a proper Ice Lake server. Something doesn't add up here.

  1. What is the size of RAM on your machine? You can dump the output of free -h command.

  2. When you enable warnings in the Gramine log (by adding loader.log_level = "warning" in the Redis manifest file), do you see the message all RDTSC/RDTSCP instructions are emulated (imprecisely) via gettime() syscall?

    • This message indicates that the machine doesn't support the RDTSC instruction inside SGX enclaves (Ice Lake client CPUs have this issue, if I remember correctly)
    • Emulating RDTSC is very expensive, so an overhead of 2.75x could be explained by this

Since I am using an Azure VM, I cannot connect the client from a different machine -- is there any way to get around this?

You can make this Azure VM visible to the outside world by requesting a public IP (there is a checkbox for this somewhere when you create a VM in the Azure portal). Then you can connect to this public IP from any machine (like your local laptop).

Alternatively, you can create two VMs in an Azure cluster.

salsal97 commented 2 years ago

Hi @dimakuv, thanks again for your response.

  1. I have a total of 16 GB RAM on this machine
  2. I don't see that message when log_level is set to "warning"

Also, I'm guessing that changing the connection between the client and server would not make a huge difference, since the same test was performed under the same conditions for native and then compared. I'm mostly looking for release-mode configuration settings in Gramine, or anything else of that sort, that would include optimizations and might improve performance.

dimakuv commented 2 years ago

I'm mostly looking for release-mode configuration settings in Gramine, or anything else of that sort, that would include optimizations and might improve performance.

@salsal97 I hope that you either used the Gramine package (as described in https://gramine.readthedocs.io/en/latest/quickstart.html) or built Gramine from GitHub sources with --buildtype=release (as described in https://gramine.readthedocs.io/en/latest/devel/building.html#id2). In both these cases you'll get a release version of Gramine (i.e., optimized for performance).
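For completeness, the source build with a release buildtype looks roughly like this, per the linked building docs (directory names are illustrative):

```shell
# Sketch of a release build of Gramine from source, per the docs linked above.
git clone https://github.com/gramineproject/gramine.git
cd gramine
meson setup build/ --buildtype=release -Ddirect=enabled -Dsgx=enabled
ninja -C build/
sudo ninja -C build/ install
```

This is a build sketch requiring the Gramine build dependencies; the apt package used in this thread already ships a release build, so this step is only needed when building from source.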

For different runtime performance optimizations, please read through https://gramine.readthedocs.io/en/latest/devel/performance.html.

I just tried to add the following optimizations to my local redis-server.manifest.template:

libos.check_invalid_pointers = false
sgx.preheat_enclave = true
sgx.rpc_thread_num = 8  # but please note that this comes at the cost of additional CPU cores!

This gave me a huge performance boost in my local run. Before these optimizations, I observed 2x overhead. With these optimizations, I observe no overhead.
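To pick up manifest changes like these, the enclave has to be rebuilt and re-signed; reusing the commands from the reproduction steps above:

```shell
# Rebuild and re-sign after editing redis-server.manifest.template.
make clean
make SGX=1 SGX_SIGNER_KEY="$HOME"/.config/gramine/enclave-key.pem
make start-gramine-server
```

This requires the Gramine SGX toolchain and signing key set up earlier in this thread, so it is a sketch rather than a standalone runnable snippet.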

salsal97 commented 2 years ago

@dimakuv

I did use the gramine installation from the quickstart instructions, so we are covered on that front.

Thank you for the manifest suggestions. This brought the performance of the Gramine Redis benchmark up considerably, and it's now running more or less on par with native, even faster in some cases.

dimakuv commented 2 years ago

Great to hear that your Redis runs with an expected perf overhead now :) I am closing this issue, but if you think there is still something to discuss, feel free to reopen it.