intel / QAT_Engine

Intel QuickAssist Technology (QAT) OpenSSL Engine (an OpenSSL plug-in engine), which provides cryptographic acceleration for both hardware and optimized software on Intel QuickAssist Technology enabled Intel platforms. https://developer.intel.com/quickassist
BSD 3-Clause "New" or "Revised" License

low nginx performance #64

Open ustcrliu opened 6 years ago

ustcrliu commented 6 years ago

Hi, I have run into a problem: the throughput (connections per second) of Nginx with ssl_engine qat is lower than without it. On the server, I started nginx with ./nginx and changed worker_processes in nginx.conf for each run (from 1 to 8). On the client, I ran openssl s_time ip:port -new -nbio -time 20 -cipher AES128-SHA1 to measure connections per user second.

Here is the result (connections per second):

worker_processes   1     2     3     4     5     6     7     8
avg-without-qat    4287  3836  3884  3902  3939  3624  3617  3218
avg-with-qat       3478  3101  3202  3221  2902  2855  3014  1769

For every worker_processes value, the cps of nginx with QAT is lower than without. This differs from the result in the white paper "QAT & OpenSSL performance", and I could not find out why.
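To put a number on the gap, the figures reported above can be turned into a with/without ratio per worker count. This is only a small sketch over the values posted in this comment; the function name `qat_ratio` is made up for illustration.

```python
# Connections per second as reported above, indexed by worker_processes 1..8.
without_qat = [4287, 3836, 3884, 3902, 3939, 3624, 3617, 3218]
with_qat = [3478, 3101, 3202, 3221, 2902, 2855, 3014, 1769]


def qat_ratio(with_cps, without_cps):
    """Return with-QAT throughput divided by without-QAT throughput, per run."""
    return [w / wo for w, wo in zip(with_cps, without_cps)]


ratios = qat_ratio(with_qat, without_qat)
for workers, r in enumerate(ratios, start=1):
    # A ratio below 1.0 means the QAT run was slower than the software-only run.
    print(f"worker_processes={workers}: with/without = {r:.2f} ({(r - 1) * 100:+.0f}%)")
```

A ratio below 1.0 in every row matches the observation that the QAT runs were slower at every worker count.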

conf

I read the README.md files at github.com/intel/QAT_Engine and github.com/intel/asynch_mode_nginx carefully and set some parameters in the configuration files accordingly. Here they are.

nginx.conf:

worker_processes 8;
worker_cpu_affinity auto;
ssl_asynch on;
load_module modules/ngx_ssl_engine_qat_module.so;
ssl_engine {
    use_engine qat;
    default_algorithms ALL;
    qat_engine {
        qat_offload_mode async;
        qat_notify_mode poll;
        qat_poll_mode internal;
        qat_internal_poll_interval 10000;
    }
}

dh895xcc_dev0.conf:

[SHIM]
NumberCyInstances = 1
NumberDcInstances = 0
NumProcesses = 32
LImitDevAccess = 1
Crypto
Cy0Name = “UserCY0”
Cy0InPolled = 1
Cy0CoreAffinity = 0

stevelinsell commented 6 years ago

Hi @ustcrliu,

First, let's confirm that your OpenSSL is offloading to QuickAssist:

Can you post the numbers you see when running the following OpenSSL speed test?

./openssl speed -engine qat -elapsed -async_jobs 72 -multi 3 rsa2048

You should be getting an RSA 2K sign number of around 40,000 operations per second. That should indicate that RSA operations, at least, are being offloaded correctly.

For the s_time command, are you running it on the same machine or on a different machine across a network? Also, does your s_time command request a file? I didn't see anything in your line above, so am I correct in assuming that you aren't specifying one and s_time will perform the handshake only (i.e. you are mostly interested in the performance of the RSA operations for this test)?

If you run the s_time/nginx tests and watch the following, are the counters increasing?

QAT 1.6: /proc/icp_dh895xcc_dev0/qat
QAT 1.7: /sys/kernel/debug/qat_dh895xcc_*/fw_counters
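One way to check whether the counters are increasing is to snapshot them before and after a test run and print only the entries that changed. The Python sketch below is an illustration, not a tool shipped with the driver: the helper names are hypothetical, and it assumes each counter line has the form `label: integer`, optionally framed by `|` characters. Adjust the regular expression if your driver version prints a different layout.

```python
import re

# Assumed line layout: "| <label>: <integer> |" (the framing bars are optional).
COUNTER_RE = re.compile(r"^\|?\s*(.*?)\s*:\s*(\d+)\s*\|?\s*$")


def parse_counters(text):
    """Map each counter label to its integer value; other lines are skipped."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_RE.match(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def counter_deltas(before, after):
    """Return only the counters whose value changed between two snapshots."""
    return {
        label: value - before.get(label, 0)
        for label, value in after.items()
        if value != before.get(label, 0)
    }


def read_counters(path):
    """Read and parse one counters file (usually requires root)."""
    with open(path) as handle:
        return parse_counters(handle.read())
```

To use it, call `read_counters()` on the counters file for your device, run the s_time test, call it again, and pass both results to `counter_deltas()`. An empty result would suggest that nothing was offloaded during the test.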

If you run something like htop, are the CPU cores running nginx 100% utilised?

Are there any errors coming out in the nginx error.log, or in the kernel.log when you run dmesg?

You may also want to post your question to http://github.com/intel/asynch_mode_nginx if you haven't already, because if the speed numbers above are as expected, it is likely to be an nginx configuration issue.

Kind Regards,

Steve.

ustcrliu commented 6 years ago

Hi @stevelinsell, thanks for your reply. I ran this command 5 times; the average result is an RSA 2K sign number of 36k/s and a verify number of 157k/s. So I think OpenSSL is offloading to the dh895x.

I ran s_time on the same server. I also used another machine to test s_time, and that result is a bit lower. Yes, this command is from the white paper "QAT & OpenSSL performance", and for now I just want to test how many handshakes (connections) per second there are. I will test requesting a file later.

The QAT version is 1.7, and the counters increased. There are 12 firmware entries here; every time I ran the s_time command, the counters of a certain one increased.

I used htop, and the CPU cores running nginx are usually at 75%. By the way, I ran 200 s_time commands in parallel to raise CPU utilization. With a single s_time command, a core running nginx is usually at about 2%.

When I ran dmesg I saw lots of messages like "dh895xcc 0000:07:00.0: Processes $pid openssl(and nginx also) exit with orphan rings". But in the nginx error.log I did not see anything clearly related to this problem. I checked error.log before and after running a single s_time command, and nothing new had been added. Maybe I need to check this log further?

stevelinsell commented 6 years ago

Hi @ustcrliu,

It doesn't sound like you are doing anything obviously wrong. What is your machine configuration (Processor(s)/RAM/Hyperthreading Setting)?

Looking at the 'With QAT' results, they don't look a million miles away from what I would expect: you should be getting 3-5K cps per core (various things affect that, such as whether hyperthreading is enabled). What looks strange is your 'Without QAT' numbers: I would expect you to be getting around 800-1000 cps max per core, possibly lower depending on the processor. Are you sure you do not have keepalives enabled on your server? With keepalives enabled, each connection you make from the same client will not need to go through the handshake process and will not perform an RSA operation. This will massively inflate your 'Without QAT' results but will not have a huge impact on the 'With QAT' numbers, as there just won't be many operations to offload. Note that if keepalives are enabled you'll still see the counters increase on QAT, as some encrypt/decrypt operations will still be offloaded and a small number of handshake operations still happen even when keepalives are turned on.

Kind Regards,

Steve.

tokers commented 6 years ago

I have encountered a similar problem.

I am using openssl s_time to test Nginx. When I watch /sys/kernel/debug/qat_dh895xcc_03:00.0/fw_counters, not all of the acceleration engines receive requests; only four engines actually process requests. So the result reported by openssl s_time isn't better than the result with the QAT service disabled.

But when I use the command ./openssl speed -engine qat -elapsed -async_jobs 72 rsa2048, all of the acceleration engines receive requests.

qat service status:

$/etc/init.d/qat_service status
Checking status of all devices.
There is 1 QAT acceleration device(s) in the system:
 qat_dev0 - type: dh895xcc,  inst_id: 0,  node_id: 0,  bsf: 03:00.0,  #accel: 6 #engines: 12 state: up

stevelinsell commented 6 years ago

Hi @tokers,

Can you please have a read of the very similar issue https://github.com/intel/QAT_Engine/issues/71 and see if that helps at all with your issue?

Kind Regards,

Steve.

tokers commented 6 years ago

@stevelinsell Thanks! I will read it carefully.