open-quantum-safe / openssl

UNSUPPORTED fork of OpenSSL 1.1.1 that includes prototype quantum-resistant algorithms and ciphersuites based on liboqs. PLEASE SWITCH TO OQS-Provider for OpenSSL 3.
https://openquantumsafe.org/

Issues of s_time function in real environment testing #365

Closed judowha closed 2 years ago

judowha commented 2 years ago

I am testing the performance of mutual-TLS connections with hybrid certificates on AWS EC2. The certificate chain for each test group is created as follows:

RootCA
├── SecondCA
│   └──server certificate
└── ThirdCA
     └── client certificate

The signature algorithms used by RootCA, SecondCA, and ThirdCA are the same.

I created many test groups with different hybrid algorithms: rsa3072_dilithium2, rsa3072_dilithium2_aes, rsa3072_falcon512, p256_dilithium2, p256_dilithium2_aes, p256_falcon512, p384_dilithium3, p384_dilithium3_aes, p521_dilithium5, p521_dilithium5_aes, p521_falcon1024.
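A minimal sketch for building every group in one pass, assuming the environment_generator script posted later in this thread is in the current directory. The array `ALGOS`, the function `generate_all_groups` and the `GEN` override are hypothetical names used only for illustration.

```bash
#!/bin/bash
# The eleven hybrid signature algorithms listed above; one test group each.
ALGOS=(
  rsa3072_dilithium2 rsa3072_dilithium2_aes rsa3072_falcon512
  p256_dilithium2 p256_dilithium2_aes p256_falcon512
  p384_dilithium3 p384_dilithium3_aes
  p521_dilithium5 p521_dilithium5_aes p521_falcon1024
)

# Hypothetical helper: run the generator once per algorithm and stop at the
# first failure. GEN defaults to ./environment_generator; setting GEN=echo
# turns the loop into a dry run that only prints the algorithm names.
generate_all_groups() {
  local gen="${GEN:-./environment_generator}"
  local algo
  for algo in "${ALGOS[@]}"; do
    "$gen" "$algo" || { echo "generation failed for $algo" >&2; return 1; }
  done
}
```

`GEN=echo generate_all_groups` only prints the eleven names; plain `generate_all_groups` creates every `ALGO_test` directory. The `openssl ca` calls inside environment_generator still ask for confirmation, so the run stays interactive unless `-batch` is added to those calls.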

I first start the server on an AWS EC2 instance: openssl s_server -key $1_test/server/server.key -cert_chain $1_test/secondCA/demoCA/secondCA.pem -cert $1_test/server/server.pem -accept $2 -CAfile $1_test/client/client_chain.pem -verify_return_error -Verify 2 -state -WWW

Then I run s_client locally to test the connection: openssl s_client -key $1_test/client/client.key -cert $1_test/client/client.pem -connect ec2_ip:$2 -CAfile $1_test/rootCA/demoCA/root.pem -verify_return_error -state

The s_client connection tests succeed for all groups, so I believe the certificate chains were created correctly.

Finally, I run s_time locally to measure performance: openssl s_time -key $1_test/client/client.key -cert $1_test/client/client.pem -connect ec2_ip:$2 -CAfile $1_test/rootCA/demoCA/root.pem -new

The s_time performance tests for the RSA hybrid algorithms work well. However, the P256 hybrid algorithms show many problems.

I am not sure whether the problems happen because of my own mistakes or because of the s_time function. Many thanks if anyone can help!

baentsch commented 2 years ago

Thanks for this error report. Weird indeed. Some questions, if I may:

Looks like one has to debug into this: I don't recall us having done substantial client-auth testing...

judowha commented 2 years ago

Thanks for this error report. Weird indeed. Some questions, if I may:

  • What's the output of openssl version?
  • Does the error above occur immediately or only after some (successful) s_time handshakes (printed "*")?
  • Are there any p256 hybrids that do work?
  • Does s_time testing work OK if you don't provide/request client certificate authentication, i.e., is this an issue only triggered by client auth?
  • Beyond the rsa-OQS hybrids, do simple/plain p256 certificates also work OK?
  • Could you share the scripts generating all components in order for us to reproduce? Ideally something that creates all certs based on a parameter (OQS-signature algorithm), starting s_server as well as s_client (showing that the basics do work) followed by the s_time call (that fails).

Looks like one has to debug into this: I don't recall us having done substantial client-auth testing...

Thanks for your help! I will answer the questions one by one.

To generate the certificates, I run the shell script named environment_generator:

#!/bin/bash

mkdir $1_test
cd $1_test
mkdir rootCA 
cd rootCA 
mkdir certs  crl  demoCA  
cd demoCA 
mkdir newcerts private  
touch index.txt serial   
echo 01 >> serial 
cd .. 
openssl req -x509 -newkey $1 -passout pass:123456 -keyout demoCA/private/root.key -subj /C=CN/ST=test/L=test/O=test/OU=test/CN=root -out demoCA/root.pem || ! cd ../.. || ! rm -rf $1_test || exit
cd ..

mkdir secondCA 
cd secondCA 
mkdir certs  crl  demoCA  
cd demoCA 
mkdir newcerts private  
touch index.txt serial  
echo 01 >> serial  
cd .. 
openssl req -new -newkey $1 -passout pass:123456 -keyout demoCA/private/second.key -out second.csr -subj /C=CN/ST=test/L=test/O=test/OU=test/CN=second  
cd ../rootCA 
openssl ca -extensions v3_ca -in ../secondCA/second.csr -days 3650 -out ../secondCA/demoCA/secondCA.pem -cert demoCA/root.pem -keyfile demoCA/private/root.key -passin pass:123456
cd .. 

mkdir thirdCA 
cd thirdCA 
mkdir certs  crl  demoCA  
cd demoCA 
mkdir newcerts private  
touch index.txt serial  
echo 01 >> serial  
cd .. 
openssl req -new -newkey $1 -passout pass:123456 -keyout demoCA/private/third.key -out third.csr -subj /C=CN/ST=test/L=test/O=test/OU=test/CN=third  
cd ../rootCA 
openssl ca -extensions v3_ca -in ../thirdCA/third.csr -days 3650 -out ../thirdCA/demoCA/thirdCA.pem -cert demoCA/root.pem -keyfile demoCA/private/root.key -passin pass:123456
cd .. 

mkdir server
cd server
openssl req -new -newkey $1 -passout pass:123456 -keyout server.key -out server.csr -subj /C=CN/ST=test/L=test/O=test/OU=test/CN=server 
cd ../secondCA 
openssl ca -in ../server/server.csr -cert demoCA/secondCA.pem -keyfile demoCA/private/second.key -passin pass:123456 -out ../server/server.pem -days 365 -subj /C=CN/ST=test/L=test/O=test/OU=test/CN=server
cd ..

mkdir client
cd client
openssl req -new -newkey $1 -passout pass:123456 -keyout client.key -out client.csr -subj /C=CN/ST=test/L=test/O=test/OU=test/CN=client 
cd ../thirdCA 
openssl ca -in ../client/client.csr -cert demoCA/thirdCA.pem -keyfile demoCA/private/third.key -passin pass:123456 -out ../client/client.pem -days 365 -subj /C=CN/ST=test/L=test/O=test/OU=test/CN=client

cd ../client
cat ../thirdCA/demoCA/thirdCA.pem ../rootCA/demoCA/root.pem > client_chain.pem

To use the script, run ./environment_generator ALGO_NAME (e.g. ./environment_generator p256_dilithium2). The password for all private keys is 123456.
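Before timing anything, each generated chain can be checked offline with `openssl verify`. This is a sketch that assumes the directory layout created by environment_generator and assumes the OQS-OpenSSL build's `verify` command accepts the hybrid algorithms (s_client already verifies them during the handshake). `verify_group` is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: check that both leaf certificates of a group chain up
# to the shared root, using the environment_generator directory layout.
# Usage: verify_group ALGO_NAME   (e.g. verify_group p256_dilithium2)
verify_group() {
  local d="$1_test"
  # Server leaf: issued by secondCA, which is issued by the root.
  openssl verify -CAfile "$d/rootCA/demoCA/root.pem" \
    -untrusted "$d/secondCA/demoCA/secondCA.pem" \
    "$d/server/server.pem" || return 1
  # Client leaf: issued by thirdCA, which is issued by the root.
  openssl verify -CAfile "$d/rootCA/demoCA/root.pem" \
    -untrusted "$d/thirdCA/demoCA/thirdCA.pem" \
    "$d/client/client.pem" || return 1
}
```

On success, `openssl verify` prints one `<file>: OK` line per leaf certificate; a failure here would point at the certificates rather than at s_time.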

For starting a server, here is the shell script named start_server:

#!/bin/bash
openssl s_server -key $1_test/server/server.key -cert_chain $1_test/secondCA/demoCA/secondCA.pem  -cert $1_test/server/server.pem -accept $2 -CAfile $1_test/client/client_chain.pem -verify_return_error -Verify 2 -WWW -state -tls1_3

To use the script, run ./start_server ALGO_NAME PORT (e.g. ./start_server p256_dilithium2 4433).

For starting a client, here is the shell script named start_client:

#!/bin/bash
openssl s_client -key $1_test/client/client.key  -cert $1_test/client/client.pem -connect MY_IP:$2 -CAfile $1_test/rootCA/demoCA/root.pem -verify_return_error -state

Replace MY_IP with the public address of the server. To use the script, run ./start_client ALGO_NAME PORT (e.g. ./start_client p256_dilithium2 4433).

For starting a test, here is the shell script named start_test:

#!/bin/bash
openssl s_time -key $1_test/client/client.key -cert $1_test/client/client.pem -connect MY_IP:$2 -CAfile $1_test/rootCA/demoCA/root.pem -new

Replace MY_IP with the public address of the server. To use the script, run ./start_test ALGO_NAME PORT (e.g. ./start_test p256_dilithium2 4433).

baentsch commented 2 years ago

Thanks for the additional information and scripts above. I get somewhat different results:

p256_falcon512 works

The performance test using s_time for RSA combined algorithms are working well.

?

18884 connections in 14.69s; 1285.50 connections/user sec, bytes read 0
18884 connections in 31 real seconds, 0 bytes read per connection
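To compare throughput across groups, the numbers can be pulled out of the s_time summary line shown above. A minimal sketch, assuming the summary keeps the `N connections in Ts; R connections/user sec, bytes read B` layout seen in this thread; `parse_s_time_rate` is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper: read s_time output on stdin and print
# "<connections> <connections per user second>" for each summary line of the
# form "N connections in Ts; R connections/user sec, bytes read B".
# Exits non-zero if no such line was found.
parse_s_time_rate() {
  awk '/connections\/user sec/ { print $1, $5; found = 1 }
       END { exit !found }'
}
```

For example, `./start_test p256_falcon512 4433 | parse_s_time_rate` reduces the run above to the two figures 18884 and 1285.50, which makes a per-group table easy to assemble.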

$ grep error log-falcon | wc
  18884  113304  963084
SSL_accept:TLSv1.3 early data
depth=2 C = CN, ST = test, L = test, O = test, OU = test, CN = root
verify return:1
depth=1 C = CN, ST = test, O = test, OU = test, CN = third
verify return:1
depth=0 C = CN, ST = test, O = test, OU = test, CN = client
verify return:1
SSL_accept:SSLv3/TLS read client certificate
SSL_accept:SSLv3/TLS read certificate verify
SSL_accept:SSLv3/TLS read finished
SSL_accept:error in SSLv3/TLS write session ticket
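`grep error log | wc` counts every error line together. A sketch that separates the session-ticket write errors from any other errors in an `s_server -state` log, using the same case-sensitive "error" pattern as the command above. `summarize_errors` is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: split the "error" lines of an s_server -state log into
# session-ticket write errors and everything else.
# Usage: summarize_errors LOGFILE
summarize_errors() {
  local log="$1" total ticket
  [ -r "$log" ] || { echo "cannot read $log" >&2; return 1; }
  # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps
  # callers running under "set -e" from aborting in that case.
  total=$(grep -c 'error' "$log" || true)
  ticket=$(grep -c 'error in SSLv3/TLS write session ticket' "$log" || true)
  echo "total=$total ticket=$ticket other=$((total - ticket))"
}
```

If `other` stays at 0 while `total` matches the s_time connection count, every error in the log is the session-ticket message.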

Side notes:

--> If you agree to the above (?) the only question remaining is what causes the consistent "write session ticket" error (that does not occur when feeding your scripts with "rsa:3072"): Will look into that next...

judowha commented 2 years ago

Thanks for your reproduction test.

If you agree to the above (?) the only question remaining is what causes the consistent "write session ticket" error (that does not occur when feeding your scripts with "rsa:3072")

I agree with your result. However, if I test on localhost, all groups, including the rsa3072 hybrids, show the "write session ticket" error. If I use plain rsa3072 or p256 certificates, I get no error.

Moreover, if I put the server on an AWS VM, p256_falcon512 completes the test without any error message, while the other groups show the problems I mentioned before. This may be caused by AWS networking issues.

I am happy to give you access to my VM if you need it, but I couldn't attach the key pair here because .pem files can't be uploaded. Instead, I have created a public repository and uploaded my EC2 key pair; you can access the VM with: ssh -i "openssl.pem" ubuntu@ec2-3-0-114-247.ap-southeast-1.compute.amazonaws.com. However, please tell me when you need it, because the server is shut down most of the time and I will start it when you need it for testing.

baentsch commented 2 years ago

Thanks for confirming that

When/if I understand the "write session" error, I'll then test on an AWS VM of my own first and will let you know my experiences -- possibly coming back with a request for access to your VM if I cannot reproduce things otherwise... Please let me know which AWS VM type you are using.

judowha commented 2 years ago

I am using the Ubuntu Server 18.04 LTS, x86, AMI ID: ami-07315f74f3fa6a5a3.

baentsch commented 2 years ago

From https://wiki.openssl.org/index.php/TLS1.3:

If a client sends its data and directly sends the close notify request and closes the connection, the server will still try to send tickets if configured to do so. Since the connection is already closed by the client, this might result in a write error and receiving the SIGPIPE signal. The write error will be ignored if it's a session ticket. But server applications can still get SIGPIPE they didn't get before.

I think we're seeing permitted behaviour with the write session ticket "error message": 1) It only becomes visible when activating "-state"; 2) As per the above, it is permitted behaviour. Honestly, I'd expect it, as the s_time client should close the connection immediately after the handshake in order to begin the next handshake.

The question remaining in my mind is why the pure RSA:3072 test doesn't show this message -- but then again, the messages could be short enough to permit sending the session ticket right along with the handshake completion message -- which is something that longer certificates/certificate processing (QSC+classic crypto) arguably make less likely.
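One way to check the size argument is to compare how large each group's certificates are on the wire. A sketch, assuming the environment_generator layout; it converts each certificate to DER with `openssl x509 -outform DER`, because the PEM files written by `openssl ca` also contain a human-readable text dump and would overstate the size. `cert_sizes` is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: print the DER-encoded size in bytes of every
# certificate in a group created by environment_generator.
# Usage: cert_sizes ALGO_NAME   (e.g. cert_sizes p256_dilithium2)
cert_sizes() {
  local d="$1_test" f
  for f in rootCA/demoCA/root.pem secondCA/demoCA/secondCA.pem \
           thirdCA/demoCA/thirdCA.pem server/server.pem client/client.pem; do
    [ -r "$d/$f" ] || { echo "missing $d/$f" >&2; return 1; }
    # DER is the encoding sent in the TLS Certificate message.
    printf '%s %d\n' "$f" "$(openssl x509 -in "$d/$f" -outform DER | wc -c)"
  done
}
```

Running it for `rsa:3072` and for a hybrid group side by side shows how much larger the hybrid leaf and intermediate certificates are.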

I'll try to see how the behaviour on an AWS VM differs, but so far, no real fault is visible. For the "relative value" of the "write session ticket" message also see this code comment...

Finally and FWIW, just add the s_server option "-no_ticket" to your "start_server.sh" script and the error is gone :-)
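A sketch of the start_server script with `-no_ticket` appended, which baentsch reports makes the "write session ticket" message disappear. Apart from that option and quoting the arguments, it is the start_server script posted earlier; wrapping it in a function named `start_server` is only an illustrative choice so the file can also be sourced.

```bash
#!/bin/bash
# start_server with session tickets disabled via -no_ticket; otherwise the
# same options as the original start_server script in this thread.
# Usage: ./start_server ALGO_NAME PORT   (e.g. ./start_server p256_dilithium2 4433)
start_server() {
  openssl s_server -key "$1_test/server/server.key" \
    -cert_chain "$1_test/secondCA/demoCA/secondCA.pem" \
    -cert "$1_test/server/server.pem" -accept "$2" \
    -CAfile "$1_test/client/client_chain.pem" \
    -verify_return_error -Verify 2 -WWW -state -tls1_3 -no_ticket
}

# Only start the server when called with both arguments.
if [ "$#" -eq 2 ]; then
  start_server "$1" "$2"
fi
```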

baentsch commented 2 years ago

I have now completed testing on an AWS VM, with exactly the same behaviour: no error other than the by-now-known "session ticket" one, and no error at all when passing "-no_ticket" to the s_server command.

Maybe you ought to start another AWS VM and see whether things occur on that one again.

Platform I used (uname -a): "Linux ip-172-31-40-63.us-east-2.compute.internal 5.10.93-87.444.amzn2.x86_64 #1 SMP Thu Jan 20 22:50:50 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux").

I suggest closing the issue.

judowha commented 2 years ago

Thanks for your test. I just started another AWS VM, but the connection test still shows the same problem. May I ask how you set up your VM and what your security group is? Also, could I ssh to your AWS VM and run the test again? I suspect my Internet connection may be the cause of the error.

baentsch commented 2 years ago

I didn't set up a special security group (normal SSH access). The AMI is amzn2-ami-kernel-5.10-hvm-2.0.20211103.1-x86_64-gp2. I'm using that VM for other purposes, but I could set up another VM to give you access to (and repeat the test) using your AMI. Edit: Looks like your AMI isn't available to me, but I can get a generic Ubuntu 18 LTS on x86_64...

judowha commented 2 years ago

Just to clarify: did you put the server on the VM and run the test client on your local computer (rather than running both server and client on the VM)? If you tested that way, your security group would need to open a Custom TCP port. If you tested the same way I did, I would really appreciate it if you could set up an Ubuntu 18 LTS x86_64 VM and give me access. If the test succeeds on your side but fails on mine, I will be sure that it is a problem with my Internet connection.

baentsch commented 2 years ago

May I clarify that you put the server on VM and run the test client on your local computer (not run both server and client on VM)

Sorry for being unclear: I put client/s_time and server onto the VM. Hence no external port open. Did you also test that way?

judowha commented 2 years ago

Nope. In my initial question, I put the client/s_time on my local computer and the server on the VM, because I am trying to simulate a real-environment test. Would you mind testing again this way to see whether the same error happens?

baentsch commented 2 years ago

Ah, now I get it. Yes, in such a setting, "error in SSLv3/TLS read client certificate" messages appear, and their number grows with the size of the certificates.

However, this clearly is not an issue of OQS-OpenSSL, as the same issue appears with 8k RSA keys and standard/unmodified OpenSSL:

Client:

$ ./start_test.sh rsa\:8192 4433
Collecting connection statistics for 30 seconds
*****************************************************************************************************************************************************************************************

185 connections in 11.83s; 15.64 connections/user sec, bytes read 0
185 connections in 31 real seconds, 0 bytes read per connection
$ openssl version
OpenSSL 1.1.1  11 Sep 2018

Server:

$ ./start_server.sh rsa\:8192 4433 > log 2>&1
$ grep error log | wc
    185    1110    9933
$ openssl version
OpenSSL 1.1.1  11 Sep 2018

-> You may want to consider opening an issue to the main/upstream OpenSSL project.

However, I'm not totally convinced that this is a real error: The client sees a correct handshake completion and kills the connection. If this is too fast for the server to properly recognize full arrival of all data (the client cert has been validated according to the logs, so all necessary data must have arrived), then this is probably at most a state engine transitioning error when closing out a terminated connection.

But then again, we put a lot of time into reproducing things, so it might be worthwhile creating an issue with complete reproduction instructions (./envgen.sh rsa:8192 && ./start_server.sh rsa\:8192 4433 > log 2>&1 on the server side, ./start_test.sh rsa\:8192 4433 on the client side; using openssl 1.1.1 on x86_64 ubuntu18).