Open timijntema opened 5 years ago
Does this still happen if you don't use the client cert?
I retested using the following cases:
# command: curl https://127.0.0.1:9999 --cacert rootCA/rootCA.crt --cert client/client.crt -k -v
* Expire in 0 ms for 6 (transfer 0x55e570e825c0)
* Trying 127.0.0.1...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x55e570e825c0)
* Connected to 127.0.0.1 (127.0.0.1) port 9999 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* unable to set private key file: 'client/client.crt' type PEM
* Closing connection 0
curl: (58) unable to set private key file: 'client/client.crt' type PEM
# No pistache output
and
# command: curl https://127.0.0.1:9999 --cacert rootCA/rootCA.crt --key client/client.key -k -v
# Curl output
* Expire in 0 ms for 6 (transfer 0x563067b425c0)
* Trying 127.0.0.1...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x563067b425c0)
* Connected to 127.0.0.1 (127.0.0.1) port 9999 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: rootCA/rootCA.crt
CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Request CERT (13):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server did not agree to a protocol
* Server certificate:
* subject: C=NL; ST=UTRECHT; O=Accenture; CN=192.168.42.1
* start date: Jun 5 10:32:42 2019 GMT
* expire date: Oct 17 10:32:42 2020 GMT
* issuer: C=NL; ST=UTRECHT; O=Accenture; CN=rootca
* SSL certificate verify ok.
* SSL_write() returned SYSCALL, errno = 104
* Failed sending HTTP request
* Connection #0 to host 127.0.0.1 left intact
curl: (55) SSL_write() returned SYSCALL, errno = 104
# Pistache output
140065518806208:error:1417C0C7:SSL routines:tls_process_client_certificate:peer did not return a certificate:../ssl/statem/statem_srvr.c:3676:
The I ran both curl commands twice without restarting the server. In both cases the server did not crash.
Could you also try running this multiple times:
curl --insecure -v https://127.0.0.1:9999 2>&1 | \
awk 'BEGIN { cert=0 } /^\* Server certificate:/ { cert=1 } /^\*/ { if (cert) print }'
Sorry for all the questions, I don't have my dev laptop with me at the moment.
From what you've sent so far it looks like my original assumption was correct, but I need more data before I can draft a fix.
I've gotten around to testing this myself now, I can confirm that this bug is present, stack trace below.
It looks lie whatever futex(0x56427dbdbcb8, FUTEX_WAKE_PRIVATE, 1)
does isn't successful on the second request.
@knowledge4igor @Ne02ptzero any ideas? I haven't worked directly with openssl before.
First request:
epoll_wait(3, [{EPOLLIN, {u32=4, u64=4}}], 1024, -1) = 1
accept(4, {sa_family=AF_INET, sin_port=htons(48244), sin_addr=inet_addr("127.0.0.1")}, [16]) = 18
read(18, "\26\3\1\2\0", 5) = 5
read(18, "\1\0\1\374\3\3\177\321\345\f\246\273\306\31\357\2148\1ZFW\241d\325\312\243\350\23\246x\360\221"..., 512) = 512
getpid() = 30057
getpid() = 30057
getpid() = 30057
getpid() = 30057
write(18, "\26\3\3\0Y\2\0\0U\3\3\206\3033\326\22YH\245\306V\22g76\310e\21 e\342\235"..., 1528) = 1528
read(18, "\26\3\3\0F", 5) = 5
read(18, "\20\0\0BA\4u\3478\247\200\255\273\213\245^\3\362\347\276!\3469S\220\364\363\210\274\203\213N"..., 70) = 70
read(18, "\24\3\3\0\1", 5) = 5
read(18, "\1", 1) = 1
read(18, "\26\3\3\0(", 5) = 5
read(18, "~cx\372\34\335\207O\300$a\312L=\255N\206\324&\336!\271a\334Zl_MV\315}\r"..., 40) = 40
getpid() = 30057
write(18, "\24\3\3\0\1\1\26\3\3\0(\310\336\227\337\355\251^B\255M\314\337P\n\242\177M\324|5\373"..., 51) = 51
fcntl(18, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(18, F_SETFL, O_RDWR|O_NONBLOCK) = 0
write(11, "\1\0\0\0\0\0\0\0", 8) = 8
futex(0x56427dbdbcb8, FUTEX_WAKE_PRIVATE, 1) = 1
Second request:
epoll_wait(3, [{EPOLLIN, {u32=4, u64=4}}], 1024, -1) = 1
accept(4, {sa_family=AF_INET, sin_port=htons(48246), sin_addr=inet_addr("127.0.0.1")}, [16]) = 18
read(18, "\26\3\1\2\0", 5) = 5
read(18, "\1\0\1\374\3\3\1\356\331\1\2245\276o\235\232\301\177:I2\223!t\252}*\232\224l$<"..., 512) = 512
getpid() = 30057
getpid() = 30057
getpid() = 30057
write(18, "\26\3\3\0Y\2\0\0U\3\3l`)\261k\2736\335\330\331\241\362;F\5\332(-\257x\324"..., 1528) = 1528
read(18, "\26\3\3\0F", 5) = 5
read(18, "\20\0\0BA\4*\rs\323\313\207]\323\260Ac\0.\353q}\22\213\314\236y\25a\237<g"..., 70) = 70
read(18, "\24\3\3\0\1", 5) = 5
read(18, "\1", 1) = 1
read(18, "\26\3\3\0(", 5) = 5
read(18, "B\16\245\312gi\32\243\372\342\5\2662\3\212z\214\330\27\257\244\244\1\374\3471v;\225j\243\354"..., 40) = 40
getpid() = 30057
write(18, "\24\3\3\0\1\1\26\3\3\0(\224\276N=yw\17sH!\32\260\332\343\317|DM\301\326l"..., 51) = 51
fcntl(18, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(18, F_SETFL, O_RDWR|O_NONBLOCK) = 0
write(11, "\1\0\0\0\0\0\0\0", 8) = 8
epoll_wait(3, terminate called after throwing an instance of 'std::runtime_error'
what(): Success
<unfinished ...>) = ?
+++ killed by SIGABRT +++
Aborted
Result:
# Command: curl --insecure -v https://127.0.0.1:9999 2>&1 | awk 'BEGIN { cert=0 } /^\* Server certificate:/ { cert=1 } /^\*/ { if (cert) print }'
# Curl output
* Server certificate:
* subject: C=NL; ST=UTRECHT; O=Accenture; CN=192.168.42.1
* start date: Jun 5 10:32:42 2019 GMT
* expire date: Oct 17 10:32:42 2020 GMT
* issuer: C=NL; ST=UTRECHT; O=Accenture; CN=rootca
* SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
* SSL_write() returned SYSCALL, errno = 104
* Failed sending HTTP request
* Connection #0 to host 127.0.0.1 left intact
# Pistache output
139776253719744:error:1417C0C7:SSL routines:tls_process_client_certificate:peer did not return a certificate:../ssl/statem/statem_srvr.c:3676
Running it multiple times without server restart just keeps on printing the Pistache output above.
I've gotten around to testing this myself now, I can confirm that this bug is present, stack trace below.
It looks lie whatever
futex(0x56427dbdbcb8, FUTEX_WAKE_PRIVATE, 1)
does isn't successful on the second request. @knowledge4igor @Ne02ptzero any ideas? I haven't worked directly with openssl before.First request:
epoll_wait(3, [{EPOLLIN, {u32=4, u64=4}}], 1024, -1) = 1 accept(4, {sa_family=AF_INET, sin_port=htons(48244), sin_addr=inet_addr("127.0.0.1")}, [16]) = 18 read(18, "\26\3\1\2\0", 5) = 5 read(18, "\1\0\1\374\3\3\177\321\345\f\246\273\306\31\357\2148\1ZFW\241d\325\312\243\350\23\246x\360\221"..., 512) = 512 getpid() = 30057 getpid() = 30057 getpid() = 30057 getpid() = 30057 write(18, "\26\3\3\0Y\2\0\0U\3\3\206\3033\326\22YH\245\306V\22g76\310e\21 e\342\235"..., 1528) = 1528 read(18, "\26\3\3\0F", 5) = 5 read(18, "\20\0\0BA\4u\3478\247\200\255\273\213\245^\3\362\347\276!\3469S\220\364\363\210\274\203\213N"..., 70) = 70 read(18, "\24\3\3\0\1", 5) = 5 read(18, "\1", 1) = 1 read(18, "\26\3\3\0(", 5) = 5 read(18, "~cx\372\34\335\207O\300$a\312L=\255N\206\324&\336!\271a\334Zl_MV\315}\r"..., 40) = 40 getpid() = 30057 write(18, "\24\3\3\0\1\1\26\3\3\0(\310\336\227\337\355\251^B\255M\314\337P\n\242\177M\324|5\373"..., 51) = 51 fcntl(18, F_GETFL) = 0x2 (flags O_RDWR) fcntl(18, F_SETFL, O_RDWR|O_NONBLOCK) = 0 write(11, "\1\0\0\0\0\0\0\0", 8) = 8 futex(0x56427dbdbcb8, FUTEX_WAKE_PRIVATE, 1) = 1
Second request:
epoll_wait(3, [{EPOLLIN, {u32=4, u64=4}}], 1024, -1) = 1 accept(4, {sa_family=AF_INET, sin_port=htons(48246), sin_addr=inet_addr("127.0.0.1")}, [16]) = 18 read(18, "\26\3\1\2\0", 5) = 5 read(18, "\1\0\1\374\3\3\1\356\331\1\2245\276o\235\232\301\177:I2\223!t\252}*\232\224l$<"..., 512) = 512 getpid() = 30057 getpid() = 30057 getpid() = 30057 write(18, "\26\3\3\0Y\2\0\0U\3\3l`)\261k\2736\335\330\331\241\362;F\5\332(-\257x\324"..., 1528) = 1528 read(18, "\26\3\3\0F", 5) = 5 read(18, "\20\0\0BA\4*\rs\323\313\207]\323\260Ac\0.\353q}\22\213\314\236y\25a\237<g"..., 70) = 70 read(18, "\24\3\3\0\1", 5) = 5 read(18, "\1", 1) = 1 read(18, "\26\3\3\0(", 5) = 5 read(18, "B\16\245\312gi\32\243\372\342\5\2662\3\212z\214\330\27\257\244\244\1\374\3471v;\225j\243\354"..., 40) = 40 getpid() = 30057 write(18, "\24\3\3\0\1\1\26\3\3\0(\224\276N=yw\17sH!\32\260\332\343\317|DM\301\326l"..., 51) = 51 fcntl(18, F_GETFL) = 0x2 (flags O_RDWR) fcntl(18, F_SETFL, O_RDWR|O_NONBLOCK) = 0 write(11, "\1\0\0\0\0\0\0\0", 8) = 8 epoll_wait(3, terminate called after throwing an instance of 'std::runtime_error' what(): Success <unfinished ...>) = ? +++ killed by SIGABRT +++ Aborted
It actually looks like the error is coming from the write()
function. Otherwise the stack trace should have shown the call to futex()
in the second stack trace, shouldn't it?
It shows an output write(11, "\1\0\0\0\0\0\0\0", 8) = 8
, if I understand it correctly the =8
means that it's successfully written the 8 bytes.
That's an interesting one.
I am reproducing with curl
and your scripts (Thanks for that!), and I don't think the error is in pistache. I've added some debug on errors, and I'm greeted with:
140383638009600:error:1408F119:SSL routines:ssl3_get_record:decryption failed or bad record mac:ssl/record/ssl3_record.c:677:
Every two connections.
I cannot reproduce with firefox
nor openssl s_client
.
I'll continue digging into this.
It shows an output
write(11, "\1\0\0\0\0\0\0\0", 8) = 8
, if I understand it correctly the=8
means that it's successfully written the 8 bytes.
Maybe I am wrong. It is just a bit odd that the stack trace does not show the function call to futex()
.
It shows an output
write(11, "\1\0\0\0\0\0\0\0", 8) = 8
, if I understand it correctly the=8
means that it's successfully written the 8 bytes.Maybe I am wrong. It is just a bit odd that the stack trace does not show the function call to
futex()
.
Agreed, if I had to guess it looks like it can't find the next instruction (I.E. the futex
) as if the program space that's holding the openssl symbols is getting corrupted somehow.
We should probably test the build against different versions of openssl.
That's an interesting one.
I am reproducing with
curl
and your scripts (Thanks for that!), and I don't think the error is in pistache. I've added some debug on errors, and I'm greeted with:140383638009600:error:1408F119:SSL routines:ssl3_get_record:decryption failed or bad record mac:ssl/record/ssl3_record.c:677:
Every two connections.
I cannot reproduce with
firefox
noropenssl s_client
. I'll continue digging into this.
Thanks, I don't have enough exposure to this stuff (Partly why I was uncomfortable with us including ssl/tls support)
That's an interesting one. I am reproducing with
curl
and your scripts (Thanks for that!), and I don't think the error is in pistache. I've added some debug on errors, and I'm greeted with:140383638009600:error:1408F119:SSL routines:ssl3_get_record:decryption failed or bad record mac:ssl/record/ssl3_record.c:677:
Every two connections. I cannot reproduce with
firefox
noropenssl s_client
. I'll continue digging into this.Thanks, I don't have enough exposure to this stuff (Partly why I was uncomfortable with us including ssl/tls support)
Could a fix then be to use a different openssl version? In that case I could figure out with what version it does work and post the version number here.
That would be a weak fix, in general we prefer to find the root cause.
Hey guys, I am new user of Pistache lib, so I am not much familiar with its code, but it seems to me that this issue was introduced by PR #462 . We try to close fd in putOnWire (http.cc) which would be fine, but ssl is being involved.
Transport itself handles peer disconnection (handlePeerDisconnection) that clears SSL connection and removes this peer from the map and also clears the buffers. But this is not available (no access to peers map, only to peer itself) from the thread where we try to close the fd leaving the peers map untouched with the same SSL connection for that fd. On second request we get new connection (most likely the same fd). Upon insert of this peer to map happens nothing because this peer with this fd exists in this list. IMHO this causes Transport::handleIncoming to throw the runtime_error: Success.
Did not find appropriate solution (only workaround) yet.
Perhaps I am wrong. I just wanted to let you know so it might help you.
Wouldn't that only be an issue if a new instance of Transport is created in the new thread to attempt to remove the peer from the map?
In that case, two fixes would be possible:
Wouldn't that only be an issue if a new instance of Transport is created in the new thread to attempt to remove the peer from the map?
I guess, but currently we do not attempt to remove the peer from the map which I think we should in this case. Your first proposal to provide Transport object by reference sounds good.
Wouldn't that only be an issue if a new instance of Transport is created in the new thread to attempt to remove the peer from the map?
I guess, but currently we do not attempt to remove the peer from the map which I think we should in this case. Your first proposal to provide Transport object by reference sounds good.
I think on line 189 of transport.cc they are trying to remove it using the fd number of the peer. This might mean it is indeed a copying issue somewhere.
The copy constructor is explicitly deleted, but there is a clone function, which i assume does something similar. Although it seems to only clone the handler and nothing else.
I'm using Pistache head from the unstable PPA. I'm experiencing this same issue as well:
OpenSSL SSL_read: error:1408F10B:SSL routines:ssl3_get_record:wrong version number, errno 0 * Closing connection 0
I've noticed curl emits this only when I try to get my server to use an Endpoint::serveFile() code path. If I try to access endpoints that just retrieve some small JSON response there is no problem.
I might have found an issue with the Pistache https/ssl support. I used the following code:
The certificates where generated using the terminal script below. WARNING: The certificates generated by the terminal script might not be fully save. I am no expert in SSL and I chose to remove the password from the certificate generation. Make sure you check if the certificates are save before you use them.
And then I ran the command line curl request
curl https://127.0.0.1:9999 --cacert rootCA/rootCA.crt --cert client/client.crt --key client/client.key -k -v
I performed two curl requests using the same line without restarting the server. The first provides me with status code OK and the hello world text. The second one gives me an error:
Could someone either confirm this is an issue or tell me what I did wrong. Another bit of information is that the second request does not error in the handler. It errors before the handler is called.
Stack trace: