openssl / openssl

TLS/SSL and crypto library
https://www.openssl.org
Apache License 2.0
25.69k stars 10.11k forks source link

SSL error differs for SSL_write()/SSL_sendfile() in broken pipe scenario. #23722

Closed sergiusz-n closed 5 months ago

sergiusz-n commented 7 months ago

After upgrading from openssl1.0.x in the application implementing some HTTPS server, we attempted to substitiute SSL_write() calls in favour of SSL_sendfile() . Skipping the details the transition went smoothly, however, some discrepancy was noticed in the discrepancy in a few failure scenarios. For instance, TCP connection drop by the peer scenario while doing write operation

would it be better for SSL_sendfile to raise SSL_ERROR_SYSCALL error instead, considering the fact they mostly come out s sendfile() system call. It was confusing (to say the least) to see error:0A000114:SSL routines::uninitialized for the session, where previous writes were successful. The error handing is quite poor here in ssl_lib.c, only EAGAIN/EBUSY/EINTR seem to be differentiated. Please advise.

Definition of Done

### Tasks
- [x] Encode SSL_sendfile to raise SSL_ERROR_SYSCALL when epipe is returned from sendfile syscall
nhorman commented 7 months ago

Can you test with this PR please: https://github.com/openssl/openssl/pull/23723 I think it will fix the discrepancy

nhorman commented 7 months ago

was the issue corrected with the patch?

sergiusz-n commented 7 months ago

@nhorman thanks for the quick response. the code changes seem to be promising. it will take a couple of days to verify it in practice. I'll get back to you once I am done.

jay29387432 commented 7 months ago

There's also a breaking change between 3.1 and 3.2 with KTLS, nonblocking and SSL_write returning SSL_EROR_SSL and errno EAGAIN. Version 3.1 didnt have this behaviour. It looks like it returned SSL_ERROR_WANT_WRITE directly.

Sidenote: The captcha for registering an account with github is insane. I dont want to spend 5 minutes clicking on dart-boards.

sergiusz-n commented 7 months ago

was the issue corrected with the patch?

@nhorman the patch has been tested against EPIPE failure scenario. Although the SSL_get_error() is become SSL_ERROR_SYSCALL with the patch applied, there is a tiny discrepancy remained.

sergiusz-n commented 7 months ago

I have just grep'd throughout the code and found that the pattern:

ERR_raise_data(ERR_LIB_SYS, get_last_socket_error(),
      "calling some_socket_system_function()"); 

is widely used in crypto/bio/*.c files for the most of socket operations like ioctl(), getsockopt(), accept(), connect() etc, but never met when doing write/read. was it done intentionally or omitted?

nhorman commented 7 months ago

Thats odd, I made no changes to SSL_write, but you noted in your initial post that an error on SSL_write SSL_get_error returns SSL_ERROR_SYSCALL, which is retrieved based on a call to ERR_peek_error(), so its not making much sense to me that now its not returning an error at all ?

sergiusz-n commented 7 months ago

@nhorman let me rephrase it and try to put it clearly to avoid possible confusion

Considering the same broken pipe failure scenario, the behaviour is as follows: openssl-3.2.1 baseline:

openssl-3.2.1 with patch applied:

I would expect SSL_sendfile() to resemble SSL_write() behaviour in that regard, that's why I reported "tiny discrepancy remained". I hope now it has been stated clearly.

nhorman commented 7 months ago

ok, thank you for the clarification, so in both cases ERR_peek_error on the SSL_write case, regardless of the patch status returns 0 for you. That makes more sense.

Though, its still confusing. SSL_get_error calls ossl_ssl_get_error to determine the error return code, and SSL_ERROR_SYSCALL is returned if and only if the following conditions are met: 1) the passed in check_err parameter is 1 (which it is) 2) ERR_peek_error does not return zero 3) The ERR_GET_LIB value of the value returned from ERR_peek_error is ERR_LIB_SYS

That suggests to me that the error stack is being cleared between the time you call SSL_get_error and the time you call ERR_peek_error.

do you happen to have a small reproducer I can use to try this here?

sergiusz-n commented 7 months ago

@nhorman

SSL_ERROR_SYSCALL is returned if and only if the following conditions are met:

  1. the passed in check_err parameter is 1 (which it is)
  2. ERR_peek_error does not return zero
  3. The ERR_GET_LIB value of the value returned from ERR_peek_error is ERR_LIB_SYS

it seems not to be a true statement. there are other possibilities for ossl_ssl_get_error () to return _SSL_ERRORSYSCALL. What I believe is happening when SSL_write() encounters EPIPE error, ERR_raise_data() is not invoked, ERR_peek_error() thereby is zero, and SSL_ERROR_SYSCALL comes out from the very last return of this function: ssl_lib.c:4703.

As I mentioned when handling negative retvalue during write socket operation, only check against non-fatal errno values is made, and no error raised Please refer to int sock_write(BIO *b, const char *in, int inl) implementation.

And you may compare how socket connect() returning -1 is handled here. (i.e. the error is raised unless errno is non-fatal, retriable.)

sergiusz-n commented 7 months ago

@nhorman

I have come up with some artificial failure snippet based on sslecho client/server sample from the demos folder in the repo. Please find it here.

nhorman commented 7 months ago

thank you, I see whats happening now. Let me see what I can do here

nhorman commented 7 months ago

updated the PR to take care of this