gridcf / gct

Grid Community Toolkit
Apache License 2.0
46 stars 30 forks source link

Don't assume that TLS 1.3 returns exactly two session tickets #196

Closed chrisburr closed 2 years ago

chrisburr commented 2 years ago

Fixes https://github.com/gridcf/gct/issues/195

fscheiner commented 2 years ago

@chrisburr: Say, do you have the possibility to test this against a dCache instance (e.g. https://confluence.desy.de/pages/viewpage.action?pageId=228761933), specifically the GridFTP client (globus-url-copy) because of https://github.com/gridcf/gct/issues/174 and https://github.com/gridcf/gct/pull/177?

CCing @paulmillar.

chrisburr commented 2 years ago

I already did and globus-url-copy -list gsiftp://prometheus.desy.de/~/ works with this PR. (I can also reproduce the failure with gct=6.2.1629922860.)

For the failing gss-assist-auth-test.pl test. I'm struggling to reproduce a sucessful run locally with the current contents of master. Is there a prerequisite that I might be missing?

I suspect a fix similar to dmc/cgsi-gsoap!4 is needed but it's hard to figure out where without reproducing it sucessfully.

fscheiner commented 2 years ago

I already did and globus-url-copy -list gsiftp://prometheus.desy.de/~/ works with this PR. (I can also reproduce the failure with gct=6.2.1629922860.)

OK, that's good then. We should also check that it still works against the GCT (both older versions and with this PR).

For the failing gss-assist-auth-test.pl test. I'm struggling to reproduce a sucessful run locally with the current contents of master. Is there a prerequisite that I might be missing?

That's definitely a new error, because recent tests worked through if I'm not mistaken - e.g. https://github.com/gridcf/gct/runs/6371618133?check_suite_focus=true was still OK. Maybe something changed in the dependencies provided by RockyLinux or EPEL in an incompatible way. I'll have a look into this tomorrow.

fscheiner commented 2 years ago

I saved the logs and re-initiated the job just now to see if it maybe was something intermittent.

chrisburr commented 2 years ago

I think I've figured out why I couldn't test locally so hopefully I can debug the failing test at some point today.

paulmillar commented 2 years ago

@chrisburr, @fscheiner

I guess you don't need me to do anything to support your testing this patch against prometheus.desy.de, right?

ellert commented 2 years ago

If we are lowering the number of tickets, why not lower it to 0?

The man page says:

HISTORY SSL_new_session_ticket() was added in OpenSSL 3.0.0. SSL_set_num_tickets(), SSL_get_num_tickets(), SSL_CTX_set_num_tickets(), and SSL_CTX_get_num_tickets() were added in OpenSSL 1.1.1.

We still need to support 1.0.2k in RHEL 7, so the call to SSL_CTX_set_num_tickets must be set inside

if OPENSSL_VERSION_NUMBER >= 0x10101000L

endif

ellert commented 2 years ago

The centos 7 builds are failing with: /gct/gsi/gssapi/source/library/.libs/libglobus_gssapi_gsi.so: undefined reference to `SSL_CTX_set_num_tickets'

ellert commented 2 years ago

Also the lines:

#if defined(TLS1_3_VERSION)
    unsigned char *                     tbuf = NULL;
    size_t                              tlen = 0;
#endif

should be removed, sine these are not used any more.

chrisburr commented 2 years ago

I guess you don't need me to do anything to support your testing this patch against prometheus.desy.de, right?

Yes, I don't need anything from your side.

If we are lowering the number of tickets, why not lower it to 0?

This is probably the correct thing to do, though I would still like figure out how to fix the code to survive when tickets are sent. I think I'm close to this.

We still need to support 1.0.2k in RHEL 7, so the call to SSL_CTX_set_num_tickets must be set inside

🤦

fscheiner commented 2 years ago

@paulmillar

I guess you don't need me to do anything to support your testing this patch against prometheus.desy.de, right?

Indeed, I wasn't aware that Chris already had access there, so I thought about getting you in the loop from the start, just in case.

ellert commented 2 years ago

I would still like figure out how to fix the code to survive when tickets are sent.

In normal circumstances the original code without any TLS 1.3 tweaks just works and you don't need anything special for TLS 1.3. When the communication is via a network connection managed by openssl the tickets are handled seamlessly, and the original code that worked for TLS 1.2 also works for TLS 1.3 without modification. The problem is when the handshake is done over the file descriptor or socket based communication in globus-gss-assist, which just sends one token at a time to the SSL handshake function. In this case the handshake function is not aware that there will be tickets sent in the following tokens and declares that the handshake is complete before the tokens are accepted.

The special handling was only intended to be triggered in this case, not for a normal handshake over network. On a normal network handshake any number of tickets should work and the special TOKEN states should not be entered. If you get into the TOKEN states on a network connection, something not foreseen has happened. What client/server combination did this happen for?

ellert commented 2 years ago

The retry in case of a local_error != GLOBUS_SUCCESS can lead to a hung server. If the error is not due to a TOKEN being sent, but properly due to that the certificate signing request could not be parsed, then the server will try to get another request from the connection, but the client is not sending any because it already sent it, and the server will hang. If you set the number of tokens to 0, you should never get any tokens here, and you can remove the retry.

chrisburr commented 2 years ago

If you get into the TOKEN states on a network connection, something not foreseen has happened. What client/server combination did this happen for?

$ export GLOBUS_GSSAPI_MAX_TLS_PROTOCOL=TLS1_3_VERSION GLOBUS_GSI_OPENSSL_ERROR_DEBUG_LEVEL=99 GLOBUS_GSSAPI_DEBUG_LEVEL=99 GLOBUS_GSI_GSS_ASSIST_DEBUG_LEVEL=99
$ gfal-ls -vvv 'srm://prometheus.desy.de:8445' 2>&1 | grep -E '(SSL handshake finished|Using TLS|gss_state)'
init_sec_context:major_status:00000001:gss_state:0 req_flags=00000032:ret_flags=00000000
init_sec_context:major_status:00000001:gss_state:0 req_flags=00000032:ret_flags=00000000
init_sec_context:major_status:00000001:gss_state:0 req_flags=00000032:ret_flags=00000000
SSL handshake finished
Using TLSv1.3.
init_sec_context:major_status:00000001:gss_state:1 req_flags=00000032:ret_flags=000000ff
init_sec_context:major_status:00000001:gss_state:2 req_flags=00000032:ret_flags=000000ff

It then hangs until a timeout is hit.

After removing the TOKEN1/TOKEN2 states SRM shows an error when it sees the NewSessionTicket. The only solution I could see for fixing that is this PR (best viewed with "Show whitespace changes" disabled).

chrisburr commented 2 years ago

If you set the number of tokens to 0, you should never get any tokens here, and you can remove the retry.

I think I agree with you. Just in case we end up going back to it, this was my thinking of how to supress and retry in a nicer way:

I was thinking of moving this retry logic to only apply if this condition fails, though I wasn't sure how best to propogate this information back to gss_init_sec_context, perhaps a custom retry error code?

https://github.com/gridcf/gct/blob/538dbfd428f460040c506bddcf8a8dc35a059240/gsi/proxy/proxy_core/source/library/globus_gsi_proxy.c#L714-L722

ellert commented 2 years ago

This looks good now. Will you drop the draft status?

chrisburr commented 2 years ago

I was just wanting to run another set of checks from a fresh build to make sure I hadn't broken something but this seems fine now:

gfal-ls http://prometheus.desy.de/ exited with 0
gfal-ls https://prometheus.desy.de/ exited with 0
gfal-ls https://prometheus.desy.de:2443/ exited with 0
gfal-ls gsiftp://prometheus.desy.de/ exited with 0
gfal-ls xroot://prometheus.desy.de:1094/ exited with 0
gfal-ls xroot://prometheus.desy.de:1095/ exited with 0
gfal-ls srm://prometheus.desy.de:8443/ exited with 0
gfal-ls srm://prometheus.desy.de:8445/ exited with 0