spring-cloud / spring-cloud-vault

Configuration Integration with HashiCorp Vault
http://cloud.spring.io/spring-cloud-vault/
Apache License 2.0
270 stars 151 forks source link

Spring Cloud Vault Config - SocketTimeoutException #659

Closed spectateur closed 1 year ago

spectateur commented 1 year ago

Hello,

We are using spring cloud vault with spring boot application hosted in a cloud provider, our hashicorp vault v1.11.2 is on premise.

<spring-boot.version>2.3.12.RELEASE</spring-boot.version>

<groupId>org.springframework.cloud</groupId><artifactId>spring-cloud-starter-vault-config</artifactId><version>2.2.7.RELEASE</version>

We are experiencing randomly, timeouts during renew (in our case 48H lease) or authentication with the vault.

Error detail :


Sep 18 17:47:08 pod-name-back WARN   |  |  |  o.s.v.c.l.SecretLeaseEventPublisher$LoggingErrorListener - [RequestedSecret [path='database/postgres/instance01/creds/own_instance01', mode=RENEW]] Lease [leaseId='database/postgres/instance01/creds/own_instance01/bPz8jG8SmDJXYt36f5ZVVQ3x.BXpj9', leaseDuration=PT48H, renewable=true] Cannot renew lease; nested exception is org.springframework.web.client.ResourceAccessException: I/O error on PUT request for "https://hvault.mysite.com:443/v1/sys/renew": Read timed out; nested exception is java.net.SocketTimeoutException: Read timed out
org.springframework.vault.VaultException: Cannot renew lease; nested exception is org.springframework.web.client.ResourceAccessException: I/O error on PUT request for "https://hvault.mysite.com:443/v1/sys/renew": Read timed out; nested exception is java.net.SocketTimeoutException: Read timed out
    at org.springframework.vault.core.lease.SecretLeaseContainer.doRenewLease(SecretLeaseContainer.java:720)
    at org.springframework.vault.core.lease.SecretLeaseContainer.renewAndSchedule(SecretLeaseContainer.java:589)
    at org.springframework.vault.core.lease.SecretLeaseContainer.lambda$scheduleLeaseRenewal$0(SecretLeaseContainer.java:581)
    at org.springframework.vault.core.lease.SecretLeaseContainer$LeaseRenewalScheduler$1.run(SecretLeaseContainer.java:891)
    at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
    at org.springframework.scheduling.concurrent.ReschedulingRunnable.run(ReschedulingRunnable.java:93)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: org.springframework.web.client.ResourceAccessException: I/O error on PUT request for "https://hvault.mysite.com:443/v1/sys/renew": Read timed out; nested exception is java.net.SocketTimeoutException: Read timed out
    at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:746)
    at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:672)
    at org.springframework.web.client.RestTemplate.exchange(RestTemplate.java:581)
    at org.springframework.vault.core.lease.LeaseEndpoints$1.renew(LeaseEndpoints.java:59)
    at org.springframework.vault.core.lease.SecretLeaseContainer.lambda$doRenew$2(SecretLeaseContainer.java:752)
    at org.springframework.vault.core.VaultTemplate.doWithSession(VaultTemplate.java:388)
    at org.springframework.vault.core.lease.SecretLeaseContainer.doRenew(SecretLeaseContainer.java:751)
    at org.springframework.vault.core.lease.SecretLeaseContainer.doRenewLease(SecretLeaseContainer.java:688)
    ... 12 common frames omitted
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:464)
    at sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68)
    at sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1346)
    at sun.security.ssl.SSLSocketImpl.access$300(SSLSocketImpl.java:73)
    at sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:962)
    at org.apache.http.impl.conn.LoggingInputStream.read(LoggingInputStream.java:84)
    at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
    at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
    at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
    at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
    at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
    at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
    at org.springframework.http.client.HttpComponentsClientHttpRequest.executeInternal(HttpComponentsClientHttpRequest.java:87)
    at org.springframework.http.client.AbstractBufferingClientHttpRequest.executeInternal(AbstractBufferingClientHttpRequest.java:48)
    at org.springframework.http.client.AbstractClientHttpRequest.execute(AbstractClientHttpRequest.java:53)
    at org.springframework.http.client.InterceptingClientHttpRequest$InterceptingRequestExecution.execute(InterceptingClientHttpRequest.java:109)
    at org.springframework.vault.core.VaultTemplate.lambda$getSessionInterceptor$1(VaultTemplate.java:200)
    at org.springframework.http.client.InterceptingClientHttpRequest$InterceptingRequestExecution.execute(InterceptingClientHttpRequest.java:93)
    at org.springframework.vault.client.VaultClients.lambda$createRestTemplate$0(VaultClients.java:128)
    at org.springframework.http.client.InterceptingClientHttpRequest$InterceptingRequestExecution.execute(InterceptingClientHttpRequest.java:93)
    at org.springframework.http.client.InterceptingClientHttpRequest.executeInternal(InterceptingClientHttpRequest.java:77)
    at org.springframework.http.client.AbstractBufferingClientHttpRequest.executeInternal(AbstractBufferingClientHttpRequest.java:48)
    at org.springframework.http.client.AbstractClientHttpRequest.execute(AbstractClientHttpRequest.java:53)
    at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:737)
    ... 19 common frames omitted

After digging into communications between our vault server and the clients we have seen that spring cloud vault framework reuse TLS sessions and TCP sessions that have been created with the previous renew.

From vault perspective these sessions have been closed accordingly to the vault timeout as vault server didn't receive anymore communications from the client. All network equipments, worker nodes also dropped the original session in kernel table after grace period (TCP end timeout) triggered by receiving FIN from vault server.

So in the following capture you can see a timeout event on 2022-09-18 17:47:08 that triggered a close_notify from client (see frame N°2741), and which is 15sec after the renew request (read-timeout = 15000ms),

image

The port used during the timeout is the same than the one used for the previous renew 48H before (see frame N° 2725)/

image

Note that the client does not ACK vault close_notify for TLS / TCP FIN, (frame N°1985 to 1988) but we are not sure that this could explain that the client reuse the same TLS sessions days after, we tested multiple value for RENEW lease, timeout in the spring cloud vault framework without being able to explain why the vault client do not close the session right after the renewal has been completed.

As the vault client is reusing a session that is no longer existing, the worker node initiate a new port (the sequence number 1 in frame N°2726) and because this new TCP session isn't SYN on next hop network equipment, it is dropped with "first packet isn't SYN error message".

Even if it was SYN we would end up reaching the vault with the original port that was FIN 48H ago by the vault server itself.

So here we are exploring ways to fix this in the spring cloud vault framework through a git issue.

Thanks for your help,

mp911de commented 1 year ago

There's nothing really we could do from our side as our client is built on top of an abstraction for interchangeable HTTP clients (OkHttp, Java URL Connection, Apache HTTP Client). I would suggest reaching out to the Apache HTTP Client project with your observation and asking for guidance there.

spectateur commented 1 year ago

I created an issue on Apache Jira https://issues.apache.org/jira/browse/HTTPCLIENT-2235

spectateur commented 1 year ago

https://github.com/spring-cloud/spring-cloud-vault/issues/660