Closed pditommaso closed 1 year ago
This is becoming an increasing concern since more customers are reporting it.
The error is reported by the Wave client implemented by Nextflow (see also here), however, it's not clear if it's a client or backend problem.
The error looks related to a JDK 17 issue. The ticket reports it has been fixed by Java 17.0.7, but some customer are reporting even after the system was upgraded to that version of Java.
Worth mentioning we are using Amazon Corretto 17.0.7 and it's not clear if this distribution received that patch. Looking into the readme, it says it's built from https://github.com/openjdk/jdk17u, and I was not able to find any reference to that bug
Tagging @jimmypoms in this issue, please team up with @munishchouhan to try to replicate the problem and identify if it's a problem of the JVM. /cc @swampie
I checked Amazon Corretto repo and the JDK bug fix is present in 17.0.7.7.1 https://github.com/corretto/corretto-17/commit/a800231f17a5081de7f2336742faa97b11f12b3b
Is that patch also included in this container image quay.io/seqeralabs/nf-launcher:j17-23.04.1
? (this is the one the customers use to run nextflow)
no its using version=17.0.6.10-1
quay.io/seqeralabs/nf-launcher:j17-23.04.2 contains Amazon corretto 17.0.7.7.1 we can recommend the latest version https://quay.io/repository/seqeralabs/nf-launcher?tab=tags&tag=j17-23.04.2-up1
My fail, customers are using quay.io/seqeralabs/nf-launcher:j17-23.04.2.
How do you know it contains that patch for httpclient issue?
because the Java version mentioned in its manifest is 17.0.7.7-1
Following the referenced issue and looking at the log attached there, the interesting bit is that this seems to happen at the /container-token
endpoint which, as far as I know, only interacts with tower.
Looking at the logs of wave for the past week, there's a couple of failures that come up for that endpoint (Request exceeded build rate limit, unauthorized, and some others) but I couldn't find anything really suspicious during the time of the error in this ticket (Jun-20 15:39:56.768). I have a suspicion that there might be some code path that doesn't complete the CF nor error. Could it be that the WS failed silently?
On a slightly related note, @pditommaso given that the logging system is taking shape and we have increasingly complex interactions between services, have we considered introducing a correlation id or similar to requests?
because the Java version mentioned in its manifest is 17.0.7.7-1
That's clear, but I'm not sure that patch landed into that distribution, or at least I could not find any evidence
given that the logging system is taking shape and we have increasingly complex interactions between services, have we considered introducing a correlation id or similar to requests?
That makes a lot of sense. What's the best practices? any suggestion?
That makes a lot of sense. What's the best practices? any suggestion?
Back in the day, we added a custom filter to our services that took the value from a header if present or created a new id. It wasn't a particularly hard thing to do but special care needed to be taken to propagate the id across threads and adding it to all client interactions. There might be a case for adapting and extending micronaut-tracing as it solves the same problem with propagating the context. I'm happy to discuss approaches outside of the scope of this issue :)
As for the reported issue, I think we would need more cases of it happening and maybe add some further logs in wave. It would be interesting to know whether the issue is isolated to the /container-token
endpoint or affects all endpoints in wave.
I had seen micronaut tracing, however, it looks like it requires the use of yet another tool such as Zipkin or Jaeger to the deployed and managed. Is it worth using any of those? do you have any experience with them?
Zipkin or Jaeger is used for distributed tracing, which can be quite useful if the setup is microservice heavy for getting better visibility on which services are involved in a request and how long each part of the code takes (spans and whatnot).
For me personally, I never really had a situation where I saw a lot of benefit from the distributed traces when the logging system was properly setup with correlation ids (or trace ids if you will). The main reason I was mentioning it was because of the context propagation of the trace ids, which in a Netty environment can be a bit tricky. Using the ids from tracing for logs is a natural extension, e.g. OpenTelemetry uses trace ids and span ids for log correlation as well :). The TL;DR is I don't think Zipkin or Jaeger is a hard requirement, but I'm not familiar enough with the implementation to say whether micronaut tracing is required for the log correlation to work.
Moving on this thread, I agree Zipkin or Jaeger and overkill at least at this stage.
It makes a lot of sense we just add a unique trace ID in each request. How wonder however if there's a nice to make Grafana aware of it, so that it's rendered in the context of the log
tagging @ashseqera
I'm no expert in Loki, but using structured logging (either json or any other format) we should be able to add filterable information to every log line without the need to add them as labels. Something like {app=~"wave-app|tower|..."} |= "<traceId>"
. With a small enough time range the performance should be reasonable as well.
In previous setups, we added links to the logging system with reasonable queries and time ranges to the alerts via prometheus's templating. It is not the most intuitive thing to do, but it worked quite nicely.
I hope I understood your wonders correctly :)
We have more customers reporting this issue. The weird thing is that it looks like the problem persists even after several retries. Have a look at this log
Jul-21 16:46:56.648 [Actor Thread 93] DEBUG io.seqera.wave.plugin.WaveClient - Wave request: https://wave.seqera.io/container-token; attempt=1 - request: SubmitContainerTokenRequest(towerAccessToken:eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiI3MTM0IiwibmJmIjoxNjg5OTU2NjMxLCJyb2xlcyI6WyJ1c2VyIl0sImlzcyI6InRvd2VyLWFwcCIsImV4cCI6MTY4OTk2MDIzMSwiaWF0IjoxNjg5OTU2NjMxfQ.gvGEqrw_UAE5E9kiL_Z3kncfol7XlIApfSpRqWcx2NM, towerRefreshToken:eyJhbGciOiJIUzI1NiJ9.MzIyMmUwMDQtNmI0My00NTYxLThiZTYtMjNkYTJhMzM4OWVh.kbF3kvzyf9U6p1bl6mN9Czf8mrfbiNbgKIB1qbXYu8Q, towerWorkspaceId:152124791902294, towerEndpoint:https://api.tower.nf, workflowId:3lnmizDUUT1IFN, containerImage:quay.io/biocontainers/samtools:1.17--hd87286a_1, containerFile:null, containerConfig:ContainerConfig(entrypoint:[/usr/bin/fusion], cmd:null, env:null, workingDir:null, layers:[ContainerLayer[location=data:H4sIAAAAAAAA/+19/3/a...; tarDigest=sha256:86acae9631a0b7837ff24f5e7788895fefc371b92812e9af517bd16140707dbb; gzipDigest=sha256:84a80a1980647a25501bfb17365ac704e662dab6773c24af84cfb8b5ff50c484; gzipSize=16814], ContainerLayer[location=https://fusionfs.seqera.io/releases/pkg/2/2/5/fusion-amd64.tar.gz; tarDigest=sha256:dd2bdf431d905025443cf89bf7ac15f1d3deb94d0a5333373f23adbc02fe0035; gzipDigest=sha256:d00a4281864b3252d86c6d9edf131dbea6feb4e6e80264dc160aee2745fbe580; gzipSize=15112427]]), condaFile:null, spackFile:null, containerPlatform:linux/amd64, buildRepository:null, cacheRepository:null, timestamp:2023-07-21T16:46:56.648279294Z, fingerprint:d14debb7001bf14e51f6a52aceb2f7c3)
Jul-21 16:46:56.806 [Actor Thread 93] DEBUG io.seqera.wave.plugin.WaveClient - Wave connection failure - attempt: 1
java.io.IOException: HTTP/1.1 header parser received no bytes
Jul-21 16:46:57.112 [Actor Thread 93] DEBUG io.seqera.wave.plugin.WaveClient - Wave connection failure - attempt: 2
java.io.IOException: HTTP/1.1 header parser received no bytes
Jul-21 16:46:57.738 [Actor Thread 93] DEBUG io.seqera.wave.plugin.WaveClient - Wave connection failure - attempt: 3
java.io.IOException: HTTP/1.1 header parser received no bytes
Jul-21 16:46:58.707 [Actor Thread 93] DEBUG io.seqera.wave.plugin.WaveClient - Wave connection failure - attempt: 4
java.io.IOException: HTTP/1.1 header parser received no bytes
Looking more in the cause of the exception it seems that's not always the same tho related
Jul-21 16:46:56.648 [Actor Thread 93] DEBUG io.seqera.wave.plugin.WaveClient - Wave request: https://wave.seqera.io/container-token; attempt=1 - request: SubmitContainerTokenRequest(towerAccessToken:eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiI3MTM0IiwibmJmIjoxNjg5OTU2NjMxLCJyb2xlcyI6WyJ1c2VyIl0sImlzcyI6InRvd2VyLWFwcCIsImV4cCI6MTY4OTk2MDIzMSwiaWF0IjoxNjg5OTU2NjMxfQ.gvGEqrw_UAE5E9kiL_Z3kncfol7XlIApfSpRqWcx2NM, towerRefreshToken:eyJhbGciOiJIUzI1NiJ9.MzIyMmUwMDQtNmI0My00NTYxLThiZTYtMjNkYTJhMzM4OWVh.kbF3kvzyf9U6p1bl6mN9Czf8mrfbiNbgKIB1qbXYu8Q, towerWorkspaceId:152124791902294, towerEndpoint:https://api.tower.nf, workflowId:3lnmizDUUT1IFN, containerImage:quay.io/biocontainers/samtools:1.17--hd87286a_1, containerFile:null, containerConfig:ContainerConfig(entrypoint:[/usr/bin/fusion], cmd:null, env:null, workingDir:null, layers:[ContainerLayer[location=data:H4sIAAAAAAAA/+19/3/a...; tarDigest=sha256:86acae9631a0b7837ff24f5e7788895fefc371b92812e9af517bd16140707dbb; gzipDigest=sha256:84a80a1980647a25501bfb17365ac704e662dab6773c24af84cfb8b5ff50c484; gzipSize=16814], ContainerLayer[location=https://fusionfs.seqera.io/releases/pkg/2/2/5/fusion-amd64.tar.gz; tarDigest=sha256:dd2bdf431d905025443cf89bf7ac15f1d3deb94d0a5333373f23adbc02fe0035; gzipDigest=sha256:d00a4281864b3252d86c6d9edf131dbea6feb4e6e80264dc160aee2745fbe580; gzipSize=15112427]]), condaFile:null, spackFile:null, containerPlatform:linux/amd64, buildRepository:null, cacheRepository:null, timestamp:2023-07-21T16:46:56.648279294Z, fingerprint:d14debb7001bf14e51f6a52aceb2f7c3)
Jul-21 16:46:56.806 [Actor Thread 93] DEBUG io.seqera.wave.plugin.WaveClient - Wave connection failure - attempt: 1
java.io.IOException: HTTP/1.1 header parser received no bytes
at java.net.http/jdk.internal.net.http.HttpClientImpl.send(HttpClientImpl.java:586)
at java.net.http/jdk.internal.net.http.HttpClientFacade.send(HttpClientFacade.java:123)
...
Caused by: java.io.IOException: Connection reset by peer
at java.base/sun.nio.ch.FileDispatcherImpl.writev0(Native Method)
at java.base/sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:66)
at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:217)
at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:153)
at java.base/sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:563)
at java.base/java.nio.channels.SocketChannel.write(SocketChannel.java:642)
at java.net.http/jdk.internal.net.http.SocketTube.writeAvailable(SocketTube.java:1236)
at java.net.http/jdk.internal.net.http.SocketTube$InternalWriteSubscriber.tryFlushCurrent(SocketTube.java:350)
... 154 common frames omitted
Jul-21 16:46:57.112 [Actor Thread 93] DEBUG io.seqera.wave.plugin.WaveClient - Wave connection failure - attempt: 2
java.io.IOException: HTTP/1.1 header parser received no bytes
at java.net.http/jdk.internal.net.http.HttpClientImpl.send(HttpClientImpl.java:586)
at java.net.http/jdk.internal.net.http.HttpClientFacade.send(HttpClientFacade.java:123)
...
Caused by: java.net.SocketException: Connection reset
at java.base/sun.nio.ch.SocketChannelImpl.throwConnectionReset(SocketChannelImpl.java:394)
at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:426)
at java.net.http/jdk.internal.net.http.SocketTube.readAvailable(SocketTube.java:1170)
at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.read(SocketTube.java:833)
at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowTask.run(SocketTube.java:181)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:230)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:303)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:256)
at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.signalReadable(SocketTube.java:774)
at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$ReadEvent.signalEvent(SocketTube.java:957)
at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowEvent.handle(SocketTube.java:253)
at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.handleEvent(HttpClientImpl.java:979)
at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.lambda$run$3(HttpClientImpl.java:934)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.run(HttpClientImpl.java:934)
Jul-21 16:46:57.738 [Actor Thread 93] DEBUG io.seqera.wave.plugin.WaveClient - Wave connection failure - attempt: 3
java.io.IOException: HTTP/1.1 header parser received no bytes
at java.net.http/jdk.internal.net.http.HttpClientImpl.send(HttpClientImpl.java:586)
at java.net.http/jdk.internal.net.http.HttpClientFacade.send(HttpClientFacade.java:123)
...
Caused by: java.net.SocketException: Connection reset
at java.base/sun.nio.ch.SocketChannelImpl.throwConnectionReset(SocketChannelImpl.java:394)
at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:426)
at java.net.http/jdk.internal.net.http.SocketTube.readAvailable(SocketTube.java:1170)
at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.read(SocketTube.java:833)
at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowTask.run(SocketTube.java:181)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:230)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:303)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:256)
at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.signalReadable(SocketTube.java:774)
at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$ReadEvent.signalEvent(SocketTube.java:957)
at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowEvent.handle(SocketTube.java:253)
at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.handleEvent(HttpClientImpl.java:979)
at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.lambda$run$3(HttpClientImpl.java:934)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.run(HttpClientImpl.java:934)
Jul-21 16:46:58.707 [Actor Thread 93] DEBUG io.seqera.wave.plugin.WaveClient - Wave connection failure - attempt: 4
java.io.IOException: HTTP/1.1 header parser received no bytes
at java.net.http/jdk.internal.net.http.HttpClientImpl.send(HttpClientImpl.java:586)
at java.net.http/jdk.internal.net.http.HttpClientFacade.send(HttpClientFacade.java:123)
...
Caused by: java.io.EOFException: EOF reached while reading
... 21 common frames omitted
The first time says Connection reset by peer
, then twice Connection reset
and finally EOF reached while reading
This makes me think there's some instability on the backend side. @munishchouhan @jimmypoms thoughts?
I wonder if the lack of @ExecuteOn(TaskExecutors.IO)
could be related to this problem.
https://docs.micronaut.io/latest/guide/#atBlocking https://docs.micronaut.io/latest/api/io/micronaut/scheduling/annotation/ExecuteOn.html
We can do load testing to check if it's failing because of the lack of threads using tools like JMeter or Load runner We can target staging, or if there is a dev environment which is connected to tower
The stage environment can be used for that wave.stage-tower.net
. You may want to coordinate with @jimmypoms
Note, I've added @ExecuteOn IO thread pool e0c95a02
Another point to clarify is that @Cacheable
applied to a CompletableFuture
is thread-safe
To run the load testing, I need the following information:
How can I create a valid request to /container-token',?
That's the request made by nextflow to get the container, you can see in the nextflow log
Jul-20 15:23:24.594 [Actor Thread 5] DEBUG io.seqera.wave.plugin.WaveClient - Wave request: https://wave.seqera.io/container-token; attempt=1 - request: SubmitContainerTokenRequest(towerAccessToken:eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiI3IiwibmJmIjoxNjg5ODY2NDIzLCJyb2xlcyI6WyJ1c2VyIl0sImlzcyI6InRvd2VyLWFwcCIsImV4cCI6MTY4OTg3MDAyMywiaWF0IjoxNjg5ODY2NDIzfQ.mtw-FjLSI4ReDb_qTwXNwlDl7ikNVym6wSOyebaB5Hs, towerRefreshToken:eyJhbGciOiJIUzI1NiJ9.YjMxOWYzYzktMmNmOS00MjI4LTg3MWItN2JhMmNiNjMwMmRh.dgqigV9KUuDQTojSL7K7I7Lfc7UVXSe9uPeQ-fZLkmc, towerWorkspaceId:263146567680457, towerEndpoint:https://tower.modernatx.net/api, workflowId:26EmPAhz4e97RI, containerImage:quay.io/biocontainers/python:3.8.3, containerFile:null, containerConfig:ContainerConfig(entrypoint:[/usr/bin/fusion], cmd:null, env:null, workingDir:null, layers:[ContainerLayer[location=data:H4sIAAAAAAAA/+1a63Pb...; tarDigest=sha256:8875ce90433dadeea031f06955d2210ad2bc044d002e85f3f193a8f51300eb18; gzipDigest=sha256:8a4b3def6aa92a12a077f085de90dcae7ba9dc971f17c4ce117b2cb029d3249a; gzipSize=3388], ContainerLayer[location=https://fusionfs.seqera.io/releases/pkg/2/1/14/fusion-amd64.tar.gz; tarDigest=sha256:b7f7e82e26b1b49337a5581c186ac327965559dbc668ff61f94abe24b5356c81; gzipDigest=sha256:dc2af1c65a55bbe0870b44ff5e322ca5937ba035232316a430c9e556b84d1f6c; gzipSize=14218941]]), condaFile:null, containerPlatform:null, buildRepository:null, cacheRepository:null, timestamp:2023-07-20T15:23:24.592881144Z, fingerprint:7c5901d337e3229c9e90f94c010b0aae)
You can emulate it also using, the command
nextflow plugins nf-wave:get-container <container>
or by using the wavelit CLI
Do we have access to the redis cluster that is used as the broker for the WebSocket handling?
I'm wondering whether we could quickly check if we have stale entries there. That would help to clarify whether there's something off in handling the request/response flow.
When that happens, this error is reported (and the entries are evicted in any case from redis)
There are a couple of events in the log but they do not seem raised by customer usage
I saw that piece, but as you know, the whole WebSocket session handling is a bit on the complex side :innocent: ... instead of relying on the logs of the behaviour we expect to happen, my suggestion here is to try to find evidence that gives us a better understanding of what might be going wrong, especially if it something as easy as looking at data at rest in tower cloud.
We are currently a little bit in the guessing game with this one after all :/
I know, in any case in redis you won't anything because messages are evicted after a while. I think the best thing to do is to try to replicate it as suggest by Munish
Not sure about that. We don't set a TTL on the list used as a queue. Neither do we set one on the mark, nor do we do anything with the heartbeat :shrug:
I have seen the error in Tower. I am unsure if it's relevant because it doesn't happen at the time of the above error.
On the client side i.e. nextflow what error are you getting instead?
On the client side i.e. nextflow what error are you getting instead?
I don't see in error again in my local tower, I will post the client once it happens again
But I have something interesting, the same IOException but in Tower this time:
command ran 500 times nextflow plugin nf-wave:get-container biocontainers/samtools
Tower Log
Jul-26 17:39:42.552 [ForkJoinPool.commonPool-worker-1] - ERROR io.seqera.wave.PairingClient - Unexpected error proxing request https://1a583bf0283a.ngrok.app/api/user-info GET -- cause: java.io.IOException: HTTP/1.1 header parser received no bytes
java.util.concurrent.CompletionException: java.io.IOException: HTTP/1.1 header parser received no bytes
at java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:368)
at java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:377)
at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1152)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
at java.net.http/jdk.internal.net.http.Http1Response.onReadError(Http1Response.java:554)
at java.net.http/jdk.internal.net.http.Http1Response$HeadersReader.onReadError(Http1Response.java:676)
at java.net.http/jdk.internal.net.http.Http1AsyncReceiver.checkForErrors(Http1AsyncReceiver.java:302)
at java.net.http/jdk.internal.net.http.Http1AsyncReceiver.flush(Http1AsyncReceiver.java:268)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$LockingRestartableTask.run(SequentialScheduler.java:205)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:149)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:230)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.io.IOException: HTTP/1.1 header parser received no bytes
at java.net.http/jdk.internal.net.http.common.Utils.wrapWithExtraDetail(Utils.java:348)
at java.net.http/jdk.internal.net.http.Http1Response$HeadersReader.onReadError(Http1Response.java:675)
... 8 common frames omitted
Caused by: java.net.SocketException: Connection reset
at java.base/sun.nio.ch.SocketChannelImpl.throwConnectionReset(SocketChannelImpl.java:394)
at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:426)
at java.net.http/jdk.internal.net.http.SocketTube.readAvailable(SocketTube.java:1170)
at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.read(SocketTube.java:833)
at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowTask.run(SocketTube.java:181)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:230)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:303)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:256)
at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.signalReadable(SocketTube.java:774)
at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$ReadEvent.signalEvent(SocketTube.java:957)
at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowEvent.handle(SocketTube.java:253)
at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.handleEvent(HttpClientImpl.java:979)
at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.lambda$run$3(HttpClientImpl.java:934)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.run(HttpClientImpl.java:934)
Error in Nextflow:
WARN: Can't update history file: .nextflow/history
ERROR ~ Unexpected error on command: get-container - cause: java.net.http.HttpConnectTimeoutException: HTTP connect timed out
ERROR ~ Unexpected error on command: get-container - cause: Wave invalid response: [408] {"message":"Timeout error connecting to 'https://1a583bf0283a.ngrok.app/api' - HTTP status=408"}
No log in wave
io.seqera.wave.PairingClient is a websocket client
So the the error in question maybe related to Websocket connection
This is interesting. Don't think it's related to websocket connection. The websocket connection is used by Wave to execute remotely (ie. in your tower instance) the http requests required by it. Precisely here
You are using Java 17, right? it could be interesting to run tower with Corretto 20.0.2, to verify if it's really the issue mentioned here
I am using openjdk version "17.0.5" I will try with Corretto 20.0.2
I've restarted Wave on stage adding more detailed log level
There are no occurrences IOException in Tower with Corretto 20.0.2. I ran the test twice to confirm the same
No Exception in wave
only Nextflow is throwing the below error few times
ERROR ~ Unexpected error on command: get-container - cause: java.net.http.HttpConnectTimeoutException: HTTP connect timed out
Have you use Wave in the stage server? I don't anything in the logs
e-64d8f75d69-62gst wave-app 18:50:08.677 [default-nioEventLoopGroup-2-5] DEBUG i.s.w.s.p.socket.PairingWebSocket - Opening pairing session - endpoint: https://61030ed345a9.ngrok.app/api [sessionId: hENH4EpvfL7xMPL5tpGg7g==]
wave-64d8f75d69-62gst wave-app 18:50:08.678 [default-nioEventLoopGroup-2-5] DEBUG i.s.w.s.pairing.PairingServiceImpl - Pairing with service 'tower' at address https://61030ed345a9.ngrok.app/api - pairing id: 170885731970145 (key: 4eabad314c8900f5c3f1f649e6d45a63)
wave-64d8f75d69-62gst wave-app 19:05:01.956 [default-nioEventLoopGroup-2-2] DEBUG i.s.w.s.p.socket.PairingWebSocket - Opening pairing session - endpoint: https://api.stage-tower.net [sessionId: QY6rkT36hpMB/SJroxUyew==]
wave-64d8f75d69-62gst wave-app 19:05:04.368 [default-nioEventLoopGroup-2-4] DEBUG i.s.w.s.p.socket.PairingWebSocket - Closing pairing session - endpoint: https://api.stage-tower.net [sessionId: U9U23pxXkv1aud58tQjOGg==]
wave-64d8f75d69-62gst wave-app 19:21:21.271 [default-nioEventLoopGroup-2-5] DEBUG i.s.w.s.p.socket.PairingWebSocket - Closing pairing session - endpoint: https://61030ed345a9.ngrok.app/api [sessionId: hENH4EpvfL7xMPL5tpGg7g==]
wave-64d8f75d69-62gst wave-app 19:27:30.260 [default-nioEventLoopGroup-2-7] DEBUG i.s.w.s.p.socket.PairingWebSocket - Opening pairing session - endpoint: https://61030ed345a9.ngrok.app/api [sessionId: ODky5IUzB5otlTS2+m1qGQ==]
wave-64d8f75d69-62gst wave-app 19:55:58.310 [default-nioEventLoopGroup-2-7] DEBUG i.s.w.s.p.socket.PairingWebSocket - Closing pairing session - endpoint: https://61030ed345a9.ngrok.app/api [sessionId: ODky5IUzB5otlTS2+m1qGQ==]
Yes i am using wave stage server
There are no logs 🤔
pairing in tower logs:
Jul-26 21:27:24.957 [default-nioEventLoopGroup-8-2] DEBUG io.seqera.wave.PairingManager - Wave pairing scheduling re-connect
Jul-26 21:27:29.985 [scheduled-executor-thread-17] DEBUG io.seqera.wave.PairingManager - Wave pairing to https://wave.stage-tower.net/pairing/tower/token/bdf66cce1f8a3668289cc351dcf933daac53bf568d3eadfc67f90d7b994cbc4e?endpoint=https%3A%2F%2F61030ed345a9.ngrok.app%2Fapi
and this is my nextflow.config, where i amrunning the tests:
tower {
endpoint = 'https://61030ed345a9.ngrok.app/api'
enabled = true
}
wave {
endpoint = 'https://wave.stage-tower.net'
enabled = true
}
docker {
enabled = true
}
there is no logging in @Get('/container-token/{token}') endpoint https://github.com/seqeralabs/wave/blob/735b765e9841e0c0a06bd5bc9a331a3ebd986875/src/main/groovy/io/seqera/wave/controller/ContainerTokenController.groovy#L121
Can you upload the nextflow log file?
Now this is weird, there are no Nextflow log gets generated for
nextflow plugin nf-wave:get-container biocontainers/samtools
But I have console out:
nohup.log
I have seen the error in Tower. I am unsure if it's relevant because it doesn't happen at the time of the above error.
This error is in wave and not in tower.
nextflow -log .nextflow.log plugin nf-wave:get-container biocontainers/samtools
ERROR ~ Unexpected error on command: get-container - cause: Wave invalid response: [408] {"message":"Timeout error connecting to 'https://a7a83a94a9b8.ngrok.app/api' - HTTP status=408"}
30.logERROR ~ Unexpected error on command: get-container - cause: java.net.http.HttpConnectTimeoutException: HTTP connect timed out
34.logIn this testing attempt, Tower still throws java.util.concurrent.CompletionException: java.io.IOException: HTTP/1.1 header parser received no bytes
with Corretto-20.0.2.9.1
attaching part of tower, where error is occuring
tower.log
Ok, then we can rule out that's a problem with the JVM.
Can try adding the annotation @ExecuteOn(TaskExecutors.IO)
to the /user-info
endpoint here
Still there is error: Nextflow logs: 4.log 24.log
Tower Error:
Jul-27 17:40:47.637 [ForkJoinPool.commonPool-worker-5] - ERROR io.seqera.wave.PairingClient - Unexpected error proxing request https://46941462849d.ngrok.app/api/user-info GET -- cause: java.io.IOException: HTTP/1.1 header parser received no bytes
java.util.concurrent.CompletionException: java.io.IOException: HTTP/1.1 header parser received no bytes
at java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:368)
at java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:377)
at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1152)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
at java.net.http/jdk.internal.net.http.Http1Response.onReadError(Http1Response.java:554)
at java.net.http/jdk.internal.net.http.Http1Response$HeadersReader.onReadError(Http1Response.java:676)
at java.net.http/jdk.internal.net.http.Http1AsyncReceiver.checkForErrors(Http1AsyncReceiver.java:302)
at java.net.http/jdk.internal.net.http.Http1AsyncReceiver.flush(Http1AsyncReceiver.java:268)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$LockingRestartableTask.run(SequentialScheduler.java:205)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:149)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:230)
at java.net.http/jdk.internal.net.http.HttpClientImpl$DelegatingExecutor.execute(HttpClientImpl.java:157)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:305)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:274)
at java.net.http/jdk.internal.net.http.Http1AsyncReceiver.onReadError(Http1AsyncReceiver.java:511)
at java.net.http/jdk.internal.net.http.Http1AsyncReceiver$Http1TubeSubscriber.onComplete(Http1AsyncReceiver.java:596)
at java.net.http/jdk.internal.net.http.common.SSLTube$DelegateWrapper.onComplete(SSLTube.java:276)
at java.net.http/jdk.internal.net.http.common.SSLTube$SSLSubscriberWrapper.complete(SSLTube.java:440)
at java.net.http/jdk.internal.net.http.common.SSLTube$SSLSubscriberWrapper.onComplete(SSLTube.java:541)
at java.net.http/jdk.internal.net.http.common.SubscriberWrapper.checkCompletion(SubscriberWrapper.java:472)
at java.net.http/jdk.internal.net.http.common.SubscriberWrapper$DownstreamPusher.run1(SubscriberWrapper.java:334)
at java.net.http/jdk.internal.net.http.common.SubscriberWrapper$DownstreamPusher.run(SubscriberWrapper.java:259)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$LockingRestartableTask.run(SequentialScheduler.java:205)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:149)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:230)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:303)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:256)
at java.net.http/jdk.internal.net.http.common.SubscriberWrapper.outgoing(SubscriberWrapper.java:232)
at java.net.http/jdk.internal.net.http.common.SSLFlowDelegate$Reader.processData(SSLFlowDelegate.java:513)
at java.net.http/jdk.internal.net.http.common.SSLFlowDelegate$Reader$ReaderDownstreamPusher.run(SSLFlowDelegate.java:268)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$LockingRestartableTask.run(SequentialScheduler.java:205)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:149)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:230)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.io.IOException: HTTP/1.1 header parser received no bytes
at java.net.http/jdk.internal.net.http.common.Utils.wrapWithExtraDetail(Utils.java:348)
at java.net.http/jdk.internal.net.http.Http1Response$HeadersReader.onReadError(Http1Response.java:675)
... 30 common frames omitted
Caused by: java.io.EOFException: EOF reached while reading
... 21 common frames omitted
Jul-27 17:40:47.659 [ForkJoinPool.commonPool-worker-3] - ERROR io.seqera.wave.PairingClient - Unexpected error proxing request https://46941462849d.ngrok.app/api/user-info GET -- cause: java.io.IOException: HTTP/1.1 header parser received no bytes
java.util.concurrent.CompletionException: java.io.IOException: HTTP/1.1 header parser received no bytes
at java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:368)
at java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:377)
at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1152)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
at java.net.http/jdk.internal.net.http.Http1Response.onReadError(Http1Response.java:554)
at java.net.http/jdk.internal.net.http.Http1Response$HeadersReader.onReadError(Http1Response.java:676)
at java.net.http/jdk.internal.net.http.Http1AsyncReceiver.checkForErrors(Http1AsyncReceiver.java:302)
at java.net.http/jdk.internal.net.http.Http1AsyncReceiver.flush(Http1AsyncReceiver.java:268)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$LockingRestartableTask.run(SequentialScheduler.java:205)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:149)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:230)
at java.net.http/jdk.internal.net.http.HttpClientImpl$DelegatingExecutor.execute(HttpClientImpl.java:157)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:305)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:274)
at java.net.http/jdk.internal.net.http.Http1AsyncReceiver.onReadError(Http1AsyncReceiver.java:511)
at java.net.http/jdk.internal.net.http.Http1AsyncReceiver$Http1TubeSubscriber.onComplete(Http1AsyncReceiver.java:596)
at java.net.http/jdk.internal.net.http.common.SSLTube$DelegateWrapper.onComplete(SSLTube.java:276)
at java.net.http/jdk.internal.net.http.common.SSLTube$SSLSubscriberWrapper.complete(SSLTube.java:440)
at java.net.http/jdk.internal.net.http.common.SSLTube$SSLSubscriberWrapper.onComplete(SSLTube.java:541)
at java.net.http/jdk.internal.net.http.common.SubscriberWrapper.checkCompletion(SubscriberWrapper.java:472)
at java.net.http/jdk.internal.net.http.common.SubscriberWrapper$DownstreamPusher.run1(SubscriberWrapper.java:334)
at java.net.http/jdk.internal.net.http.common.SubscriberWrapper$DownstreamPusher.run(SubscriberWrapper.java:259)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$LockingRestartableTask.run(SequentialScheduler.java:205)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:149)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:230)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:303)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:256)
at java.net.http/jdk.internal.net.http.common.SubscriberWrapper.outgoing(SubscriberWrapper.java:232)
at java.net.http/jdk.internal.net.http.common.SSLFlowDelegate$Reader.processData(SSLFlowDelegate.java:513)
at java.net.http/jdk.internal.net.http.common.SSLFlowDelegate$Reader$ReaderDownstreamPusher.run(SSLFlowDelegate.java:268)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$LockingRestartableTask.run(SequentialScheduler.java:205)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:149)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:230)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.io.IOException: HTTP/1.1 header parser received no bytes
at java.net.http/jdk.internal.net.http.common.Utils.wrapWithExtraDetail(Utils.java:348)
at java.net.http/jdk.internal.net.http.Http1Response$HeadersReader.onReadError(Http1Response.java:675)
... 30 common frames omitted
Caused by: java.io.EOFException: EOF reached while reading
... 21 common frames omitted
Jul-27 17:41:22.557 [scheduled-executor-thread-7] - TRACE io.seqera.wave.PairingClient - Sending Wave client heartbeat
Jul-27 17:41:23.420 [default-nioEventLoopGroup-8-2] - TRACE io.seqera.wave.PairingClient - Received Heartbeat message=PairingHeartbeat(msgId:ff2300751308)
Jul-27 17:41:23.608 [ForkJoinPool.commonPool-worker-3] - ERROR io.seqera.wave.PairingClient - Unexpected error proxing request https://46941462849d.ngrok.app/api/user-info GET -- cause: javax.net.ssl.SSLHandshakeException: Remote host terminated the handshake
java.util.concurrent.CompletionException: javax.net.ssl.SSLHandshakeException: Remote host terminated the handshake
at java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:368)
at java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:377)
at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1152)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1773)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: javax.net.ssl.SSLHandshakeException: Remote host terminated the handshake
at java.net.http/jdk.internal.net.http.common.SSLTube.checkForHandshake(SSLTube.java:595)
at java.net.http/jdk.internal.net.http.common.SSLTube$SSLSubscriberWrapper.onComplete(SSLTube.java:536)
at java.net.http/jdk.internal.net.http.common.SubscriberWrapper.checkCompletion(SubscriberWrapper.java:472)
at java.net.http/jdk.internal.net.http.common.SubscriberWrapper$DownstreamPusher.run1(SubscriberWrapper.java:334)
at java.net.http/jdk.internal.net.http.common.SubscriberWrapper$DownstreamPusher.run(SubscriberWrapper.java:259)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$LockingRestartableTask.run(SequentialScheduler.java:205)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:149)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:230)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:303)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:256)
at java.net.http/jdk.internal.net.http.common.SubscriberWrapper.outgoing(SubscriberWrapper.java:232)
at java.net.http/jdk.internal.net.http.common.SSLFlowDelegate$Reader.processData(SSLFlowDelegate.java:513)
at java.net.http/jdk.internal.net.http.common.SSLFlowDelegate$Reader$ReaderDownstreamPusher.run(SSLFlowDelegate.java:268)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$LockingRestartableTask.run(SequentialScheduler.java:205)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:149)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:230)
... 3 common frames omitted
Jul-27 17:41:23.606 [ForkJoinPool.commonPool-worker-5] - ERROR io.seqera.wave.PairingClient - Unexpected error proxing request https://46941462849d.ngrok.app/api/user-info GET -- cause: javax.net.ssl.SSLHandshakeException: Remote host terminated the handshake
java.util.concurrent.CompletionException: javax.net.ssl.SSLHandshakeException: Remote host terminated the handshake
at java.base/java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:368)
at java.base/java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:377)
at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1152)
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1773)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: javax.net.ssl.SSLHandshakeException: Remote host terminated the handshake
at java.net.http/jdk.internal.net.http.common.SSLTube.checkForHandshake(SSLTube.java:595)
at java.net.http/jdk.internal.net.http.common.SSLTube$SSLSubscriberWrapper.onComplete(SSLTube.java:536)
at java.net.http/jdk.internal.net.http.common.SubscriberWrapper.checkCompletion(SubscriberWrapper.java:472)
at java.net.http/jdk.internal.net.http.common.SubscriberWrapper$DownstreamPusher.run1(SubscriberWrapper.java:334)
at java.net.http/jdk.internal.net.http.common.SubscriberWrapper$DownstreamPusher.run(SubscriberWrapper.java:259)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$LockingRestartableTask.run(SequentialScheduler.java:205)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:149)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:230)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:303)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:256)
at java.net.http/jdk.internal.net.http.common.SubscriberWrapper.outgoing(SubscriberWrapper.java:232)
at java.net.http/jdk.internal.net.http.common.SSLFlowDelegate$Reader.processData(SSLFlowDelegate.java:513)
at java.net.http/jdk.internal.net.http.common.SSLFlowDelegate$Reader$ReaderDownstreamPusher.run(SSLFlowDelegate.java:268)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$LockingRestartableTask.run(SequentialScheduler.java:205)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$CompleteRestartableTask.run(SequentialScheduler.java:149)
at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:230)
... 3 common frames omitted
We are seeing an increasing number of the following exception. This is actually reported by the Wave client in Nextflow, but I was wondering if it could be related by some kind of failure on the backend side.