spring-cloud / spring-cloud-skipper

A package manager that installs, upgrades, and rolls back Spring Boot applications on multiple Cloud Platforms.
http://cloud.spring.io/spring-cloud-skipper/
Apache License 2.0

Could not find a free random port #933

Open mexicapita opened 4 years ago

mexicapita commented 4 years ago

I have detected a problem in the skipper server. When it has been running, deploying and stopping several streams for a while, it stops working and returns an error.

` Exception in thread "main" org.springframework.cloud.dataflow.rest.client.DataFlowClientException: Could not install AppDeployRequest [[AppDeploymentRequest@2f6d21ff commandlineArguments = list[[empty]], deploymentProperties = map['spring.cloud.deployer.group' -> 'test'], definition = [AppDefinition@2b95b70b name = 'log-v29', properties = map['spring.cloud.dataflow.stream.app.label' -> 'log', 'spring.cloud.stream.kafka.streams.binder.zkNodes' -> 'zookeeper:2181', 'spring.cloud.stream.metrics.properties' -> 'spring.application.name,spring.application.index,spring.cloud.application.,spring.cloud.dataflow.', 'spring.cloud.dataflow.stream.name' -> 'test', 'spring.cloud.stream.kafka.streams.binder.brokers' -> 'PLAINTEXT://kafka-broker:9092', 'spring.metrics.export.triggers.application.includes' -> 'integration**', 'spring.cloud.stream.metrics.key' -> 'test.log.${spring.cloud.application.guid}', 'spring.cloud.stream.bindings.input.group' -> 'test', 'spring.cloud.stream.kafka.binder.zkNodes' -> 'zookeeper:2181', 'spring.cloud.dataflow.stream.app.type' -> 'sink', 'spring.cloud.stream.bindings.input.destination' -> 'test.time', 'spring.cloud.stream.kafka.binder.brokers' -> 'PLAINTEXT://kafka-broker:9092']], resource = org.springframework.cloud.stream.app:log-sink-kafka:jar:2.1.2.RELEASE]] to platform [default]. Error Message = [Could not find a free random port range { low=20000, high=20100}]

`

If I try to continue deploying, three things can happen (I think randomly):

` Exception in thread "main" org.springframework.cloud.dataflow.rest.client.DataFlowClientException: Could not install AppDeployRequest [[AppDeploymentRequest@2f9e4e70 commandlineArguments = list[[empty]], deploymentProperties = map['spring.cloud.deployer.group' -> 'test'], definition = [AppDefinition@278b9959 name = 'log-v30', properties = map['spring.cloud.dataflow.stream.app.label' -> 'log', 'spring.cloud.stream.kafka.streams.binder.zkNodes' -> 'zookeeper:2181', 'spring.cloud.stream.metrics.properties' -> 'spring.application.name,spring.application.index,spring.cloud.application.,spring.cloud.dataflow.', 'spring.cloud.dataflow.stream.name' -> 'test', 'spring.cloud.stream.kafka.streams.binder.brokers' -> 'PLAINTEXT://kafka-broker:9092', 'spring.metrics.export.triggers.application.includes' -> 'integration**', 'spring.cloud.stream.metrics.key' -> 'test.log.${spring.cloud.application.guid}', 'spring.cloud.stream.bindings.input.group' -> 'test', 'spring.cloud.stream.kafka.binder.zkNodes' -> 'zookeeper:2181', 'spring.cloud.dataflow.stream.app.type' -> 'sink', 'spring.cloud.stream.bindings.input.destination' -> 'test.time', 'spring.cloud.stream.kafka.binder.brokers' -> 'PLAINTEXT://kafka-broker:9092']], resource = org.springframework.cloud.stream.app:log-sink-kafka:jar:2.1.2.RELEASE]] to platform [default]. 
Error Message = [App with deploymentId [test.log-v30] with state [deployed] doesn't match expected state [unknown]]
    at org.springframework.cloud.dataflow.rest.client.VndErrorResponseErrorHandler.handleError(VndErrorResponseErrorHandler.java:65)
    at org.springframework.web.client.ResponseErrorHandler.handleError(ResponseErrorHandler.java:63)
    at org.springframework.web.client.RestTemplate.handleResponse(RestTemplate.java:778)
    at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:736)
    at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:670)
    at org.springframework.web.client.RestTemplate.postForObject(RestTemplate.java:414)
    at org.springframework.cloud.dataflow.rest.client.StreamTemplate.deploy(StreamTemplate.java:127)
    at org.springframework.cloud.dataflow.rest.client.dsl.StreamDefinition.deploy(StreamDefinition.java:78)
    at org.springframework.cloud.dataflow.rest.client.dsl.StreamDefinition.deploy(StreamDefinition.java:87)
    at com.grupotsk.springflow.prometheuswrapperdsl.PrometheusWrapperDslApplication.main(PrometheusWrapperDslApplication.java:65)

`

I already asked about this on Gitter, without success. I'm running this docker-compose.

DemianTinkiel commented 3 years ago

I've experienced this as well (on DS 2.7.0 / Skipper 2.5.1), with a setup similar to @mexicapita's (docker-compose). It's most evident when destroying and creating streams in quick succession. I've tried doing a check-and-wait based on

        return scdf.streamOperations().list().getContent().stream().anyMatch(sd -> sd.getName().equals(name));

but even when the check above reports that the stream no longer exists, the exception still occurs.

If I add a time-based wait of 2+ seconds (just to check, no way I'm leaving that in), no exception occurs.
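For anyone who wants something less blunt than a fixed sleep, here is a minimal sketch of retrying the deploy call with a growing delay. It only papers over the timing window described above and does not address ports that are never released. `DeployRetry` and `retry` are hypothetical names, and the lambda in `main` is a stand-in for the real deploy call (for example `StreamDefinition.deploy()` from the stack trace earlier in this thread), not SCDF client code.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

public class DeployRetry {

    /**
     * Calls the action until it succeeds or maxAttempts is reached.
     * Sleeps between attempts and doubles the delay after each failure.
     * Rethrows the last failure if every attempt fails.
     */
    public static <T> T retry(Callable<T> action, int maxAttempts, Duration initialDelay)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Duration delay = initialDelay;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay.toMillis());
                    delay = delay.multipliedBy(2);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the real deploy call: fails twice, then succeeds.
        int[] calls = {0};
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("Could not find a free random port");
            }
            return "deployed";
        }, 5, Duration.ofMillis(100));
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In real use the lambda would wrap the destroy-then-deploy sequence, and it would probably make sense to retry only on the specific exception seen here rather than on every `Exception`.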

Environment: Linux 5.6.13-100.fc30.x86_64; Docker version 19.03.12, build 48a66213fe; docker-compose version 1.22.0, build f46880f.

dcguim commented 3 years ago

This is most likely because Skipper makes a port range available for streams to connect to it, and all the ports in that range are in use. If you are running with docker-compose, you can stop and start the Skipper container as a workaround. Nevertheless, a definitive solution should be provided that releases unused ports and prevents this DataFlowClientException. Perhaps this feature could be provided through the Skipper shell.
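To check whether the range really is exhausted at the OS level, a small probe like the one below can help. This is a sketch, not part of Skipper: `PortRangeProbe` is a hypothetical name, the default bounds are the `low=20000, high=20100` values from the error message, and it needs to run on the same host or inside the same container as Skipper for the result to mean anything. It only tests whether a port can be bound right now. Whether that matches how Skipper itself selects ports is an assumption.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.List;

public class PortRangeProbe {

    /** Returns true if a server socket can be bound to the port on this host right now. */
    public static boolean isFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Lists every port in [low, high] that can currently be bound. */
    public static List<Integer> freePorts(int low, int high) {
        List<Integer> free = new ArrayList<>();
        for (int port = low; port <= high; port++) {
            if (isFree(port)) {
                free.add(port);
            }
        }
        return free;
    }

    public static void main(String[] args) {
        // Defaults taken from the error message in this issue.
        int low = args.length > 0 ? Integer.parseInt(args[0]) : 20000;
        int high = args.length > 1 ? Integer.parseInt(args[1]) : 20100;
        List<Integer> free = freePorts(low, high);
        System.out.println(free.size() + " of " + (high - low + 1)
                + " ports free in [" + low + ", " + high + "]");
    }
}
```

If this reports plenty of free ports while Skipper still fails, the problem is more likely in how ports are tracked or selected than in actual exhaustion, which would be useful information for the maintainers.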

kherpel commented 3 years ago

We are using SCDF with Skipper 2.5.2 and are also experiencing this issue regularly. We already increased the number of ports for the deployment to 200 via docker-compose, but that did not solve the issue. We redeploy all streams using SCDF shell scripts to roll out new versions, and often run into this error. After such a failure it always takes a lot of trial and error to get the streams to deploy.

Is there a timeline, or at least a workaround for the SCDF shell, to prevent this kind of issue?

Thank you very much for your support.

Kind regards, Kristian

fiidim commented 3 years ago

I experience this all the time. The exact message I'm seeing is:

Caused by: java.lang.IllegalStateException: Could not find a free random port range { low=20000, high=20100}

The only way to start deploying apps again is to restart skipper. Sometimes after a restart I have to destroy the streams, since Skipper reports that the apps are not in the expected state. I've also tried increasing the port range, but it still happens.

Versions: DATAFLOW_VERSION=2.7.2, SKIPPER_VERSION=2.6.2