
Webserver goes down on huge log #826

aurelienWls commented 2 years ago

Expected Behavior

When we have a flow with a lot of logs, the webserver should handle this gracefully.

Actual Behaviour

When we have a flow with a lot of logs, the webserver sometimes goes down. Here is the stack trace:

Error occurred writing stream response: OpenSearch exception [type=circuit_breaking_exception, reason=[parent] Data too large, data for [<http_request>] would be [3035037630/2.8gb], which is larger than the limit of [3026295193/2.8gb], real usage: [3035037320/2.8gb], new bytes reserved: [310/310b], usages [request=0/0b, fielddata=239198/233.5kb, in_flight_requests=310/310b, accounting=5534678/5.2mb]]
org.opensearch.OpenSearchStatusException: OpenSearch exception [type=circuit_breaking_exception, reason=[parent] Data too large, data for [<http_request>] would be [3035037630/2.8gb], which is larger than the limit of [3026295193/2.8gb], real usage: [3035037320/2.8gb], new bytes reserved: [310/310b], usages [request=0/0b, fielddata=239198/233.5kb, in_flight_requests=310/310b, accounting=5534678/5.2mb]]
  at org.opensearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:207)
  at org.opensearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:2075)
  at org.opensearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:2052)
  at org.opensearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1771)
  at org.opensearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1724)
  at org.opensearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1692)
  at org.opensearch.client.RestHighLevelClient.scroll(RestHighLevelClient.java:1203)
  at io.kestra.repository.elasticsearch.AbstractElasticSearchRepository.scroll(AbstractElasticSearchRepository.java:443)
  at io.kestra.repository.elasticsearch.AbstractElasticSearchRepository.scroll(AbstractElasticSearchRepository.java:422)
  at io.kestra.repository.elasticsearch.ElasticSearchLogRepository.findByExecutionId(ElasticSearchLogRepository.java:100)
  at io.kestra.webserver.controllers.LogController.lambda$follow$3(LogController.java:93)
  at io.reactivex.internal.operators.flowable.FlowableCreate.subscribeActual(FlowableCreate.java:71)
  at io.reactivex.Flowable.subscribe(Flowable.java:14935)
  at io.reactivex.Flowable.subscribe(Flowable.java:14882)
  at io.micronaut.rxjava2.instrument.RxInstrumentedFlowable.subscribeActual(RxInstrumentedFlowable.java:57)
  at io.reactivex.Flowable.subscribe(Flowable.java:14935)
  at io.reactivex.internal.operators.flowable.FlowableDoOnLifecycle.subscribeActual(FlowableDoOnLifecycle.java:38)
  at io.reactivex.Flowable.subscribe(Flowable.java:14935)
  at io.reactivex.internal.operators.flowable.FlowableDoOnEach.subscribeActual(FlowableDoOnEach.java:50)
  at io.reactivex.Flowable.subscribe(Flowable.java:14935)
  at io.reactivex.Flowable.subscribe(Flowable.java:14885)
  at reactor.core.publisher.FluxSource.subscribe(FluxSource.java:67)
  at reactor.core.publisher.InternalFluxOperator.subscribe(InternalFluxOperator.java:62)
  at reactor.core.publisher.FluxSubscribeOn$SubscribeOnSubscriber.run(FluxSubscribeOn.java:194)
  at io.micronaut.reactive.re...

Steps To Reproduce

Create a flow with huge logs. It starts to fail with logs of around 10-15 MB.

Environment Information

Example flow

No response

brian-mulier-p commented 1 year ago

I guess it is the same issue we discussed, @loicmathieu; the idea would be to truncate logs if they are too large. @aurelienWls, is it causing any trouble without going to the Logs tab, or does the issue occur only when going into this tab?
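For illustration, a minimal sketch of that truncation idea; the cap value and the helper are assumptions, not existing Kestra code:

```java
// Illustrative only: cap a log message before it is indexed.
// MAX_MESSAGE_LENGTH and this helper are hypothetical, not Kestra's API.
final class LogTruncation {
    private static final int MAX_MESSAGE_LENGTH = 64 * 1024; // assumed 64 KB cap

    static String truncate(String message) {
        if (message == null || message.length() <= MAX_MESSAGE_LENGTH) {
            return message;
        }
        return message.substring(0, MAX_MESSAGE_LENGTH) + " ... [truncated]";
    }
}
```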

loicmathieu commented 1 month ago

An execution can have a lot of logs. At the moment, when we list the logs from the repository, we just return a list, which loads all logs into memory.

What we need is to use a Publisher (or a Stream) so we don't load all logs into memory.
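A minimal sketch of what that could look like, assuming a Reactor-based signature; the method names and the scroll helpers are hypothetical stand-ins for the OpenSearch scroll calls visible in the stack trace:

```java
import java.util.List;

import io.kestra.core.models.executions.LogEntry;
import reactor.core.publisher.Flux;

// Sketch: emit log entries page by page instead of materializing one big
// List, so memory usage is bounded by the page size, not the total log size.
public abstract class StreamingLogRepository {

    public Flux<LogEntry> streamByExecutionId(String executionId) {
        return Flux.create(sink -> {
            String scrollId = openScroll(executionId);
            try {
                List<LogEntry> page;
                // Fetch one scroll page at a time and forward its entries.
                while (!(page = nextPage(scrollId)).isEmpty() && !sink.isCancelled()) {
                    page.forEach(sink::next);
                }
                sink.complete();
            } finally {
                closeScroll(scrollId);
            }
        });
    }

    // Hypothetical low-level scroll operations (not Kestra's actual API).
    protected abstract String openScroll(String executionId);
    protected abstract List<LogEntry> nextPage(String scrollId);
    protected abstract void closeScroll(String scrollId);
}
```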

From the controller, we need to paginate logs everywhere (or only return the first 1000 and display a message redirecting to the log download). For downloading, we should stream the logs into the download response; see the sketch below.
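As a sketch of the download side (the route and repository type are assumptions, reusing the hypothetical `StreamingLogRepository` above): Micronaut writes a returned Publisher out as a chunked response, so the full log never sits in memory at once.

```java
import io.micronaut.http.MediaType;
import io.micronaut.http.annotation.Controller;
import io.micronaut.http.annotation.Get;
import reactor.core.publisher.Flux;

// Hypothetical streaming download endpoint: each log entry is written to
// the response as it arrives from the repository stream.
@Controller("/api/v1/logs")
public class LogDownloadController {
    private final StreamingLogRepository repository; // hypothetical, see above

    public LogDownloadController(StreamingLogRepository repository) {
        this.repository = repository;
    }

    @Get(value = "/{executionId}/download", produces = MediaType.TEXT_PLAIN)
    public Flux<String> download(String executionId) {
        return repository.streamByExecutionId(executionId)
            // Assumes a message getter on LogEntry; one line per log entry.
            .map(entry -> entry.getMessage() + "\n");
    }
}
```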