openoereb / pyramid_oereb_mfp

Implementation of a static extract (PDF export) for the pyramid_oereb server with MapFish-Print

mapfish_print 3.30.2 not running stably #145

Open · peterschaer opened this issue 8 months ago

peterschaer commented 8 months ago

We have upgraded our application to pyramid_oereb 2.4.8. pyramid_oereb_mfp uses the latest commit (8d9da5ab3a8d1a8469fd693e4f86c5870ee9f6dc). At the same time, we have upgraded mapfish_print to version 3.30.2, as recommended in https://github.com/openoereb/pyramid_oereb_mfp/blob/8d9da5ab3a8d1a8469fd693e4f86c5870ee9f6dc/docker-compose.yml#L8. However, this combination causes problems in our test environment. Everything works perfectly except PDF creation, which does not run stably. On the one hand, certain complex extracts (large geometries, many PLRs) are not created at all or only very rarely (HTTP error 500); on the other hand, a less complex call sometimes results in an HTTP error 500 when it is called several times in succession. In the application log file we observe errors such as

12:41:41.866 [http-nio-8080-exec-9] WARN o.m.print.servlet.BaseMapServlet - Error while processing request: java.lang.NullPointerException: Inflater has been closed

I have attached all log messages of a failed request (single_failed_request.log), and here is a log of multiple successful and failed requests: reporting_oereb2_test.zip

Our application runs in a Kubernetes cluster. The mapfish_print pod has 1.8 CPU and 2.2 GB RAM assigned to it. The image is configured with -Xmx512M -Dmapfish.maxContentLength=8388608.

In the production environment (pyramid_oereb 2.4.4 and mapfish_print 3.29.4), the pods run with exactly the same resources and these problems do not occur. My questions now: have other cantons also observed this behavior? I assume that at least @michmuel has observed something similar (https://github.com/openoereb/pyramid_oereb/pull/1814#issuecomment-1711359398). Does anyone know what these errors mean? A lack of resources? Other causes?
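To make the "several calls in succession" failure mode concrete, here is a minimal sketch of such a repetition loop; the extract URL and EGRID below are placeholders, not taken from the actual setup:

```python
# Minimal reproduction sketch: request the same static extract repeatedly
# and count the sporadic HTTP 500 responses. URL and EGRID are placeholders.
import requests

URL = "https://example.org/oereb/extract/pdf"  # placeholder host/path
PARAMS = {"EGRID": "CH1234567890123"}          # placeholder EGRID

ok = failed = 0
for i in range(20):
    resp = requests.get(URL, params=PARAMS, timeout=300)
    if resp.status_code == 200:
        ok += 1
    else:
        failed += 1
        print(f"request {i}: HTTP {resp.status_code}")

print(f"{ok} succeeded, {failed} failed")
```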

michmuel commented 7 months ago

It is true, we also noticed that image 3.30 runs less stably in OpenShift than image 3.29. I performed some local tests but have not managed to reproduce the error-prone behavior, so we still run image 3.29.

@svamaa You run image 3.30 and restart the container after each print. Is that correct?

ghost commented 7 months ago

We run image 3.30.6 with very high resource limits (7G) in Kubernetes. We have configured the pods to restart after 8G of RAM has been used.

michmuel commented 7 months ago

We run image 3.29 with 500m CPU and 1100Mi memory and restart the pods every 2 hours. My impression is that the image requires quite a lot of resources for its task. Is there something related to this image that might cause unwanted behaviour in OpenShift? How could we proceed, @peterschaer, @jwkaltz?

peterschaer commented 6 months ago

Sorry for the late reply, I was busy with other things. Anyway: @jwkaltz, is there a reason why mapfish_print suddenly needs so many more resources? And is there a recommendation for how many resources (especially RAM) mapfish_print needs? That would help us a lot.

jwkaltz commented 6 months ago

@peterschaer I will discuss it with the main contributors of MapFish-Print. I do note that this phenomenon does not occur in most projects using MapFish-Print: there are reports here from you and from Basel-Land, but the other cantons do not seem to experience this with pyramid_oereb, and we also did not notice this effect in the geoportals running MapFish-Print. We do have a report from Basel-Stadt's geoportal about a memory leak; however, that was already the case in older versions of MapFish-Print.

Furthermore, please note that MapFish-Print is currently undergoing major maintenance work to update the libraries it uses and to make debugging easier. I hope this will help us clarify what is going on in your (and Basel-Land's) use case.

jwkaltz commented 1 month ago

@peterschaer @michmuel can you please confirm that, in your environment, you have more problems with MapFish-Print 3.30 than with 3.29, and can you provide any more details? There is ongoing work to improve MapFish-Print in general, but if you have anything more specific, we should open a new issue in https://github.com/mapfish/mapfish-print

peterschaer commented 3 weeks ago

@jwkaltz I can confirm that we have more problems with MapFish-Print 3.30 than with 3.29. At the moment I do not have any details beyond what I already listed in the initial post of this issue.

lopo977 commented 1 week ago

@jwkaltz we still have problems with the new MFP 3.30.11, configured with -Xmx2G -Dmapfish.maxContentLength=8388608

Allocated RAM has reached almost 4 GB, and then we see strange behaviors:

CH657302880741-22.pdf

jwkaltz commented 1 week ago

@lopo977 can you please provide, as a file, the print specification that MapFish-Print is processing when you have errors? This should be visible in the MapFish-Print logs. This way, we can try to reproduce the error in a local development environment.
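For anyone who wants to try this: a captured spec can be re-submitted straight to MapFish-Print's REST API, bypassing pyramid_oereb entirely. A rough sketch; the base URL and the app id "default" are placeholders and need to be adjusted to the local print setup:

```python
# Re-submit a captured spec.json directly to MapFish-Print to try to
# reproduce a failing job outside of pyramid_oereb. BASE and the app id
# "default" are placeholders for the local print setup.
import json
import time

import requests

BASE = "http://localhost:8680/print"  # placeholder

with open("spec.json") as f:
    spec = json.load(f)

# Submit the print job, then poll its status until it is done.
job = requests.post(f"{BASE}/default/report.pdf", json=spec).json()
ref = job["ref"]
while True:
    status = requests.get(f"{BASE}/status/{ref}.json").json()
    if status.get("done"):
        break
    time.sleep(1)

if status.get("status") == "finished":
    with open("out.pdf", "wb") as f:
        f.write(requests.get(f"{BASE}/report/{ref}").content)
else:
    print("job failed:", status.get("error"))
```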

lopo977 commented 1 week ago

> @lopo977 can you please provide, as a file, the print specification that MapFish-Print is processing when you have errors? This should be visible in the MapFish-Print logs. This way, we can try to reproduce the error in a local development environment.

Here it is: this file is sometimes processed correctly in 3.30.11, and sometimes MFP randomly reports an error. spec.json

lopo977 commented 1 week ago

After some time something weird happens; see out.log.

In this case:

Caused by: java.lang.NullPointerException: Inflater has been closed

931c3587-9cf0-49dc-a11a-f61a547446c3@0b960d38-fd27-445e-890e-d44bde71c413
java.util.concurrent.ExecutionException: java.lang.NullPointerException
    at java.base/java.util.concurrent.ForkJoinTask.get(ForkJoinTask.java:1006)
    at org.mapfish.print.output.AbstractJasperReportOutputFormat.getJasperPrint(AbstractJasperReportOutputFormat.java:160)
    at org.mapfish.print.output.AbstractJasperReportOutputFormat.print(AbstractJasperReportOutputFormat.java:104)
    at org.mapfish.print.MapPrinter.print(MapPrinter.java:133)
    at org.mapfish.print.servlet.job.PrintJob.lambda$call$0(PrintJob.java:150)
    at org.mapfish.print.servlet.job.PrintJob.withOpenOutputStream(PrintJob.java:120)
    at org.mapfish.print.servlet.job.PrintJob.call(PrintJob.java:149)
    at org.mapfish.print.servlet.job.PrintJob.call(PrintJob.java:56)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.NullPointerException: null
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
    at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:603)
    ... 12 common frames omitted
Caused by: java.lang.NullPointerException: null
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
    at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:603)
    at java.base/java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:678)
    at java.base/java.util.concurrent.ForkJoinTask.join(ForkJoinTask.java:722)
    at org.mapfish.print.processor.ProcessorDependencyGraph.tryExecuteNodes(ProcessorDependencyGraph.java:63)
    at org.mapfish.print.processor.ProcessorDependencyGraph$ProcessorGraphForkJoinTask.lambda$compute$0(ProcessorDependencyGraph.java:212)
    at org.mapfish.print.processor.AbstractProcessor$Context.mdcContext(AbstractProcessor.java:248)
    at org.mapfish.print.processor.ProcessorDependencyGraph$ProcessorGraphForkJoinTask.compute(ProcessorDependencyGraph.java:208)
    at org.mapfish.print.processor.ProcessorDependencyGraph$ProcessorGraphForkJoinTask.compute(ProcessorDependencyGraph.java:185)
    at java.base/java.util.concurrent.RecursiveTask.exec(RecursiveTask.java:94)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
Caused by: java.lang.NullPointerException: Inflater has been closed
    at java.base/java.util.zip.Inflater.ensureOpen(Inflater.java:740)
    at java.base/java.util.zip.Inflater.inflate(Inflater.java:377)
    at java.base/java.util.zip.InflaterInputStream.read(InflaterInputStream.java:153)
    at java.base/java.util.zip.GZIPInputStream.read(GZIPInputStream.java:118)
    at java.base/java.io.FilterInputStream.read(FilterInputStream.java:107)
    at org.apache.http.client.entity.LazyDecompressingInputStream.read(LazyDecompressingInputStream.java:64)
    at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1309)
    at org.apache.commons.io.IOUtils.copy(IOUtils.java:978)
    at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1282)
    at org.apache.commons.io.IOUtils.copy(IOUtils.java:953)
    at org.mapfish.print.http.HttpRequestFetcher$CachedClientHttpResponse.<init>(HttpRequestFetcher.java:96)
    at org.mapfish.print.http.HttpRequestFetcher$CachedClientHttpResponse.<init>(HttpRequestFetcher.java:81)
    at org.mapfish.print.http.HttpRequestFetcher$CachedClientHttpRequest.lambda$call$0(HttpRequestFetcher.java:212)
    at org.mapfish.print.processor.AbstractProcessor$Context.mdcContextEx(AbstractProcessor.java:269)
    at org.mapfish.print.http.HttpRequestFetcher$CachedClientHttpRequest.call(HttpRequestFetcher.java:204)
    at org.mapfish.print.http.HttpRequestFetcher$CachedClientHttpRequest.call(HttpRequestFetcher.java:138)
    at java.base/java.util.concurrent.ForkJoinTask$AdaptedCallable.exec(ForkJoinTask.java:1448)
    ... 5 common frames omitted
michmuel commented 1 week ago

@jwkaltz

I tested MFP 3.30.11 in OpenShift by repeatedly requesting the PDF for the same real estate. After some successful requests, MFP runs into a timeout. I did not observe this behavior with MFP 3.29.

log-27-Sep-2024-edited.txt

lopo977 commented 1 week ago

> @jwkaltz
>
> I tested MFP 3.30.11 in OpenShift by repeatedly requesting the PDF for the same real estate. After some successful requests, MFP runs into a timeout. I did not observe this behavior with MFP 3.29.
>
> log-27-Sep-2024-edited.txt

The anomalous behaviors of MFP appear as time passes and the allocated memory keeps growing. At some point, I think when it can no longer allocate memory, the strange behaviors start. This is not related to a specific EGRID; with particularly complex parcels you may just reach the critical memory allocation sooner. The anomalies are the same as those reported by the other authors in this issue. I suspect a memory leak.
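To substantiate the leak suspicion, one can log the container's memory usage between print requests and check whether it only ever grows. A rough sketch, assuming cgroup v1 inside the MFP container (on cgroup v2, read /sys/fs/cgroup/memory.current instead):

```python
# Rough sketch: periodically sample the container's cgroup memory usage
# to see whether it keeps growing across print jobs. The path assumes
# cgroup v1; on cgroup v2 use /sys/fs/cgroup/memory.current.
import time

CGROUP_FILE = "/sys/fs/cgroup/memory/memory.usage_in_bytes"

def usage_mib() -> float:
    with open(CGROUP_FILE) as f:
        return int(f.read()) / (1024 * 1024)

while True:
    print(f"{time.strftime('%H:%M:%S')}  {usage_mib():.0f} MiB")
    time.sleep(60)
```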