mapfish / mapfish-print

A component of MapFish for printing templated cartographic maps. This module is the Java serverside module.
http://mapfish.github.io/mapfish-print-doc/
BSD 2-Clause "Simplified" License

Observability #3393

Open sbrunner opened 2 months ago

sbrunner commented 2 months ago

Introduction

Currently, the application lags behind on observability, so here we define how it should work.

Note that here we define the general framework, not every specific case, even if we list the first implementations we want.

We primarily target the Kubernetes/Docker environment, so some of the terminology comes from that world.

Usage of the health checks.

This will update the result of the /metrics/healthcheck endpoint.

Examples of the responses

When we call the healthy method:

{
    "application": {
        "healthy": true,
        "message": "sbr test.",
        "duration": 0,
        "timestamp": "2024-08-29T13:44:58.398Z"
    }
}

Response code: HTTP code 200.

When we call the unhealthy method:

{
    "application": {
        "healthy": false,
        "message": "sbr test.",
        "duration": 0,
        "timestamp": "2024-08-29T13:44:58.398Z"
    }
}

Response code: HTTP status code 200 (or 500 when we set JAVA_OPTS to -DhttpStatusIndicator=true)

If we raise an exception:

{
    "application": {
        "healthy": false,
        "message": "sbr test.",
        "error": {
            "type": "java.lang.RuntimeException",
            "message": "sbr test.",
            "stack": [
                "org.mapfish.print.metrics.ApplicationStatus.check(ApplicationStatus.java:15)", 
                "com.codahale.metrics.health.HealthCheck.execute(HealthCheck.java:374)", 
                "com.codahale.metrics.health.HealthCheckRegistry.runHealthChecks(HealthCheckRegistry.java:184)", 
                "com.codahale.metrics.servlets.HealthCheckServlet.runHealthChecks(HealthCheckServlet.java:177)", 
                "com.codahale.metrics.servlets.HealthCheckServlet.doGet(HealthCheckServlet.java:146)", 
                "javax.servlet.http.HttpServlet.service(HttpServlet.java:529)", 
                "javax.servlet.http.HttpServlet.service(HttpServlet.java:623)", 
                "com.codahale.metrics.servlets.AdminServlet.service(AdminServlet.java:153)", 
                "javax.servlet.http.HttpServlet.service(HttpServlet.java:623)", 
                "org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:199)", 
                "org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:144)", 
                "org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:51)", 
                "org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:168)", 
                "org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:144)", 
                "com.thetransactioncompany.cors.CORSFilter.doFilter(CORSFilter.java:209)", 
                "com.thetransactioncompany.cors.CORSFilter.doFilter(CORSFilter.java:244)", 
                "org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:168)", 
                "org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:144)", 
                "com.codahale.metrics.servlet.AbstractInstrumentedFilter.doFilter(AbstractInstrumentedFilter.java:112)", 
                "org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:168)", 
                "org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:144)", 
                "org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:352)", 
                "org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:117)", 
                "org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:83)", 
                "org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:361)", 
                "org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:126)", 
                "org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:120)", 
                "org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:361)", 
                "org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:131)", 
                "org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:85)", 
                "org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:361)",
                "org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:100)",
                "org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:361)", 
                "org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:164)", 
                "org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:361)",
                "org.springframework.security.web.savedrequest.RequestCacheAwareFilter.doFilter(RequestCacheAwareFilter.java:63)",
                "org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:361)", 
                "org.springframework.security.web.authentication.www.BasicAuthenticationFilter.doFilterInternal(BasicAuthenticationFilter.java:168)",
                "org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:117)", 
                "org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:361)", 
                "org.springframework.security.web.header.HeaderWriterFilter.doHeadersAfter(HeaderWriterFilter.java:90)", 
                "org.springframework.security.web.header.HeaderWriterFilter.doFilterInternal(HeaderWriterFilter.java:75)", 
                "org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:117)", 
                "org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:361)", 
                "org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter.doFilterInternal(WebAsyncManagerIntegrationFilter.java:62)", 
                "org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:117)", 
                "org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:361)", 
                "org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:117)", 
                "org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:87)", 
                "org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:361)", 
                "org.springframework.security.web.access.channel.ChannelProcessingFilter.doFilter(ChannelProcessingFilter.java:133)", 
                "org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:361)", 
                "org.springframework.security.web.session.DisableEncodeUrlFilter.doFilterInternal(DisableEncodeUrlFilter.java:42)", 
                "org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:117)", 
                "org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:361)", 
                "org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:225)", 
                "org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:190)", 
                "org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:354)", 
                "org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:267)", 
                "org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:168)", 
                "org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:144)", 
                "org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201)", 
                "org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:117)", 
                "org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:168)", 
                "org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:144)", 
                "org.mapfish.print.servlet.RequestSizeFilter.doFilter(RequestSizeFilter.java:40)", 
                "org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:168)", 
                "org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:144)", 
                "org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:168)", 
                "org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:90)", 
                "org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:482)", 
                "org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:130)", 
                "org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:93)", 
                "org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74)", 
                "org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:346)", 
                "org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:388)", 
                "org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:63)", 
                "org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:936)", 
                "org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1791)", 
                "org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:52)", 
                "org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1190)", 
                "org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659)", 
                "org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:63)", 
                "java.base/java.lang.Thread.run(Thread.java:829)"
            ]                
        },
        "duration": 0,
        "timestamp": "2024-08-29T13:46:38.000Z"
    }
}

Response code: HTTP code 200.

Proposed usage

Use this endpoint only from Kubernetes, to automatically restart the Pod.

Use the healthy and unhealthy methods to change the reported status.

Add the option httpStatusIndicator=true in the file core/src/main/resources/mapfish-spring.properties.

At first, we should report an error when the queue is not being consumed even though it is not empty. In the long term, I think this check should look something like this (it needs a time window):

If the queue is empty during the time window => healthy
If a print job ends during the time window => healthy
Otherwise => unhealthy
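
The three rules above could be sketched as follows. This is only an illustration under stated assumptions, not the planned implementation: the class QueueHealthWindow and its methods are hypothetical names, "empty during the time window" is read as "observed empty at least once within the window", and startup is counted as an empty observation so that a fresh container gets one window of grace. The result of isHealthy could then be returned from the check method of a com.codahale.metrics.health.HealthCheck such as the ApplicationStatus class seen in the stack trace above.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper: decides queue health from events seen in a time window. */
final class QueueHealthWindow {
    private final Duration window;
    private Instant lastEmptySeen;   // last time the queue was observed empty
    private Instant lastJobFinished; // last time a print job ended, or null

    QueueHealthWindow(Duration window, Instant start) {
        this.window = window;
        // Assumption: treat startup as "queue empty" to allow a grace period.
        this.lastEmptySeen = start;
    }

    /** Call whenever the queue length is sampled. */
    synchronized void recordQueueSize(int size, Instant now) {
        if (size == 0) {
            lastEmptySeen = now;
        }
    }

    /** Call whenever a print job ends. */
    synchronized void recordJobFinished(Instant now) {
        lastJobFinished = now;
    }

    /** Healthy if the queue was empty or a job ended within the window. */
    synchronized boolean isHealthy(Instant now) {
        Instant cutoff = now.minus(window);
        boolean emptyInWindow = !lastEmptySeen.isBefore(cutoff);
        boolean jobInWindow =
            lastJobFinished != null && !lastJobFinished.isBefore(cutoff);
        return emptyInWindow || jobInWindow;
    }
}
```

Passing the current time in as a parameter keeps the logic easy to test; the size of the window itself is left open here, as it is in the proposal.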

We may also need a check that tests building a CRS from an EPSG code; in the past we have seen this exception on some containers:

java.lang.RuntimeException: EPSG:2056 was not recognized as a crs code
    at org.mapfish.print.output.Values.populateFromAttributes(Values.java:229)
    at org.mapfish.print.output.Values.<init>(Values.java:153)
    at org.mapfish.print.output.Values.<init>(Values.java:110)
    at org.mapfish.print.output.AbstractJasperReportOutputFormat.getJasperPrint(AbstractJasperReportOutputFormat.java:137)
    at org.mapfish.print.output.AbstractJasperReportOutputFormat.print(AbstractJasperReportOutputFormat.java:94)
    at org.mapfish.print.MapPrinter.print(MapPrinter.java:133)
    at org.mapfish.print.servlet.job.PrintJob.lambda$call$0(PrintJob.java:148)
    at org.mapfish.print.servlet.job.PrintJob.withOpenOutputStream(PrintJob.java:118)
    at org.mapfish.print.servlet.job.PrintJob.call(PrintJob.java:147)
    at org.mapfish.print.servlet.job.PrintJob.call(PrintJob.java:54)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.RuntimeException: EPSG:2056 was not recognized as a crs code
    at org.mapfish.print.attribute.map.GenericMapAttribute.parseProjection(GenericMapAttribute.java:93)
    at org.mapfish.print.attribute.map.GenericMapAttribute$GenericMapAttributeValues.parseProjection(GenericMapAttribute.java:516)
    at org.mapfish.print.attribute.map.MapAttribute$MapAttributeValues.parseBounds(MapAttribute.java:164)
    at org.mapfish.print.attribute.map.MapAttribute$MapAttributeValues.postConstruct(MapAttribute.java:160)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.mapfish.print.parser.MapfishParser.parse(MapfishParser.java:138)
    at org.mapfish.print.attribute.ReflectiveAttribute.getValue(ReflectiveAttribute.java:428)
    at org.mapfish.print.output.Values.populateFromAttributes(Values.java:203)
    ... 13 common frames omitted
Caused by: org.opengis.referencing.NoSuchAuthorityCodeException: No code "EPSG:2056" from authority "European Petroleum Survey Group" found for object of type "IdentifiedObject".
    at org.geotools.referencing.factory.AbstractAuthorityFactory.noSuchAuthorityCode(AbstractAuthorityFactory.java:874)
    at org.geotools.referencing.factory.PropertyAuthorityFactory.getWKT(PropertyAuthorityFactory.java:289)
    at org.geotools.referencing.factory.PropertyAuthorityFactory.createCoordinateReferenceSystem(PropertyAuthorityFactory.java:358)
    at org.geotools.referencing.factory.BufferedAuthorityFactory.createCoordinateReferenceSystem(BufferedAuthorityFactory.java:731)
    at org.geotools.referencing.factory.AuthorityFactoryAdapter.createCoordinateReferenceSystem(AuthorityFactoryAdapter.java:779)
    at org.geotools.referencing.factory.FallbackAuthorityFactory.createCoordinateReferenceSystem(FallbackAuthorityFactory.java:624)
    at org.geotools.referencing.factory.AuthorityFactoryAdapter.createCoordinateReferenceSystem(AuthorityFactoryAdapter.java:779)
    at org.geotools.referencing.factory.ThreadedAuthorityFactory.createCoordinateReferenceSystem(ThreadedAuthorityFactory.java:635)
    at org.geotools.referencing.DefaultAuthorityFactory.createCoordinateReferenceSystem(DefaultAuthorityFactory.java:176)
    at org.geotools.referencing.CRS.decode(CRS.java:517)
    at org.geotools.referencing.CRS.decode(CRS.java:433)
    at org.mapfish.print.attribute.map.GenericMapAttribute.parseProjection(GenericMapAttribute.java:88)
    ... 23 common frames omitted

See also Jira issue.

Usage of the metrics.

The metrics should be reviewed and documented; currently they are a bit of a mess...

At first, we should add a gauge to observe the queue length and a timer to observe the total print duration.
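
To make the two measurements concrete, here is a library-free sketch of what they would record. The class PrintMetricsSketch and its method names are hypothetical. In the application itself these would presumably be a Gauge and a Timer from the com.codahale.metrics library that already appears in the stack traces above, rather than hand-written counters.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.IntSupplier;

/** Hypothetical sketch of a queue-length gauge and a print-duration timer. */
final class PrintMetricsSketch {
    // Gauge: not stored, but read from the queue each time it is sampled.
    private final IntSupplier queueLengthSource;
    // Timer: number of prints and their accumulated duration.
    private final AtomicLong printCount = new AtomicLong();
    private final AtomicLong totalPrintNanos = new AtomicLong();

    PrintMetricsSketch(IntSupplier queueLengthSource) {
        this.queueLengthSource = queueLengthSource;
    }

    /** Current queue length, evaluated at call time. */
    int queueLength() {
        return queueLengthSource.getAsInt();
    }

    /** Runs a print job and records its duration, even if the job throws. */
    void timePrint(Runnable job) {
        long start = System.nanoTime();
        try {
            job.run();
        } finally {
            totalPrintNanos.addAndGet(System.nanoTime() - start);
            printCount.incrementAndGet();
        }
    }

    long printCount() {
        return printCount.get();
    }

    long totalPrintNanos() {
        return totalPrintNanos.get();
    }
}
```

The gauge is given a supplier rather than a value so that each scrape sees the live queue length, and the timer records in a finally block so that failed prints are counted as well.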

Then we should review all the metrics, see if they're working, update/remove them if needed, add documentation.

Pertinent metric:

Current metrics:

Cluster check

If we need a check, e.g. to notify that the print job queue is too long, we probably need to create a custom endpoint.

Summary

Use health checks only for health concerns of the current container.

Identify and add missing metrics to be able to better monitor the application with tools like Prometheus/Grafana.

Possibly, add a new endpoint for more specific checks, such as a queue that is too long.

sbrunner commented 2 months ago

The access logs should also be easy to enable. See: https://tomcat.apache.org/tomcat-9.0-doc/config/valve.html#Access_Logging