Use this endpoint only for the Kubernetes probe that automatically restarts the Pod.
Use the healthy and unhealthy methods to change the reported status.
Add the option httpStatusIndicator=true to the file core/src/main/resources/mapfish-spring.properties.
At first, we should report an error when the queue is not empty and its events are not being consumed. In the long term, I think this check should work as follows (it needs a time window):
If the queue is empty during the time window => healthy
If a print job ends during the time window => healthy
Otherwise => unhealthy
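The three rules above can be sketched in plain Java. This is only an illustration under assumptions: the class and method names are hypothetical (they are not existing mapfish-print classes), "empty during the time window" is read as "seen empty at least once within the window", and start-up counts as an empty observation so that a fresh container is not reported unhealthy.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of the time-window queue check; not an existing mapfish-print class. */
final class QueueHealthWindow {
    private final Duration window;
    private Instant lastQueueEmpty;                 // last time the queue was seen empty
    private Instant lastJobFinished = Instant.MIN;  // last time a print job ended (none yet)

    QueueHealthWindow(Duration window, Instant start) {
        this.window = window;
        // Assumption: start-up counts as "seen empty", giving a grace period of one window.
        this.lastQueueEmpty = start;
    }

    /** Call whenever the queue size is sampled. */
    synchronized void queueObserved(int size, Instant now) {
        if (size == 0) {
            lastQueueEmpty = now;
        }
    }

    /** Call whenever a print job ends. */
    synchronized void jobFinished(Instant now) {
        lastJobFinished = now;
    }

    /** Healthy if the queue was empty or a job ended within the window, otherwise unhealthy. */
    synchronized boolean isHealthy(Instant now) {
        Instant windowStart = now.minus(window);
        boolean emptyInWindow = !lastQueueEmpty.isBefore(windowStart);
        boolean jobEndedInWindow = !lastJobFinished.isBefore(windowStart);
        return emptyInWindow || jobEndedInWindow;
    }
}
```

The result of isHealthy would then be passed to the healthy or unhealthy method of the real health check. Passing the current time as a parameter keeps the rule easy to test without waiting for a real window to elapse.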
We may also need a check that tests the construction of an EPSG code; in the past, some containers raised this exception:
java.lang.RuntimeException: EPSG:2056 was not recognized as a crs code
at org.mapfish.print.output.Values.populateFromAttributes(Values.java:229)
at org.mapfish.print.output.Values.<init>(Values.java:153)
at org.mapfish.print.output.Values.<init>(Values.java:110)
at org.mapfish.print.output.AbstractJasperReportOutputFormat.getJasperPrint(AbstractJasperReportOutputFormat.java:137)
at org.mapfish.print.output.AbstractJasperReportOutputFormat.print(AbstractJasperReportOutputFormat.java:94)
at org.mapfish.print.MapPrinter.print(MapPrinter.java:133)
at org.mapfish.print.servlet.job.PrintJob.lambda$call$0(PrintJob.java:148)
at org.mapfish.print.servlet.job.PrintJob.withOpenOutputStream(PrintJob.java:118)
at org.mapfish.print.servlet.job.PrintJob.call(PrintJob.java:147)
at org.mapfish.print.servlet.job.PrintJob.call(PrintJob.java:54)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.RuntimeException: EPSG:2056 was not recognized as a crs code
at org.mapfish.print.attribute.map.GenericMapAttribute.parseProjection(GenericMapAttribute.java:93)
at org.mapfish.print.attribute.map.GenericMapAttribute$GenericMapAttributeValues.parseProjection(GenericMapAttribute.java:516)
at org.mapfish.print.attribute.map.MapAttribute$MapAttributeValues.parseBounds(MapAttribute.java:164)
at org.mapfish.print.attribute.map.MapAttribute$MapAttributeValues.postConstruct(MapAttribute.java:160)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.mapfish.print.parser.MapfishParser.parse(MapfishParser.java:138)
at org.mapfish.print.attribute.ReflectiveAttribute.getValue(ReflectiveAttribute.java:428)
at org.mapfish.print.output.Values.populateFromAttributes(Values.java:203)
... 13 common frames omitted
Caused by: org.opengis.referencing.NoSuchAuthorityCodeException: No code "EPSG:2056" from authority "European Petroleum Survey Group" found for object of type "IdentifiedObject".
at org.geotools.referencing.factory.AbstractAuthorityFactory.noSuchAuthorityCode(AbstractAuthorityFactory.java:874)
at org.geotools.referencing.factory.PropertyAuthorityFactory.getWKT(PropertyAuthorityFactory.java:289)
at org.geotools.referencing.factory.PropertyAuthorityFactory.createCoordinateReferenceSystem(PropertyAuthorityFactory.java:358)
at org.geotools.referencing.factory.BufferedAuthorityFactory.createCoordinateReferenceSystem(BufferedAuthorityFactory.java:731)
at org.geotools.referencing.factory.AuthorityFactoryAdapter.createCoordinateReferenceSystem(AuthorityFactoryAdapter.java:779)
at org.geotools.referencing.factory.FallbackAuthorityFactory.createCoordinateReferenceSystem(FallbackAuthorityFactory.java:624)
at org.geotools.referencing.factory.AuthorityFactoryAdapter.createCoordinateReferenceSystem(AuthorityFactoryAdapter.java:779)
at org.geotools.referencing.factory.ThreadedAuthorityFactory.createCoordinateReferenceSystem(ThreadedAuthorityFactory.java:635)
at org.geotools.referencing.DefaultAuthorityFactory.createCoordinateReferenceSystem(DefaultAuthorityFactory.java:176)
at org.geotools.referencing.CRS.decode(CRS.java:517)
at org.geotools.referencing.CRS.decode(CRS.java:433)
at org.mapfish.print.attribute.map.GenericMapAttribute.parseProjection(GenericMapAttribute.java:88)
... 23 common frames omitted
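A check for this failure could try to decode a few known codes and report the first one that fails. The sketch below is a minimal illustration, not the project's implementation: CrsHealthCheck and CrsDecoder are hypothetical names, and the decoder is passed in so that the example stays free of library dependencies. In the application it would presumably wrap the CRS.decode call that appears in the stack trace above.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of a CRS decoding check; not an existing mapfish-print class. */
final class CrsHealthCheck {

    /** Decoder abstraction (an assumption); it may throw checked exceptions, as a CRS library call can. */
    @FunctionalInterface
    interface CrsDecoder {
        Object decode(String code) throws Exception;
    }

    /** Returns an empty Optional when every code decodes, otherwise a message for the first failure. */
    static Optional<String> firstFailure(CrsDecoder decoder, List<String> codes) {
        for (String code : codes) {
            try {
                if (decoder.decode(code) == null) {
                    return Optional.of(code + " decoded to null");
                }
            } catch (Exception e) {
                return Optional.of(code + " could not be decoded: " + e.getMessage());
            }
        }
        return Optional.empty();
    }
}
```

A non-empty result would be reported through the unhealthy method, with the returned message as the reason.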
Introduction
Currently, the observability of the application is lacking, so this page defines how it should work.
Note that we define the general framework here, not every specific case, even though we include the first implementations we want.
We primarily target the Kubernetes/Docker environment, so some of the vocabulary comes from that world.
Usage of the health checks.
This will update the result of the /metrics/healthcheck endpoint.
Examples of the responses
When we call the healthy method:
Response code: HTTP status code 200
When we call the unhealthy method:
Response code: HTTP status code 200 (or 500 when we set JAVA_OPTS to -DhttpStatusIndicator=true)
If we raise an exception:
Response code: HTTP status code 200
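The status codes for the healthy and unhealthy cases can be restated as a small function, which makes it clearer why the option matters for a probe. This is only a sketch of the behaviour described above, with hypothetical names. The probeSucceeds helper reflects the rule Kubernetes applies to HTTP probes, where any status from 200 to 399 counts as success.

```java
/** Hypothetical restatement of the documented response codes; not the servlet's actual code. */
final class HealthStatusCode {

    /** Status code as described above: an unhealthy check only yields 500 when the option is on. */
    static int responseCode(boolean allHealthy, boolean httpStatusIndicator) {
        if (allHealthy) {
            return 200;
        }
        return httpStatusIndicator ? 500 : 200;
    }

    /** Kubernetes HTTP probes treat any status in the range 200..399 as a success. */
    static boolean probeSucceeds(int statusCode) {
        return statusCode >= 200 && statusCode < 400;
    }
}
```

Without the option, an unhealthy check still answers 200, so a probe that only looks at the status code never sees the failure.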
Proposed usage
Use this endpoint only for the Kubernetes probe that automatically restarts the Pod.
Use the healthy and unhealthy methods to change the reported status.
Add the option httpStatusIndicator=true to the file core/src/main/resources/mapfish-spring.properties.
At first, we should report an error when the queue is not empty and its events are not being consumed. In the long term, I think this check should work as follows (it needs a time window):
If the queue is empty during the time window => healthy
If a print job ends during the time window => healthy
Otherwise => unhealthy
We may also need a check that tests the construction of an EPSG code; in the past, some containers raised this exception:
java.lang.RuntimeException: EPSG:2056 was not recognized as a crs code
See also Jira issue.
Usage of the metric.
The metrics should be reviewed and documented; currently they are a bit of a mess.
At first, we should add a gauge to observe the queue length and a timer to observe the total print duration.
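To make the two proposed metrics concrete, here is a library-independent sketch in plain Java. It assumes nothing about how the application registers its metrics: PrintQueueMetrics and its methods are hypothetical. In practice the gauge and the timer would presumably be registered with the metrics library the application already uses, rather than hand-rolled like this.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.IntSupplier;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a queue-length gauge and a print-duration timer. */
final class PrintQueueMetrics {
    private final IntSupplier queueLength;  // gauge: evaluated each time it is read
    private final LongSupplier nanoClock;   // injectable clock, e.g. System::nanoTime
    private final AtomicLong jobCount = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();

    PrintQueueMetrics(IntSupplier queueLength, LongSupplier nanoClock) {
        this.queueLength = queueLength;
        this.nanoClock = nanoClock;
    }

    /** Gauge value: the current length of the print job queue. */
    int queueLength() {
        return queueLength.getAsInt();
    }

    /** Runs a print job and records its total duration, even when the job fails. */
    void timePrint(Runnable job) {
        long start = nanoClock.getAsLong();
        try {
            job.run();
        } finally {
            totalNanos.addAndGet(nanoClock.getAsLong() - start);
            jobCount.incrementAndGet();
        }
    }

    long jobCount() {
        return jobCount.get();
    }

    /** Mean print duration in milliseconds, or 0 when no job has been timed yet. */
    double meanPrintMillis() {
        long n = jobCount.get();
        return n == 0 ? 0.0 : totalNanos.get() / 1_000_000.0 / n;
    }
}
```

A gauge is a natural fit for the queue length because the value is read on demand instead of being pushed on every change, while the timer wraps the whole print so that failed jobs are counted too.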
Then we should review all the metrics, see if they're working, update/remove them if needed, add documentation.
Pertinent metric:
Current metrics:
HttpRequestFetcher:
AbstractSingleImageLayer:
CoverageTask:
Cluster check
If we need a check, e.g. to notify that the print job queue is too long, we probably need to create a custom endpoint.
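Such a custom endpoint could look like the sketch below, which uses the HTTP server shipped with the JDK so that the example stays self-contained. Everything specific here is an assumption made for illustration: the path /checks/queue, the 503 status for a queue that is too long, the JSON body and the class name. The real application would more likely expose this through its existing web layer.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.IntSupplier;

/** Hypothetical sketch of a dedicated queue check endpoint, separate from the health checks. */
final class QueueCheckEndpoint {

    /** 503 when the queue is longer than the limit, 200 otherwise (an assumed convention). */
    static int statusFor(int length, int maxLength) {
        return length > maxLength ? 503 : 200;
    }

    /** Small JSON body so that monitoring tools can read the actual queue length. */
    static String bodyFor(int length, int maxLength) {
        return "{\"queueLength\":" + length + ",\"tooLong\":" + (length > maxLength) + "}";
    }

    /** Starts a server exposing the check; port 0 lets the system pick a free port. */
    static HttpServer start(int port, IntSupplier queueLength, int maxLength) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/checks/queue", exchange -> {
            int length = queueLength.getAsInt();
            byte[] body = bodyFor(length, maxLength).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(statusFor(length, maxLength), body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

Keeping this apart from /metrics/healthcheck means that a long queue can raise an alert without causing the container to be restarted.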
Summary
Use health checks only for the health of the current container.
Identify and add missing metrics to be able to better monitor the application with tools like Prometheus/Grafana.
If needed, add a new endpoint for more specific checks, such as a queue that is too long.