pblottiere commented 4 years ago

QGIS Server and monitoring

Contact

blottiere.paul@gmail.com

Maintainer @pblottiere

Version QGIS 3

Summary

According to the recent discussions, a monitoring console for QGIS Server seems more and more important. Such a tool should provide various services like:

getting an insight of the state of QGIS server
handling several QGIS Server instances (behind a load balancer, hosted in a docker container or a VM, ...)

However, it should not be at the expense of:

security
dependencies (to keep the core as minimal as possible)
architecture (we want to keep it simple)

Adding a dedicated admin endpoint to the API (for example http://localhost/qgisserver/admin) would be the simplest way to go but raises a lot of underlying issues:

security: we don't want to provide such a service by default because retrieving internal parameters of a running instance of QGIS Server can lead to security issue. It's even more true in case of on the fly configuration. Otherwise, we would need an authentication system, but it's anything but simple (nor desired).
load balancing: when several instances of QGIS Server are running behind a load balancer, we cannot really reach a specific instance or all instances at once in this way (broadcasting).

Considering that we cannot rely on the server API for this objective, we propose to add a dedicated reporting service in QGIS Server which periodically sends a JSON report to a remote HTTP endpoint.

Proposed solution

Architecture

We propose to add a reporting service in charge of sending a JSON report periodically. The underlying loop in charge of sending the report would run in a dedicated thread in order not to disturb the usual tasks of the main thread.

The reports is sent to a HTTP endpoint trough a POST request.

qep

Then, the endpoint can prepare the data for a mainstream monitoring tool (Grafana, Chronograf, etc...) or for a custom admin panel.

Implementation

The new service is implemented in src/server/services/reporting and inherits from QgsServiceModule. Then a dedicated thread manages the loop in charge of sending the report. The periodic action is achieved by a QTimer. Due to the multi-threaded context, some modifications in the QGIS Server code source are necessary (mutex).

Moreover, we want to offer the possibility to customize the JSON report through a Python plugin. In this case, we can't rely on the current implementation because current server Plugins classes (QgsServerFilter, QgsAccessControlFilter and QgsServerCacheFilter) are designed to respond to incoming requests. So a new mechanism is necessary (and still need to be discussed).

Configuration

Some new environment variables are necessary to configure the reporting service:

QGIS_SERVER_REPORTING_PERIOD_SEC: period in seconds
QGIS_SERVER_REPORTING_ENDPOINT: HTTP endpoint where the report is sent
QGIS_SERVER_REPORTING_NAME: the name will be suffixed with a uuid in the report

Content of the report

The JSON report build by the reporting service may contain the next information:

projects currently cached
an "About" section (like the one in QGIS Desktop with the version number, ...)
the current configuration (default qgs project, max width/height, ...)
fonts
plugins
registered providers

Moreover, a Python plugin can easily add system information in the report:

memory
CPU
disk spaces
swap

Proof Of Concept

TODO

Performance Implications

I still have a question regarding the Python plugin called from the reporting thread. Due to the GIL, plugins called from the main thread and the plugin called from the reporting thread are probably concurrent. So it may have consequences in some corner cases. Opinions?

Backwards Compatibility

N/A.

pblottiere commented 4 years ago

@andreasneumann @Gustry @m-kuhn @elpaso @jgrocha @rldhont @haubourg @nyalldawson @wonder-sk @vpicavet

I'd love to hear your thoughts on this QEP (and especially your critics) :)

sbrunner commented 4 years ago

Then when we have a cluster the shared memory segment will be on an NFS? I think that when we want to use it with Docker on a cloud (with OpenShift, EKS sot AKS) it will not be so easy, shouldn't we provide a way to do that with Redis (with a plugin if necessary...)?

pblottiere commented 4 years ago

Hi @sbrunner,

Then when we have a cluster the shared memory segment will be on an NFS? I think that when we want to use it with Docker on a cloud (with OpenShift, EKS sot AKS) it will not be so easy, shouldn't we provide a way to do that with Redis (with a plugin if necessary...)?

The multi-machine aspect is not covered by the shared memory mechanism (~~unless you want to use NFS~~), but as stated in the Abstraction chapter, you could implement your own backend if you want.

The shared memory mechanism would be a valid default behavior (and more than sufficient for a lot of users). And in case of Docker containers, you can still share a directory and namespace with them.

vpicavet commented 4 years ago

Apparently I was not clear in my previous comments, but I would rather stick to an HTTP API-based mechanism for node-to-node interactions, than shared memory. From my point of view, a single QGIS instance should not bother at all with cluster management. This should be done at a higher level from another application.

sbrunner commented 4 years ago

Thanks @pblottiere, I miss that :-)

sbrunner commented 4 years ago

@vpicavet as I understand this isn't in contradiction of having an external mechanism...

wonder-sk commented 4 years ago

Isn't it too limiting to use shared memory for this kind of mechanism? I think it is common to have QGIS server running in containers possibly on multiple (virtual) machines, so it would be good to take that into account from the beginning rather than "just" giving users an option to implement a custom solution for that case (which may not be easy to do...?)

pblottiere commented 4 years ago

Hi @vpicavet,

Apparently I was not clear in my previous comments, but I would rather stick to an HTTP API-based mechanism for node-to-node interactions, than shared memory.

As previously stated, I'm not convinced that sticking to an HTTP API is a good idea in this context.

From my point of view, a single QGIS instance should not bother at all with cluster management. This should be done at a higher level from another application.

Actually I totally agree. And in this case, a single QGIS Server instance would write its own information, but won't use/know anything about other instances.

pblottiere commented 4 years ago

Hi @wonder-sk,

Isn't it too limiting to use shared memory for this kind of mechanism? I think it is common to have QGIS server running in containers possibly on multiple (virtual) machines, so it would be good to take that into account from the beginning rather than "just" giving users an option to implement a custom solution for that case (which may not be easy to do...?)

A shared memory is just a file on the filesystem. ~~And a file can be shared between containers and virtual machines.~~. And in case of docker containers, IPC namespace can be shared.

wonder-sk commented 4 years ago

A shared memory is just a file on the filesystem. And a file can be shared between containers and virtual machines.

Are you sure about that? As far as I understand QSharedMemory, it uses inter-process communication APIs provided by OS kernel and a file on the filesystem is rather a key for processes that want to communicate... Or am I wrong?

Also, containers seem to be in separate namespaces by default and in order to allow them to use shared memory (within a single machine), it needs to be enabled explicitly in Docker: https://stackoverflow.com/questions/23889187/is-it-possible-to-share-memory-between-docker-containers Not sure about Kubernetes, my guess would be that shared memory between containers would not be supported even within a single node...

dmarteau commented 4 years ago

The shared memory segment could be the default behavior but we could also add an abstraction over the whole mechanism

This is a key point: there is nothing that would prevent using a Redis or MemCache or anything enabling sharing info between nodes across nodes where inter process communication is not possible.

pblottiere commented 4 years ago

Are you sure about that? As far as I understand QSharedMemory, it uses inter-process communication APIs provided by OS kernel and a file on the filesystem is rather a key for processes that want to communicate... Or am I wrong?

@wonder-sk Mmmm I will try, it's the best way to be sure. But even then, it would be a valid default behavior, fully customizable if you want to write your own backend.

so it would be good to take that into account from the beginning rather than "just" giving users an option to implement a custom solution for that case (which may not be easy to do...?)

Actually, I just wanted a very simple default behavior (to avoid dependencies, ...), but another means of communication may be considered.

pblottiere commented 4 years ago

Are you sure about that? As far as I understand QSharedMemory, it uses inter-process communication APIs provided by OS kernel and a file on the filesystem is rather a key for processes that want to communicate... Or am I wrong?

@wonder-sk Mmmm I will try, it's the best way to be sure.

To be more explicit, I know that there's an --ipc option to docker run which allows to share namespace for IPC mechanisms. But the limiting factor is the fact that QSharedMemory writes in /tmp directory, and we cannot configure this behavior. So I'm not 100% sure how it interacts with the --ipc option of Docker nor with Kubernetes.

But if you think that the general idea makes sense, I'll investigate more.

dmarteau commented 4 years ago

@pblottiere

Just one thought: as long as you go with QSharedMemory, as far as I understand, there is no need no to bother with serializing/deserializing json since you may store directly C/C++ data structure.

pblottiere commented 4 years ago

Not sure about Kubernetes, my guess would be that shared memory between containers would not be supported even within a single node...

According to this documentation https://kubernetes.io/docs/concepts/policy/pod-security-policy/#host-namespaces, pod containers can share the host IPC namespace. So it "should" work (while it should be tested).

But it would be possible in a single node only indeed.

pblottiere commented 4 years ago

Regarding the first feedback, I think that one of the main questions is: Do we want to provide a mechanism allowing to monitor several instances of QGIS Server on various machines as a default behavior?

If it's a strong "yes", then a shared memory segment is clearly NOT the way to go and this QEP can simply be closed :). However, if we want to provide a valid default behavior and an entrypoint/API allowing to implement a custom component for more complex scenario, then this QEP still makes sens (for now at least).

NathanW2 commented 4 years ago

On the monitoring front. It might not be a bad idea to integrate Prometheus metrics into the core so they are just there. I use them on most of my projects and they work really well and having it out of the box would be super handy.

On Wed, Jul 8, 2020 at 8:53 AM Paul Blottiere notifications@github.com wrote:

Regarding the first feedback, I think that one of the main questions is: Do we want to provide a mechanism allowing to monitor several instances of QGIS Server on various machines as a default behavior?

If it's a strong "yes", then a shared memory segment is clearly NOT the way to go and this QEP can simply be closed :). However, if we want to provide a valid default behavior and an entrypoint/API allowing to implement a custom component, then this QEP still makes sens (for now at least).

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/qgis/QGIS-Enhancement-Proposals/issues/193#issuecomment-655177642, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAC5FXEMDA34FTUCXKFVRWLR2ORPLANCNFSM4OS3ZXPA .

sbrunner commented 4 years ago

For Prometheus, we don't have to aggregate them, we just need them, and they will be available for each instance, I'm I wrong?

dmarteau commented 4 years ago

AFAIK prometheus require you to set up a channel for exposing your metrics. I do not think you want to do that on same channel as you expose the service. From that we are back to our starting point.

On the other side, sending metrics may be easily handled by a push mecanism from plugins (we do that and this is very effective since you do not have to know the addresse of all your instances).

jgrocha commented 4 years ago

The shared memory segment could be the default behavior but we could also add an abstraction over the whole mechanism

This is a key point: there is nothing that would prevent using a Redis or MemCache or anything enabling sharing info between nodes across nodes where inter process communication is not possible.

The whole QEP discussion has been around the fact that QSharedMemory is a good starting point, but sometime later someone has to develop an HTTP based communication when IPC is not available. The QSharedMemory would be the best option if we need to transfer lots of information between processes, but that isn't the case. The information exchanged between processes is really short.

What if we implement right now the communication based on HTTP instead of IPC (using Redis PUB/SUB mechanism) for example?

The only thing I see against is the added dependency (on Redis, in this case). In terms of security, I think we can secure a Redis instance, without need to implement the authentication in QGIS Server right away.

I have nothing against this proposal. I just want to know if there is any other alternative more consensual.

dmarteau commented 4 years ago

@jgrocha

What if we implement right now the communication based on HTTP instead of IPC (using Redis PUB/SUB mechanism) for example?

It ssems that the main idea is to provide a solution out-of-the box which does not require third party components. The target is new-comers to Qgis server.

We know that for real production infrastructure you may (and will) need something more elaborate and scalable. The problem at hand seems to be mostly how doing it without tighting qgis server with a solution that may deserve it for complex and demanding production environment.

m-kuhn commented 4 years ago

Would it make things easier to reduce the scope for "out of the box just works" to single server instances?

dmarteau commented 4 years ago

Would it make things easier to reduce the scope for "out of the box just works" to single server instances?

This would be ok for me if we can make sure not introducing features that would be counter-productive in bigger infrastructure. It would be infortunate to reduce Qgis server to a toy product.

A stated before,I think, it is much more effective to think qgis server as an API on top of which you may build ultra simple product to the most elaborate. This implies some choices on how things are implemented at core level. Designing a end user product on top of it is almost another matter.

jgrocha commented 4 years ago

Adding Redis, for example, does not seem a big issue. QGIS Server will run side by side with web based applications that are using Redis already (for sessions, for caching, etc). Having Redis as a dependency, we can use it for other things, like QGIS Server authentication and authorization. We can use it to store session IDs and then check it in every server instance to see if the user (based on his session) can access some endpoint on the that server instance.

For single server instances, QGIS Server will run with or without Redis installed and configured. They only need a Redis instance if they want to (as stated in the QEP):

getting an insight of the state of QGIS server
adjusting and reloading things
handling several QGIS Server instances (behind a load balancer, hosted in a docker container or a VM, ...)

This alternative implementation (based on Redis) does not have any additional effort for a newbie (the same as using IPC). It adds a dependency of a third part, but it solves the communication problem for custom/advanced QGIS Server deployments.

diagram_based_on_redis

dmarteau commented 4 years ago

I think that using Redis or Shared Mem or anything else is not a real issue as long as we are able to design a correct abstraction.

May be we should concentrate on that ?

elpaso commented 4 years ago

I'm strongly -1 to add a dependency to any particular external application (Redis or whatever).

If that's the way we want to proceed, we need to create an abstract interface that can connect (through plugins) to those external applications.

It is not totally clear to me if this QEP is about monitoring only (pulling information out of the server) or it is also meant to send commands to the server (e.g. to reload or invalidate a project cache).

I believe the two topics shoud be addressed separately:

Gathering information from multiple server instances

I think this can be easily implemented through an HTTP dedicated service/OGC-style-API where a client could poll the server instances periodically and retrieve the information it needs.

Authorization would be delegated to a different tier.

Sending commands to a single server instance (e.g. to reload a particular project's cache or alter its configuration)

The use case here should allow sending commands to individual instances as well as broadcast a command to multiple instances, in any event - here again - I see the ideal implementation as an HTTP dedicated service/OGC-style-API

Authorization would be delegated to a different tier.

For both cases we need to add the missing low-level methods to the C++ server classes and expose them to the serverIface object.

pblottiere commented 4 years ago

Hello,

First, thanks to everyone,I'm happy to see that this QEP is a topic of public interest.

In view of recent discussions (made in this QEP, with @vpicavet, @elpaso and others), the shared memory doesn't seem to be a good answer for a majority because of the multi-machine expectation (even for a default behavior).

Moreover, as stated by @elpaso, it could probably be easier to address the "monitoring" topic from the "configuring" aspect separately (I will create a dedicated QEP for the "configuring" topic later).

So, as a first step, we can focus on the "monitoring" need. If we don't want to rely on a low-level mechanism like a shared memory segment, then we have to rely on the network layer, like HTTP (because we don't want to add dependency to Redis, ...). In this case, we have 2 options:

QGIS Server responds to a pull with internal information
QGIS Server pushes its internal information on its own

Due to the load balancing issue (notably), it seems most suitable to use the "push" option. This way, all QGIS Server instances can send their configuration to a monitoring endpoint/application. However, if we want to periodically send/push information without adding a latency/load to the main QGIS Server process, we have to do that in a dedicated thread.

But considering that we do not wish to complicate the QGIS Server source code with multithreading, we have to find another way. Then, after having spoken with Alessandro, the solution envisaged would be to create a dedicated thread FROM a QGIS Server plugin. We should also add some entrypoints in QGIS Server core to allow a plugin to retrieve the internal configuration. This solution shall be tested, but it would be a good option for small and big infrastructures.

What do you think?

m-kuhn commented 4 years ago

What exactly are the main topics of the envisioned monitoring? Is this similar to the recent proposal by @jgrocha where fonts, folders, configuration, cache status etc. are exposed?

I think it would be very handy to be able to get some basic sanity checks without having to install a plugin.

To be able to inspect the plugin infrastructure itself
To be able to use it without having to install a plugin
To be able to use it on servers without Python support

pblottiere commented 4 years ago

What exactly are the main topics of the envisioned monitoring? Is this similar to the recent proposal by @jgrocha where fonts, folders, configuration, cache status etc. are exposed?

@m-kuhn the "whishlist" have been discussed in this QEP: https://github.com/qgis/QGIS-Enhancement-Proposals/issues/190

But mainly:

What projects are cached
An "About" section (like the one in QGIS Desktop with the version number, ...)
Retrieve the current configuration (default qgs project, max width/height, ...)
Which fonts are loaded
Which plugins are loaded
Global info endpoint (to show versions, contents of env variables, paths in QgsApplication, registered providers, any recent exceptions, basic system info (drives, cpus, memory, swap, system load))
Project info endpoint (show details about map layers (especially if any of them failed to load), if there are any missing files (e.g. SVGs), missing datum shift grid files)
A health check service to get a right answer if the server is fine or not

I'm not saying that we'll have everything, but it's a matter of discussion.

I think it would be very handy to be able to get some basic sanity checks without having to install a plugin.

I think it's mainly packaging issue. We could probably provide a default plugin without extra installation steps.

m-kuhn commented 4 years ago

I think it's mainly packaging issue. We could probably provide a default plugin without extra installation steps.

On QGIS desktop the idea is to get away from packaged plugins. Do we go a different road on server?

nyalldawson commented 4 years ago

Do we go a different road on server?

I'm personally ok with this for server, so long as they are written in C++ and not python.

The main reason against packaged plugins in desktop is that they provide an inferior experience overall (users get a different out-of-the-box experience depending on whether or not the plugins are enabled), and because of specific concerns about the quality of the packaged plugins. To me, these same concerns don't extend to newly written server plugins.

qgis / QGIS-Enhancement-Proposals