qgis / QGIS-Enhancement-Proposals

QEP's (QGIS Enhancement Proposals) are used in the process of creating and discussing new enhancements for QGIS
118 stars 37 forks source link

Move more data providers to core library #288

Closed nyalldawson closed 7 months ago

nyalldawson commented 8 months ago

QGIS Enhancement: Move more data providers to core library

Date 2024/03/14

Author Nyall Dawson (@nyalldawson)

Contact nyall dot dawson at gmail dot com

Version QGIS 3.38/3.40

Summary

Currently, QGIS data providers are split between implementation as private classes within the core library and an external "plugin" provider interface. There are significant benefits of moving providers to the "core" library and avoiding "plugin" style providers.

As of QGIS 3.36, the "core" providers are:

Plugin based providers are:

This split between core and plugin providers has developed over time. Initially, ALL data providers were implemented as plugin providers. There are some benefits of this approach:

However, there are downsides to the plugin provider implementation:

For these reasons, the recent trend has been to implementing new providers directly in the core library (hence why "younger" providers like the point cloud providers and tiled scene providers are implemented in the core library).

In this project some additional providers will be moved to become "core" providers. The targeted providers are:

This leaves the following providers as plugin providers:

(These remaining providers are excluded in order to limit the project scope, and could be targeted in future projects.)

Startup cost of provider plugins

Profiling of the qgis_process tool shows that (on a debug enabled QGIS build) the initialization of the plugin style providers accounts for ~9% of the application start up cost (this increases to ~30% of the application start up cost if the --no-python flag is added!)

Scope of work

Benefits to QGIS

Easier initialisation of PyQGIS for use in external scripts

Right now, you can initialize the PyQGIS API through the following code

from qgis.core import QgsApplication

QGISAPP = QgsApplication([], True)
QgsApplication.initQgis()

However, this approach does NOT load any of the plugin style providers, so the resultant PyQGIS environment can only access layers using GDAL/OGR/memory and the other "core" providers. Other layer types (eg wms) will always be considered invalid.

To avoid this, you need to set various environment variables such as QGIS_PREFIX_PATH so that the plugin style providers can be located and loaded. This complicates development of standalone PyQGIS scripts.

By moving more providers to core style, we can help reduce the need for the extra environment setup (and ultimately, remove it entirely!)

Performance Implications

Risks

This is a low-risk project. The interfaces for clean split of core/gui for providers are mature and widely used. The existing "core" providers are mature and the approach of including providers in the core library is proven and stable.

The targeted providers are all well covered by existing unit tests.

Further Considerations/Improvements

troopa81 commented 8 months ago

Dynamically loading a provider at runtime increases the startup time of QGIS (and also other critically performance reliant QGIS related tools such as qgis_process and QGIS server).

Do you have an idea on how much it increases startup time ?

Could we let them in providers directory and create a qgis_providers library dynamically linked but not dynamically loaded at runtime? It would keep isolation from core, advertise that all its content is private and solve the performance issues.

andreasneumann commented 8 months ago

This leaves the following providers as plugin providers:

* GRASS raster provider ("grassraster")

* GRASS vector provider ("grass")

* SAP HANA vector provider ("hana")

* SQL Server vector provider ("mssql")

* Oracle vector ("oracle")

* Virtual vector provider ("virtual")

* WCS raster provider ("wcs")

* WFS / OAPIF vector providers

@nyalldawson - is there a particular reason why WMS / WMTS / XYZ would be a core provider and WFS / OAPIF a plugin?

Both are very important and widely used network services, just as much (or even more widely used than let's say ArcGIS REST services.

rouault commented 8 months ago

and WFS / OAPIF a plugin?

I guess one reason is that in the core logic of the WFS provider we use the Qt GUI API, for example to display the dialogue box with the progress bar (and more recently in QGIS 3.36 when downloading schemas in the new "schema aware" mode). This might be tricky to un-entangle. Well, one "quick" & dirty way would probably to use a callback mechanism where the UI part of the WFS would set the callback to the core. But perhaps there should be some core mechanism to report progress of long feature iteration, but that's a topic for another enhancement.

wonder-sk commented 8 months ago

+1 this will be a good move! Some more advantages not mentioned in the QEP is the chance for 1. better testability (easier to do unit tests of internal classes of a provider), 2. better code reuse, 3. ability to expose some functionality in core API (e.g. manage list of connections through API, get parsed GetCapabilities response)

My slight preference would be to have WFS and WCS brought to core first instead of database providers (posgres, spatialite), so we have all OGC-related bits in core...

nyalldawson commented 8 months ago

@andreasneumann

is there a particular reason why WMS / WMTS / XYZ would be a core provider and WFS / OAPIF a plugin? Both are very important and widely used network services, just as much (or even more widely used than let's say ArcGIS REST services.

It's not so much that it's considered less important, rather that (as @rouault pointed out) there's particular complexities in upgrading wfs/oapif from a plugin to a core provider. There's a substantial amount of risk involved in porting that one, which is why I'm opting to leave it to a future piece of work.

@rouault

I guess one reason is that in the core logic of the WFS provider we use the Qt GUI API, for example to display the dialogue box with the progress bar (and more recently in QGIS 3.36 when downloading schemas in the new "schema aware" mode). This might be tricky to un-entangle. Well, one "quick" & dirty way would probably to use a callback mechanism where the UI part of the WFS would set the callback to the core. But perhaps there should be some core mechanism to report progress of long feature iteration, but that's a topic for another enhancement.

My thinking is that the progress bar should be removed along with all that gui code, and replaced with a QgsTask instead. That way we get nicer GUI since the wfs progress is shown alongside all the other long running tasks, and we don't have any gui related code / logic sitting in the core provider parts.

nyalldawson commented 8 months ago

@troopa81

Do you have an idea on how much it increases startup time ?

I've added a "Startup cost of provider plugins" section above

Profiling of the qgis_process tool shows that (on a debug enabled QGIS build) the initialization of the plugin style providers accounts for ~9% of the application start up cost (this increases to ~30% of the application start up cost if the --no-python flag is added!). It's the next logical "low hanging fruit" in optimising this tool. (I don't think there's any opportunities to speed up the python initialisation, and the bulk of the remainder is coming from Qt application initialisation internals).

Could we let them in providers directory and create a qgis_providers library dynamically linked but not dynamically loaded at runtime? It would keep isolation from core, advertise that all its content is private and solve the performance issues.

We could, but I don't think the extra complexity is worth it.

rouault commented 8 months ago

I've added a "Startup cost of provider plugins" section above

FYI: a somewhat related move we've made with GDAL plugins in 3.9dev : https://gdal.org/development/rfc/rfc96_deferred_plugin_loading.html "However, there is a penalty at GDALAllRegister() time. For example, on Linux, it takes ~ 300 ms to complete for a build with 126 plugins, which is a substantial time for short lived GDAL-based processes (for example a script which would run gdalinfo or ogrinfo on many files). This time is entirely spent in the dlopen() method of the operating system and there is, to the best of our knowledge, nothing we can do to reduce it.." And I confirm the time penalty under a debugger was at least one order of magnitude worse.

troopa81 commented 8 months ago

I've added a "Startup cost of provider plugins" section above

Thanks!

We could, but I don't think the extra complexity is worth it.

I don't see exactly why it would be that much complex compared with the solution you propose?

nyalldawson commented 8 months ago

@troopa81

I don't see exactly why it would be that much complex compared with the solution you propose?

Well we already have a mature framework for including providers in src/core and src/gui. There's no extra cmake setup required to move more providers to this system.

If we make another library there IS cmake changes required. (We'd also need two extra libraries, one for the non gui part of the providers and one for the gui components.) I just don't think the extra layers have a compelling enough justification.

troopa81 commented 8 months ago

We'd also need two extra libraries, one for the non gui part of the providers and one for the gui components.)

Why is that? we could have both in the same library.

The performance issue is mainly due to the dynamic loading (dlopen), why not keep the providers as shared libraries and dynamically link the application to them, thus avoiding dynamic loading (which look like to be done for postgres ?

m-kuhn commented 7 months ago

The current cmake code of plugins is painful indeed with building shared and static libraries of providers.

How we will make sure that it will still be possible to compile a "slim" version of qgis core. I think is important as it's more and more used as library (server, mobile, wasm, process). With the current approach provider API to provider functionality is clearly defined. I could well imagine that we will have direct access to some [name your provider] specific code in various places in QGIS core and the cmake flags will be rendered disfunctional. Are there any plans to test building with very few (e.g. only memory and maybe gdal/ogr) provider?

It's currently already possible to build providers with static linkage. This is referred to as "complex and untested" in the proposal. Has any consideration been given to make static the default / only supported approach. This would make providers immediately tested and increase runtime loading of qgis in the same way as with this proposal but keep the "natural barrier" for keeping providers self contained and modular. Or is there a complexity about this setup that we'd like to get rid of?

anitagraser commented 7 months ago

Thank you for submitting your proposal to the 2024 QGIS Grant Programme. The 2 week discussion period starts today. At the end of the discussion, the proposal author has to provide a 3-line pitch of their proposal for the voter information material. (For an example from last year check https://github.com/qgis/PSC/issues/58#issuecomment-1567892412)

nyalldawson commented 7 months ago

@m-kuhn If this proposal doesn't meet the QField use case to your satisfaction then I'll withdraw it -- improving things for QField was one of the main drivers here :rofl: :+1:

m-kuhn commented 7 months ago

I am in favor of seeing something happen :+1:

If building providers dynamic, newer XCode versions complain about the static and dynamic versions of the providers both depending on the same files (can't recall the exact wording right now) which makes things messy. I would not mind seeing providers as plugins disappear, I'm just not yet completely convinced that inflating core is the correct approach. What counts for me in the end is that we keep the possibility to build a "slick" core with only few providers (ideally we had the possibility to build just a renderer with memory layers but I guess that train has gone :laughing: )

nyalldawson commented 7 months ago

Ok, after giving more thought I've decided to withdraw this proposal and will rethink for next round.

@anitagraser can you remove this grant request from the list please?