qgis / QGIS-Enhancement-Proposals

QEP's (QGIS Enhancement Proposals) are used in the process of creating and discussing new enhancements for QGIS
116 stars 37 forks source link

QEP 26 Support Of Virtual Layers #44

Closed NathanW2 closed 8 years ago

NathanW2 commented 8 years ago

QGIS Enhancement 26: Support Of Virtual Layers

Date 2014/09/30 Author Hugo Mercier Contact hugo dot mercier at oslandia dot com Last Edited 2015/11/10 Status Draft Version QGIS 2.14+ Sponsor MEDDE, France - "Ministry of sustainable development" Sponsor URL http**//www.developpement-durable.gouv.fr

Summary

This enhancement proposal is about the addition of "virtual layers" to the QGIS core.

A virtual layer is defined to be very close to what is called a "view" in the context of relational database management systems. It aims at offering a formatted view of pre-existing data originated from one or more layers, usually without copy of data involved. A view can also bring new data computed on existing data, but without the need to store them explicitly.

Even if it comes from the database world, this feature does not focus on any particular data provider. It is not restricted to data providers that connect to databases.

The list of data representation and manipulation features are growing in QGIS. Such features usually already exists in database engines and are reproduced in QGIS to circumvent the lack of these functionnalities for popular data formats (shapefiles, text-based, etc.). But supporting advanced database features by coding them into QGIS is probably not a good idea if one wants to keep a light and robust code base.

This is why we propose to embed a database engine. Once existing layers of a QGIS project can be exposed to this database engine, new interesting functionalities can be easily developed. It could also be used to refactor existing QGIS code that deals with database-oriented features.

For now the virtual layer is mainly seen for vector layers, but extension to raster layers could be considered in the future.

Possible use cases

The proposed design is based on these different use cases for vector layers:

Virtual layers are designed to be easily usable by the end user as well as by some more advanced users.

Proposed Solution

A first working prototype has been implemented as a C++ plugin. Design and implementation details in this document results from it.

SQLite provides a very useful feature designed to embed its database engine in a third party application: virtual table. It offers the ability to expose internal data as an SQLite table. Then any operations available on a regular table can also be applied to a virtual table. The implementation can then choose to apply or ignore some of the operations. A C API (as well as a through a Python library) allows to create such a virtual table mechanism.

It makes a perfect candidate for the implementation of virtual layers in QGIS:

This proposal is inspired by the VirtualOGR_ driver for Spatialite that allows to open any OGR-supported format as a virtual table.

.. _virtual table: http://www.sqlite.org/vtab.html .. _VirtualOGR: https://www.gaia-gis.it/fossil/libspatialite/wiki?name=VirtualOGR .. _Python API: https://github.com/rogerbinns/apsw

Core changes

A new QgsVectorDataProvider (QgsVirtualLayerProvider) will be developed to handle virtual layers. This makes the changes very isolated and will then impact minimally the current code base.

Provider syntax

A new virtual layer can be created out of existing layers, possibly with a query. Every parameters are passed to the QgsVectorLayer constructor "source" argument as a string representing an URL with a set of key-value pairs.

To reference a layer, use the key layer. The value is a string with three fields separated by a colon : provider:source[:name].

A "live" layer already loaded by QGIS can also be referenced by using the key layer_ref. The corresponding value is layer_id[:name].

The geometry key is used to set the geometry column name, type and SRID. Its syntax is column_name[:type:srid]. The type field can be a string (point, linestring, etc.) or a QGis::WkbType. SRID is an EPSG code.

A parameter allows to ignore any geometry column, resulting in an attribute-only layer : nogeometry

The uid key allows to specify which column must be used as an identifier for each feature. This is not mandatory. If no uid is specified, the underlying provider will autoincrement an integer for each feature.

The query key allows to use an SQL query to setup the layer. It should also be escaped. Layer references are not strictly necessary. If the query uses names of existing QGIS layers (or their ID), they will be automatically referenced. Name and type of the geometry column will also be detected.

Serialization

The definition of virtual layers can be written on disk and reopened later on (only if the definition does not include "live" layers).

When the source parameter contains a path and no other parameters, it is opened from disk.

When the source parameter contains a path and other parameters are set, the virtual layer is created as usual and saved into the given path.

SQL syntax

The supported SQL syntax is the SQL of SQLite. The version shipped with QGIS. No recursive CTE for now.

SQLite and Spatialite functions are supported.

The internal SQL parser tries to detect types of columns (and especially the geometry column). "CAST" and geometry casts expressions are supported.

Examples

Minimal syntax to set an SQL query out of an existing layer (Python syntax)

l = QgsVectorLayer( "?query=SELECT a,b FROM tab", "myvlayer", "virtual" )

A join between two shapefiles where the uid and the geometry column are set :

q = QUrl.toPercentEncoding("SELECT a.* FROM shp1 AS a, shp2 AS b WHERE Intersects(a.geometry,b.geometry)")
l = QgsVectorLayer( "?layer=ogr:/data/myshape.shp:shp1&layer=ogr:/data/myshape2.shp:shp2&query=%s&uid=id&geometry=geometry:2:4326" % q, "myjoin", "virtual" )

Minimal syntax to create a virtual layer on a single layer (equivalent to SELECT * FROM)

l = QgsVectorLayer( "?layer=ogr:/data/myfile.shp", "myvlayer", "virtual" )

Creation of a virtual layer and storage in a file :

l = QgsVectorLayer( "/myvlayer.sqlite?query=SELECT * FROM ta, tb", "myvlayer", "virtual" )

Opening of an existing virtual layer:

l = QgsVectorLayer( "/myvlayer.sqlite", "myvlayer", "virtual" )

Spatial index support

When the query uses a field from one of the referenced layer declared as a primary key, features are accessed via a QgsFeatureRequest with a "filterFid". It's then up to the underlying provider to support fast access by ID.

For spatial indexes, a 'hidden' field named '_searchframe' is created for each virtual table (i.e. each referenced layer of the virtual layer). The bounding box of the given geometry will be used to restrain the query to a particular region of space. This bounding box is passed to the underlying provider by a QgsFeatureRequest with a "filterRect".

The use of spatial indexes must then be explicit.

For example :

SELECT * FROM pt, poly WHERE pt._search_frame_ = poly.geometry AND Intersects(pt.geometry, poly.geometry)

GUI changes

The db manager SQL query window will be used if possible when creating a new virtual layer.

Python Bindings

Creation and use of virtual layers will be available through a new QGIS data provider. There is no particular additional Python bindings needed.

Test Coverage

Uni tests will be available. The current prototype implementation includes about 30 test cases.

Performance Implications

Virtual layers are first designed to add new functionalities, hard to obtain without. Speed is not the first priority.

Virtual layers add an intermediate processing. Values (and geometries) will be converted from source layers to the internal SQLite format and then again to QGIS format.

If performances appear to be a problem, a "bypassing" strategy (passing pointers to value rather than converting values) could for instance be investigating in the future.

Further Considerations/Improvements

(required)

mhugo commented 8 years ago

Alright, I've updated the description in the issue text. That would be nice to have feedback (and proceed to the vote) quite soon if possible. If everything goes well, the goal is to have it for the 2.14 release

mhugo commented 8 years ago

Hi. So ... any objections here ?

NathanW2 commented 8 years ago

+1 from me. I think it's a really needed feature that MapInfo has that we don't.

+1 Condition: It's done as a core feature and not a plugin

@nyalldawson @wonder-sk @m-kuhn thoughts?

wonder-sk commented 8 years ago

Looks good to me...

Questions:

Few notes:

mhugo commented 8 years ago

I do not understand what happens in serialization: what is serialized? Just the layer definition? Or the whole table is materialized to a db file?

Yeah, it was not very clear, sorry. I've updated the text. It's only the definition that is saved.

are the virtual layer "live" - i.e. if something is changed in the source layer, is the change propagated? (or is there some memory/disk cache that would prevent that?)

Yes, the change is propagated.

how does it compare to OGR's virtual layer driver in terms of functionality ?

It is more or less the same thing, but at the QGIS level. It allows to open any QGIS layers.

it would be useful to have a class in qgis_core library to construct/parse URIs for the provider

Ok

it would be good to mention the QEP implies implementation of SQL parser :)

There is a quick mention in the "SQL syntax" section. I am not sure a SQL parser is absolutely needed. But, this is the way I found to ease the creation of a virtual layer (it detects types of columns and tables referenced in the query)

NathanW2 commented 8 years ago

Are you happy with this @wonder-sk?

NathanW2 commented 8 years ago

If there are no objections I am happy to see this approved and moved into PR+Merge stage (even though I know you already have the code mostly done)

wonder-sk commented 8 years ago

Happy too!

mhugo commented 8 years ago

Great, thanks ! I still have to clean the code a bit and will send a PR.

nyalldawson commented 8 years ago

@NathanW2 should this be made final now that it's implemented and merged?