troopa81 commented 1 year ago

Build snapping index at render time

Date 2023/08/16

Author Julien Cabieces (@troopa81)

Contact julien dot cabieces at oslandia dot com

maintainer @troopa81

Version QGIS 3.34

Funded by Métropôle de Lille

Summary

QGIS proposes an option to snap on current geometries to ease digitizing. In order to do that efficiently, it builds a snapping index which allows to get quickly the feature/geometry beneath the current mouse position.

At the moment, this snapping index is built when required and is updated only when data has changed. This has several flaws:

If the data is updated on backend side (because of a trigger in PostGres for instance), your index is not up to date and it leads to ghost snapping item (see issues #31251 and #40720 ). It exists a workaround with data dependencies (see this blog post for more informations)
We read data from the provider once more just to build the index which means more SQL requests and more CPU use
We don't know how much memory the index gonna take for a specific layer and we certainly don't want a too big index. Indeed, it makes no sense to build an index of billions of point gathered on a small screen because user would not be able to actually select them separately. So far, we relies on heuristics to know the atual best area to index and sometimes need to recreate an index with a smaller area if there is still too much to index. Not to mention that if you then move your cursor out of the smaller indexed area, an index build is restarted...

This last is called Hybrid strategy. It's possible to change it to Extent strategy using Python API and that's what I suggest when people want to snap on huge amount of data. It is to be noted that it has already been decided to not expose the index strategy choice to the user for I think, perfectly good reasons.

Proposed solution

I propose to keep track of the rendered features and their geometries at rendering time and later build the index using those features in a separate thread (builing snapping index is already done in a separated thread).

The rendered features should be collected using a specific QgsRenderedItemDetails

class CORE_EXPORT QgsRenderedFeaturesItemDetails : public QgsRenderedItemDetails
{
  public:

    struct RenderedFeature
    {
      /**
        * Constructor
        * \param featureId feature id
        * \param geom geometry in layer's CRS
        */
      RenderedFeature( QgsFeatureId featureId, QgsGeometry geom )
        : fid( featureId )
        , geometry( geom )
      {}

      QgsFeatureId fid;
      QgsGeometry geometry;
    };

    typedef QList<RenderedFeature> RenderedFeatures;

    /**
     * Constructor for QgsRenderedFeaturesItemDetails.
     */
    QgsRenderedFeaturesItemDetails( const QString &layerId, RenderedFeatures renderedFeatures );

    RenderedFeatures renderedFeatures() const;

  private:

    RenderedFeatures mRenderedFeatures;

};

This object should be created inside QgsVectorLayerRenderer drawRenderer and drawRendererLevels methods and stored using appendRenderedItemDetails. It would be then collected at the end of the rendering job like any other item details.

QgsMapCanvas would then dispatch the rendered features to each QgsPointLocator associated to rendered layers. several methods would be added to QgsPointLocator accordingly.


class QgsPointLocator
{

    ...

    /**
     * Set rendered features to know which features point locator has to index
     * \see setUseRenderedFeatures useRenderedFeatures
     */
    void setRenderedFeatures( QgsRenderedFeaturesItemDetails::RenderedFeatures renderedFeatures );
}

Finally, this last would use the rendered features to build the current index when required.

In order to enable the collect of rendered feature or not, a flag CollectRenderedFeatures would be added in Qgis::RenderContextFlag and the following method in QgsMapRendererJob.


QgsMapRendererJob
{

   ....

    /**
     * Set \a layers for which we want to collect rendered features
     */
    void setRenderedFeaturesLayers( const QList<QgsVectorLayer*> & layers );

    /**
     * Returns layers for which we want to collect rendered features
     */
    QList<QgsVectorLayer*> renderedFeaturesLayers();
}

Those methods would be called directly from map canvas regarding snapping utils parameters.

Features would be collected only when ALL the following conditions are met:

rendering the main map canvas
snapping is enabled
at least one layer is in edit mode

If user needs snapping but rendering is already finished (he clicks on the enable snapping button for instance), then QgsPointLocator would fallback on the existing behavior (render the layer and call willRenderFeature).

Strategy

A new strategy IndexRenderedFeatures would be added in QgsSnappingUtils in order to prepare index accordingly. It would become the default strategy in QgsMapCanvasSnappingUtils.

I would be in favor of deprecate the IndexHybrid strategy because it seems hasardeous as you never really know what exactly features set you are really indexing. Any client code should either index everything in a specified extent or index the rendered features.

Limiting index size

Like I said in introduction, there is no point in indexing too much data. To avoid that, the solution is to enable minimum and maximum scale for snapping.

The maximum snapping index size should be made configurable using an application setting: Snapping index maximum size per layer (in Mb).

When collecting feature in QgsVectorLayerRenderer, the index size should be computed using the QgsGeometry::size(), and if it exceeds the setting value, collecting feature will stop, only the first features would remains, and a message would be printed to the user

Layer XXX has too many features for snapping, please configure snapping minimum and maximum scale level from project snapping settings dialog or change the snapping index maximum size per layer setting in options.

This message would be displayed only once for every run of the application so user can continue his work in downgraded mode.

Why not using `QgsRenderedFeatureHandlerInterface` ?

At some point, I considered using QgsRenderedFeatureHandlerInterface to collect rendered features, as it was mentioned in this comment.

But I decided not to, because:

handlers are shared between layer render job, so it would be necessary to protect with a mutex the rendered features collect, having IMHO a big impact on performances.
we want to retrieve sometimes not rendered features
There is some additionnal computation (bbox for instance) involved in using this interface that would possibly impact rendering performance and that are not usefull in this particular case.

Affected Files

Modified:

QgsPointLocator
QgsSnappingUtils
QgsMapCanvasSnappingUtils
QgsMapCanvas
QgsMapRendererJob
QgsVectorLayerRenderer
qgis.h

New:

QgsRenderedFeaturesItemDetails

Performance Implications

I developed a Proof of Concept to test things and analyse performance with callgrind.

According to my tests with a PostGIS dataset containing approximatively 3K polygons coming from Openstreetmap, 70% of index building is spent in reading feature from database. So, the index building performance should be improved with the same order of magnitude.

On the other side, time spent to collect feature is less that 1% than the all rendering process.

This is what I get approximatively for both actual and new index strategy.

	rendering time	index build time
IndexHybrid	~210 ms	~50 ms
IndexRendererFeatures	~213 ms	~5 ms

Further Considerations/Improvements

After these modifications, index building would be about only 5ms, parallelisation of indexing per layer seems a bit overkill and may be counterproductive. It would be certainly better to just build all the needed index in one thread (or maybe 2/3 if there is a lot of layer to index) than do it per layer. But it needs more rework and that's out of the scope of this QEP.

Issue Tracking ID(s)

Votes

(required)

nyalldawson commented 1 year ago

FWIW, this is how we used to do things, and it was changed sometime in QGIS 2.x by @wonder-sk for good reasons (which quite possibly no longer apply).

So ping @wonder-sk for insights here :smile:

In order to enable the collect of rendered feature or not (when rendering a layout for instance), a flag CollectRenderedFeatures would be added in Qgis::RenderContextFlag and the following method in QgsMapRendererJob.

This should also be disabled for canvas renders whenever a user isn't actively snapping

Regarding

QgsGeometry geometry;

in RenderedFeature -- is this the original feature geometry in the layer's CRS? Transformation to map crs happens later in the process (after we've already eg extracted points from linestrings), so I'm guessing yes. It'd be worth noting this in the documentation too.

for QgsMapCanvas:

void setUseRenderedFeatures( bool useRenderedFeatures ); bool useRenderedFeatures() const;

I don't think these should be added to QgsMapCanvas, but rather to the QgsSnappingUtils attached to the canvas. QgsMapCanvas API is already quite large.

nyalldawson commented 1 year ago

See https://github.com/qgis/QGIS-Enhancement-Proposals/pull/16/files

MorriganR commented 1 year ago

It would be very interesting to have a different strategy for creating a snapping index. But I'm not sure that it should be made the default strategy.

The new IndexRendererFeatures strategy is "similar" to IndexExtent, so why does the "Performance Implications" section compare it to IndexHybrid and not IndexExtent?

As for problems #31251 and #40720, they are quite simply solved by a Python script like this: https://github.com/MorriganR/index_updater/blob/master/README.md

@nyalldawson

In order to enable the collect of rendered feature or not (when rendering a layout for instance), a flag CollectRenderedFeatures >>would be added in Qgis::RenderContextFlag and the following method in QgsMapRendererJob.

This should also be disabled for canvas renders whenever a user isn't actively snapping

and the layer is not in editing mode :)

troopa81 commented 1 year ago

Thanks for your reviews/comments @nyalldawson @MorriganR

This should also be disabled for canvas renders whenever a user isn't actively snapping

I changed the API described in the QEP so if rendering has happened and collected features then there is no need to re-read them from the provider, if not (because we just switch on snapping/editing for instance) we fallback on the existing behavior (render the layer and call willRenderFeature).

in RenderedFeature -- is this the original feature geometry in the layer's CRS? Transformation to map crs happens later in the process (after we've already eg extracted points from linestrings), so I'm guessing yes. It'd be worth noting this in the documentation too.

Right, it's in the layer's CRS. So it need to be transformed while building index. I updated documentation.

I don't think these should be added to QgsMapCanvas, but rather to the QgsSnappingUtils attached to the canvas. QgsMapCanvas API is already quite large.

Those methods are supposedly added to QgsPointLocator API, not QgsMapCanvas. I finaly removed setUseRenderedFeatures/useRenderedFeatures to trigger use of rendered features as soon as you call setRenderedFeatures.

troopa81 commented 1 year ago

As for problems #31251 and #40720, they are quite simply solved by a Python script like this: https://github.com/MorriganR/index_updater/blob/master/README.md

I very well aware of this workaround and like I described in the QEP, that's what I advised users to do BUT I really do think that this "hack" should not be necessary and that it should work out of the box.

It would be very interesting to have a different strategy for creating a snapping index. But I'm not sure that it should be made the default strategy.

I think that IndexHybrid should not be the default strategy for the reasons described in the QEP. And as you mentioned, you propose to change it to IndexExtent using python API.

That's basically what this QEP proposes:

a new strategy very much similar to IndexExtent strategy
avoid re-reading data from provider if already done at rendering
limit index size

The new IndexRendererFeatures strategy is "similar" to IndexExtent, so why does the "Performance Implications" section compare it to IndexHybrid and not IndexExtent?

Because IndexHybrid is the current strategy and the only one user can select from the UI. But regarding the dataset I use the numbers are the same between IndexExtent and IndexHybrid.

and the layer is not in editing mode :)

Good point :+1: I updated the QEP

wonder-sk commented 1 year ago

My random thoughts on combining rendering with build of snapping index: when building QgsPointLocator / QgsSnappingUtils, it seemed more flexible to have snapping index creation separated from rendering:

we could have different rendering strategies (e.g. tiled rendering) instead of rendering the whole canvas at once. If we tie building of snapping to rendering, it will be hard to move to a different strategy
snapping has different requirements for feature requests - for example, feature requests for rendering may simplify geometry before it is returned from getFeatures() (but snapping will want unaltered features), or filters from renderers can be applied to feature requests (but snapping may require no filter). Not sure about curved geometries, don't they get segmentized?
it seemed a bit wasteful to rebuild the snapping index on every re-render - especially when many times the index is small enough to build once. Wouldn't it be better to have build of snapping index enabled just for the layers where a lot of editing is going on? And even then, do refresh of the snapping index based on notifications from the database, or some time-based interval?

As for practical bits of the QEP - I wonder if QgsRenderedFeatureHandlerInterface couldn't be somehow updated to fit this purpose rather than adding extra APIs for this - given that QgsRenderedFeatureHandlerInterface was created with this kind of use case in mind?

troopa81 commented 1 year ago

we could have different rendering strategies (e.g. tiled rendering) instead of rendering the whole canvas at once. If we tie building of snapping to rendering, it will be hard to move to a different strategy

Could we have different strategy regarding the layer type? In my proposal, If QgsPointLocator has no rendered features set, then it fallback on the actual behavoir (using a renderer and the willRenderFeature method)

snapping has different requirements for feature requests - for example, feature requests for rendering may simplify geometry before it is returned from getFeatures() (but snapping will want unaltered features), or filters from renderers can be applied to feature requests (but snapping may require no filter). Not sure about curved geometries, don't they get segmentized?

That's indeed a very good point... which I didn't anticipated :grimacing:

it seemed a bit wasteful to rebuild the snapping index on every re-render - especially when many times the index is small enough to build once. Wouldn't it be better to have build of snapping index enabled just for the layers where a lot of editing is going on? And even then, do refresh of the snapping index based on notifications from the database, or some time-based interval?

The initial goal of this QEP was to remove IndexHybrid as the default strategy for indexing because no matter if user define a minimum and maximum scale, QGIS is indexing a reasonnable area which is either greater or smaller than the map canvas extent, leading to high CPU usage for indexing and slowness.

Could we just:

Make IndexExtent the default strategy
limit the index size as described above with a setting

MorriganR commented 1 year ago

@wonder-sk

it seemed a bit wasteful to rebuild the snapping index on every re-render - especially when many times the index is small enough to build once. Wouldn't it be better to have build of snapping index enabled just for the layers where a lot of editing is going on? And even then, do refresh of the snapping index based on notifications from the database, or some time-based interval?

From my experience, I can say that the IndexHybrid strategy does not work for non small layers in edit mode. For example, when rebuilding an index, queries are executed that get the entire table, even if the current extent covers no more than a dozen layer objects.

SQL debug

![Capture](https://github.com/qgis/QGIS-Enhancement-Proposals/assets/31573086/43eb750a-f28d-457a-a8a2-cede55942315)

and it doesn't matter what triggered the index rebuild, a notification from the database, or a timer. Since the index rebuild isn't done in the main thread, it doesn't cause the UI to freeze, but it's actually more annoying because you can't edit the layer for a while...

Perhaps a compromise would be to use the IndexExtent strategy for layers in edit mode.

But I'm not sure that will help solve the problems that the Funder is willing to pay money for. And it might be worth making IndexExtent the default strategy.

roya0045 commented 1 year ago

I'm not sure if I missed a part, but an important issue with indexing are with web layers. The indexing operation can freeze QGIS for extended period of time as the index is built for the entire layer whereas only the visible part is needed. The index could be overwritten or consolidated with a view change and a timeout (sometimes you need to pan out to move elsewhere but indexing the whole extent of the pan might be excessive.

troopa81 commented 1 year ago

Hi @wonder-sk , what do you think of my proposal :

Make IndexExtent the default strategy
limit the index size as described above with a setting

qgis / QGIS-Enhancement-Proposals