qgis / QGIS-Enhancement-Proposals

QEP's (QGIS Enhancement Proposals) are used in the process of creating and discussing new enhancements for QGIS
118 stars 37 forks source link

Geometry Validation #131

Closed m-kuhn closed 4 years ago

m-kuhn commented 6 years ago

QGIS Enhancement: Geometry Validation

Date 2018/08/21

Author Matthias Kuhn (@m-kuhn)

Contact matthias@opengis.ch

maintainer @m-kuhn

Version QGIS 3.4

Sponsor AGI Kanton Solothurn, Switzerland

Summary

A basic requirement of many GIS environments is that users are required to always create valid geometries. We therefore propose a system that helps to avoid invalid geometries by

Proposed Solution

If geometry checks have been enabled on a layer, QGIS will

Automatic fixes (single feature)

This class of fixes does not require user intervention and is automatically applied. It is usually very cheap in performance overhead and will be applied to geometries when they enter the edit buffer.

Examples:

Single Feature Checks

This class of checks can run without any context (apart from some configuration parameters of the check) on a single geometry. They will usually require some user interaction to be resolved.

There is a thread running in the background with a job queue. Whenever a feature is added, a geometry is edited or a feature is deleted:

There is a list of unresolved geometry errors for the current edit session.

Whenever a job completes all errors for this feature are removed from the list

If the job reported some errors for this feature, add the errors to the list.

Examples:

Multi Feature Checks

This class of errors require more than one single feature to be completed. They are often topological checks. Due to their nature of requiring other features for a check, they will usually send a request for additional features to the data provider. This adds I/O overhead which prevents us from running these checks repeatedly immediately after each geometry change.

These checks are therefore performed on a separate queue, which will for now only be run on save, but other suitable times to run this queue might be added (after some time of inacivity, layer change, map tool change, ...).

Examples:

User Interface

Configuration

The vector layer properties are extended with a new geometry validation section. Within this section, individual checks and corrections for a layer can be activated.

Automatic fixes:

Single feature checks (possibly called "geometry checks"):

Multi feature checks (possibly called "topology checks"):

Error reporting

All error reporting will be done on a panel. The panel will show a summary for layers with errors. For each of these layers, a list of errors is shown, next to buttons to show the error feature and error position on the map and optional correction suggestions.

Map Tool warnings on the message bar

We propose to completely remove the reporting of geometry errors on the message bar while a geometry is being edited. These messages are often transient, hard to interpret and track. In case this is objected, it will be made configurable at least (preferably off by default).

Saving

On saving a layer,

Affected Files

qgsvectorlayer.h:

Some methods will have the QgsGeometry or QgsFeature parameter unconsted, to have it updated if an automatic fix has been applied and a new configuration object for these fixes is added.

struct GeometryOptions
    {
      /**
       * Automatically remove duplicate nodes on all geometries which are edited on this layer.
       *
       * \since QGIS 3.4
       */
      bool removeDuplicateNodes = false;

      /**
       * The precision in which geometries on this layer should be saved.
       * Geometries which are edited on this layer will be rounded to multiples of this value (snap to grid).
       * Set to 0.0 to disable.
       *
       * \since QGIS 3.4
       */
      double geometryPrecision = 0.0;
    };

    bool updateFeature( /* const */ QgsFeature &feature, bool skipDefaultValues = false );
    bool changeGeometry( QgsFeatureId fid, /* const */ QgsGeometry &geometry, bool skipDefaultValue = false );

    // setters and getters for removeDuplicateNodes
    // setters and getters for geometryPrecision

Performance Implications

Further Considerations/Improvements

But don't we already have tools that do that?

Yes, there are several tools available, but they all are either targeted on different use-cases or have limitations.

Processing algorithms

There is a processing algorithm to make a layer valid.

Caveats:

Geometry Checker Plugin

This plugin offers a wide range of functionalities to check and fix geometries iteratively. It can and will be used as a guideline, as well as the many functionalities which are available for it in the analysis library.

Caveats:

Realtime warnings from map tools

Some map tools perform real-time checks of the geometry which is being edited. Errors detected while editing are shown in a message bar as a hint to the user.

Caveats:

Processing algorithms

There is a processing algorithm to make a layer valid.

Caveats:

Geometry Checker Plugin

This plugin offers a wide range of functionalities to check and fix geometries iteratively. It can and will be used as a guideline, as well as the many functionalities which are available for it in the analysis library.

Caveats:

Realtime warnings from map tools

Some map tools perform real-time checks of the geometry which is being edited. Errors detected while editing are shown in a message bar as a hint to the user.

Caveats:

Backwards Compatibility

(required)

Issue Tracking ID(s)

(optional)

Votes

(required)

haubourg commented 6 years ago

Awesome ! a big +1 here with the proposal. The topology checking situation was indeed messy and not friendly to end users.

luipir commented 6 years ago

I know many users against respect "transparently fixing common mistakes without user intervention where possible", btw If the validation feature is anabled in a visible and clear way I would give a big +1.

m-kuhn commented 6 years ago

I know many users against respect "transparently fixing common mistakes without user intervention where possible"

It will be opt-in

image

gioman commented 6 years ago

@m-kuhn very nice to see this qep! thanks. I personally think (since long now, already said this in several occasions) that was really necessary something in this sense, especially to improve the clutter we have now. You cite the Geometry checker and the Processing tools (which frequently give completely different results for checks) but also don't forget the (core) Topology checker, that despite the name does also geometry checks but does not have any fixing capability (and despite being MUCH easier to use for common users compared to the Geometry checker).

nyalldawson commented 6 years ago

Better put the number 1 caveat of the geometry repair algorithm: it loses curves and m values.

nyalldawson commented 6 years ago

I'd like to see your plans for the current tools stated. Are you planning on making any of them redundant? Or will this be "yet another" disconnected geometry check tool in qgis?

nyalldawson commented 6 years ago

@gioman minor nitpick, but topology checker isn't "core", it's just a bundled plugin. The difference is that it's totally siloed and the code cannot be reused by plugins/exposed to processing/etc.

It's also totally unmaintained, but that's a different issue.

m-kuhn commented 6 years ago

I'd like to see your plans for the current tools stated. Are you planning on making any of them redundant? Or will this be "yet another" disconnected geometry check tool in qgis?

I plan to remove the message bar notifications (as stated in the text), but I plan to leave the other two.

m-kuhn commented 6 years ago

don't forget the (core) Topology checker, that despite the name does also geometry checks but does not have any fixing capability (and despite being MUCH easier to use for common users compared to the Geometry checker).

Thanks for the input, I wasn't aware. I'll have a look at that concerning UX

gioman commented 6 years ago

minor nitpick, but topology checker isn't "core", it's just a bundled plugin. The difference is that it's totally siloed and the code cannot be reused by plugins/exposed to processing/etc.

@nyalldawson thanks for clarifying. Unfortunately for users this "is" a core tool (as shown in plugin manager).

It's also totally unmaintained, but that's a different issue.

Then why not remove it (eventually trying to save the good ideas in it, as the easy to understand/set geometry check rules)?

nyalldawson commented 6 years ago

I plan to remove the message bar notifications (as stated in the text), but I plan to leave the other two.

Fair enough - I was really just being cheeky/playing devils advocate and hoping you'd say "I plan on integrating all of these, using common, well-tested code, exposing to the API, and exposing to Processing" :wink:

The processing algorithms have their usecases because they can be used for automating tasks.

Agreed. However, it'd be ideal to somehow expose the extra functionality which topology checker/geometry checker offers to Processing to allow for use in scripted workflows.

nyalldawson commented 6 years ago

@gioman

thanks for clarifying. Unfortunately for users this "is" a core tool (as shown in plugin manager). Then why not remove it (eventually trying to save the good ideas in it, as the easy to understand/set geometry check rules)?

To clarify my position - I think this tool covers a very important use case, and I'm not advocating for it's removal until we have a complete replacement available. While it's unmaintained, it's still working OK and I don't think we can safely drop it yet (same with the evis plugin).

I'm pointing out more the major disadvantages of its implementation as a plugin, as opposed to a proper core/analysis library API addition.

gioman commented 6 years ago

To clarify my position - I think this tool covers a very important use case, and I'm not advocating for it's removal until we have a complete replacement available. While it's unmaintained, it's still working OK and I don't think we can safely drop it yet (same with the evis plugin).

yes I agree and in fact I show it frequently at trainings (followed by cleaning/fixing with st_makevalid, that is effort less...). What bothers a lot is the fact that it yields frequently different results compared to the geometry checker or the Processing tools (it does not mean that is the Topology checker to be wrong, I'm just pointing that the users are puzzled by having at least 3 ways to check geometries, and frequently results are different on the same input.

nyalldawson commented 6 years ago

I'm just pointing that the users are puzzled by having at least 3 ways to check geometries, and frequently results are different on the same input.

Yep.

Anyway, I have total trust in @m-kuhn to be moving things toward a cleaner, reusable approach which will take us a step toward geometry-repairing nirvana!

cbertelli commented 6 years ago

What a big and promising issue you are tackling! Thank you very much. For sake of explaining only, I divide my argument in two parts, but I think they are interlinked. The first part is the "automatic" reaction. Using reprojection on the fly limits and/or makes more complex (or ineffective) one of the traditional means of control, the grid. In a recent experience, a work group working on a remote geodatabase had severe performance degradation by using automatic snap and the resulting layer was very difficult to fix. At the end the mismatching nodes were several hundreds, missing by less than 0.1 millimetres. Newbies are usually scared and unable to react to a message like "invalid geometry cannot be saved on database". Finding the superposed nodes is almost impossible without a graphical hint, manual fixing should be more easy. This for the things that should be easy and/or automatic. The second part is about geometry-repairing tools. Topology based GIS programs (mainly proprietary ones, but GRASS does as well) had a very strict and procedural approach to geometry cleaning, a true innovation was provided by JUMP with JTS followed by GEOS which is used in "almost" the same way in PostGIS and QGIS (double check with conflicts, as seen before). QGIS provides multiple tools to handle geometry during input, editing and cleaning activities, open means free but also possibly messy. The tasks to provide two unified and pluggable interfaces to input/editing and cleaning is a valuable effort, I suppose the best way is providing a richer API for that and refactoring tools/plugins to use it. The interface is a matter of choice. I think there should be a more recognisable "section" for cleaning/topology, but we have already a Vector menu and the Processing menu is mostly populated by tools acting on geometries and can be used for cleaning/topology. Sorry for making a mess out of a clear proposal, but this is so central in the user experience to decide if QGIS is "really useful for professional purposes" (whatever that means) or not.

m-kuhn commented 6 years ago

Thank you for the feedback @cbertelli

For the scope of this project, what we plan to do is to fix to single layers only. The considerations that you have concerning working with multiple layers on different CRS is indeed quite complicate but out of scope for this project here. The plan is to provide an API that any tool can plug into and offer the best™ approach to fix things, to allow implementing whatever smart ideas someone may come up with in the future.

Does this answer your concerns or is there a point which I am missing that is incompatible with the proposal in this QEP?

cbertelli commented 6 years ago

Sorry if I answer so late, but I didn't notice your reply. The issues I propose do not have a solution if we tackle them all together but the idea of facing one problem at a time and expecting them to be gradually digested is not effective in this case. Snapping on a single layer may be enhanced, but your analysis is so much better than my description that... calls for further discussion. I get in a worst trouble (and a conceptual one), but the current state of things is not really effective. The proposal is elegant, but comes short in providing a proper solution. Nor I do, I silently propose that many good things in QGIS (transparent file access – how to implement an efficient spatial index? – transparent reprojection – what about keeping precision in any coordinate system and snapping if they are only partially reversible?) fade in favour of... a consistent geometric result. Am I off topic?

m-kuhn commented 6 years ago

Can you define "a consistent geometric result"?

cbertelli commented 6 years ago

You are right. I take into account the set of layers of a map which are phisically connected in the real world (obviously it's not always the case). There are several use cases that are questionable under this profile – e. g. enriching information of a projected layer with a unprojected one. A less interesting case, if I snap to a node of a layer with a different CRS, I get coordinates with many decimals figures. This keeps conceptual consistency but degrades the layer with a misleading idea of precision ad an increasing footprint. The real problems come from inefficient indexing and this happens also with one layer. When working in a distributed environment, you may end up with thousands of crossroads that mismatch by 0.0037 (or something of the like) and the users complain for all the lost time waiting for a "late snap" the solution may be "let's change the game with a 1 meter grid" (and you feel in the '80...). This is definitely not what you are asking, but something I hope explains what I mean.

On Thu, Sep 20, 2018 at 7:31 AM, Matthias Kuhn notifications@github.com wrote:

Can you define "a consistent geometric result"?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/qgis/QGIS-Enhancement-Proposals/issues/131#issuecomment-423045063, or mute the thread https://github.com/notifications/unsubscribe-auth/AIobxB0vAmKFp6rthB-2VNpH48WfXg2bks5ucyghgaJpZM4WFs7J .

--

Carlo A. Bertelli Charta servizi e sistemi per il territorio e la storia ambientale srl Dipendenze del palazzo Doria, vc. alla Chiesa della Maddalena 9/2 16124 Genova (Italy) tel./fax +39(0)10 2475439 +39 0108566195 mobile:+39 393 1590711 e-mail: bertelli@chartasrl.eu http://www.chartasrl.eu

m-kuhn commented 6 years ago

If you are working with the combination of topology and have layers with in different CRS/projections you certainly do not have an easy life. The question is: can we make this magically work fine for everyone in every use case? My guess is no, unless you come up with a great proposal. What we do normally and what works reliably is to reproject everything into a single system. Work is then done in this system and exported from there. This allows to have full control over what happens when.

Apart from this you will be able to implement your own checks, optimized for your system.

The real problems come from inefficient indexing and this happens also with one layer

I don't understand how indexing is related to this.

cbertelli commented 6 years ago

The real problems come from inefficient indexing and this happens also with one layer

I don't understand how indexing is related to this.

That comes from real use cases. Remotely edited PostGIS layers suffer from network lags and these are worsened by using references from official cartography (projected). Indexing adds delays that interfere with snapping and you end up with errors even on one layer.

m-kuhn commented 6 years ago

This kind of snapping is not affected by this proposal. Snap to grid does not require indexing.

cbertelli commented 6 years ago

This kind of snapping is not affected by this proposal. Snap to grid does not require indexing.

Yes, snapping to grid is a possible solution to this kind of problems, but the bad results on (very slow) networks happen before applying a grid. Anyway going to this detail is detrimental to the proposal which is very welcome. My criticism deals with complex use cases, I only wanted to say this only tackles a part of the issues (which is obvious for any proposal).

m-kuhn commented 6 years ago

Yes, snapping to grid is a possible solution to this kind of problems, but the bad results on (very slow) networks happen before applying a grid.

Snap to grid works independently from snap to objects (which requires indexing). Snap to objects can be disabled completely and snap to grid still works. Please open an issue if you find performance issues with this combination.

Anyway going to this detail is detrimental to the proposal which is very welcome. My criticism deals with complex use cases, I only wanted to say this only tackles a part of the issues (which is obvious for any proposal).

Thanks a lot, I am looking forward to have you test the implementation and see where we can go even further!