qgis / QGIS-Enhancement-Proposals

QEP's (QGIS Enhancement Proposals) are used in the process of creating and discussing new enhancements for QGIS
119 stars 37 forks source link

Support for mixed-geometry-type layers #298

Open 02JanDal opened 4 months ago

02JanDal commented 4 months ago

QGIS Enhancement: Support for mixed-geometry-type layers

Date 2024/06/27

Author Jan Dalheimer (@02JanDal)

Contact jan.dalheimer at sweco dot se

Version QGIS 3

Summary

Background

In the GIS world we have traditionally had a very clear split of one layer per geometry type, even if the layers otherwise are identical (same fields etc.). This however has a few drawbacks, in the case of QGIS this manifests in a few ways:

Though slowly, modern geospatial technology is moving away from this restriction; GeoPackage, GeoJSON, OGC API Features, PostGIS and many other formats, protocols and databases allow for mixed geometry types per layer/table/collection.

Allowing for mixed geometry types gives a lot of additional flexibility in data modelling, a common example I like to use is shrubs. From a data modelling perspective it can make a lot of sense to have a single layer (with fields such as species, spread (how far from the center does it spread, as seen from above), etc.) and then geometries such as points (for single bushes), lines (for linear hedges) and polygons (for larger shrubs).

Another practical example, which triggered this need/proposal, are comprehensive plans (sv. översiktsplan, example), for which work is ongoing to produce a national specification in Sweden, as well as a national API to retrieve such plans from all the municipalities. The specification (and, consequently, the API) has a lot of mixed geometry classes to allow for flexibility in the planning.

Apart from layers in QGIS currently being locked to a single geometry type, there is also no consistent handling of geometry type selection; some data providers (like PostGIS) allow you to select which geometry type to use, others show a dialog allowing you to add a layer each (like GeoPackage) and further others just pick the first geometry type the find and ignore the rest (like OAPIF).

Proposed Solution

Introduce "mixed" as a geometry type for vector layers and streamline how geometry type selection is handled.

Initially, within the scope of this QEP, mixed geometry layers will be read-only. If you want to edit the data you'll still need to add the layers separately. This is mostly because editing will introduce a few more considerations (such as how to select which geometry type to draw), however that should of course be added eventually.

One semi-open question is how to handle GeometryCollection (i.e. were there are mixed geometry types within a single feature). Ideally, they'd be rendered with each respective geometry type symbology, and selecting a part of it selects all parts (similar to how multi-geometries work). Implementing this might have to wait for later though.

Deliverables

Roughly split by planned PR:

While pending it currently seems likely that we'll be able to get funding to implement the first three points from two Swedish governmental agencies.

Example(s)

Affected Files

TBD

Risks

This change is a large step away from a traditional and somewhat deeply rooted assumption about GIS. There might be plugins and other integrations that assume a single geometry type per layer that will cease to work with this change.

Performance Implications

Performance should be better or equal to having separate layers per geometry type.

Further Considerations/Improvements

Backwards Compatibility

This will introduce a new geometry type (other code might need to be adapted to handle this case) as well as move where overridden geometry types are stored in the project (requires a minor migration on project load, though it could also be implemented to read from old and new location).

See Risks.

Issue Tracking ID(s)

rouault commented 4 months ago
  • Qgis.GeometryType and possibly Qgis.WkbType gain a Mixed member

Why not just using Qgis::WkbType::Unknown? For most, if not all, datasources that accept mixed geometry types, you know that you get a mix of geometry types only after doing a full scan of the layer (think of a multi gigabyte WFS layer when the server), which isn't practical in the general case. So effectively for those layers the geometry type is unknown at opening time. At least that's how this is modeled on the GDAL side with the wkbUnknown geometry type which means "unknown, potentially mixed". PostGIS base GEOMETRY type has similar semantics.

The geometry override will be moved from the data providers that implement it themselves into the base class

What do you mean by "geometry override"?

This change is a large step away from a traditional and somewhat deeply rooted assumption about GIS.

Indeed. This has significant breaking potential, and I don't really think this can be done in an incremental backwards compatible way in the 3.x series, or allowing the mixed geometry mode should be opt-in in all places. That sounds very much as a 4.0 topic, at least to allow the mixed geometry mode to be the default behavior.

While we are discussing significant changes, a closely related topic is the management of several geometry fields per layer, where currently providers need to expose one layer per geometry field. But admittedly that's a less frequent use case

nyalldawson commented 4 months ago

I'm curious how we'd alter the gui for eg categorised renderers to accommodate multiple symbol types. Did you have ideas here?

02JanDal commented 4 months ago

Why not just using Qgis::WkbType::Unknown?

Though it would, in a way, make linguistical semantically sense, that's not purely how it is used in the code today so it would change the semantics a lot of other code expects. A quick look through and you'll find places that use it to mean "mixed geometry", "not yet determined geometry", "no geometry" and "invalid geometry", likely more if one starts to look.

Now, for QGIS Core the best thing would of course be to take the opportunity to clean this up and introduce further enumeration literals for those other meanings, but that would further reduce the backwards compatibility with existing code (because a changed meaning is much harder to catch and can go unnoticed until the code breaks somewhere else, making debugging harder), even more than introducing a single new literal.

What do you mean by "geometry override"?

For example mRequestedGeomType in https://github.com/qgis/QGIS/blob/b578f1ea82ba9bd091198c1673583dadf21aa495/src/providers/postgres/qgspostgresprovider.cpp#L872

(it "overrides" the geometry as detected from the data source)

This has significant breaking potential, and I don't really think this can be done in an incremental backwards compatible way in the 3.x series, or allowing the mixed geometry mode should be opt-in in all places. That sounds very much as a 4.0 topic, at least to allow the mixed geometry mode to be the default behavior.

I haven't really been active enough in the Core parts of QGIS to be able to determine this, is there some sort of policy or similar on which changes require a new major version?

QGIS doesn't guarantee backwards compatibility between minor versions AFAIK, so my personal thinking is that as long as the change does not require major changes to external plugins etc. it should be fine? (an example of something I'd consider warranting a major version bump is changing the primary builds to Qt 6, or a complete rewrite of some major subsystem) Otherwise (in my experience from other projects) it's easy to get stuck in a loop of "these things all need to wait until the next major version, but they never get done so we can never release the next major version, and no-one wants to wait with their thing for the following major version because that'll take further years". (then again, I haven't really gotten enough insight into QGIS specifically to really have an informed opinion, so any previous discussions, plans or policys you can link me to would be greatly appreciated!)

In general though, I think this change should be doable with only minor disruption to external dependents; for example processing algorithms from plugins would simple not allow selecting a mixed layer if they have a filter on geometry type.

While we are discussing significant changes, a closely related topic is the management of several geometry fields per layer, where currently providers need to expose one layer per geometry field. But admittedly that's a less frequent use case

That's indeed also something I have considered in this context, and would really like to see added to QGIS at some point (and, like this, also challenges the traditional GIS-way of thinking while beginning to gain traction in protocols and formats). I think it's mostly orthogonal to this proposal though, so shouldn't affect or be affected to much.

I'm curious how we'd alter the gui for eg categorised renderers to accommodate multiple symbol types. Did you have ideas here?

A good question indeed!

Really I can think of two major approaches how to structure the symbology pane:

  1. A tab per geometry type, each tab containing a symbology editor widget for the geometry type
  2. Adjustments to the existing symbology widgets to integrate multi-geometry-type handling (for example in the case of categorized rendering you'd have the table of categories and for each row you'd be able to select a symbol per geometry type)

My plan is to, within the scope of this proposal and what I'll likely be able to get funding for, go for option 1, as it is significantly easier to implement ("just" a tab widget containing a QgsSymbolSelectorWidget, with some minor changes to QgsSymbolSelectorWidget and related to be able to choose a specific geometry type rather than just take it from the layer).

However, more long term, I think option 2 would be a lot more appreciated by users; in most cases I can imagine one would like to have the same setup (classification categories, etc.) for each geometry type, and likely very similar symbols (same colors etc.). But I think that that would require more work to QgsSymbolSelectorWidget and related widgets than I can justify at the time being. (but if the consensus here is option 2 or nothing I'll of course take it for another spin with the customer)

nyalldawson commented 4 months ago

@02JanDal

I haven't really been active enough in the Core parts of QGIS to be able to determine this, is there some sort of policy or similar on which changes require a new major version?

Honestly, this makes me a little nervous. What you're describing is a very major change to qgis, and a change which will touch classes all over the codebase (data providers, gui, analysis tools, the vector rendering loop, map legends, layer tree, etc). I'm not convinced this is the kind of project which is suitable for a relative newcomer to core qgis development.

For a point of reference, I'd be estimating that doing this change properly would take someone very familiar with the qgis codebase at least 3-5 weeks full time development. For someone learning the code base, you'd want to allow much longer, and also factor in a lot of time for the follow up fixes which will be needed over the next ~12 months as the changes get widespread testing.

I just want you to be fully aware of the magnitude of the changes and the amount of work this will require before committing yourself to undertaking it. 👍

andreasneumann commented 4 months ago

Should we perhaps consider this for QGIS 4.x - where API breaks are also allowed? Like @nyalldawson I also think that this would have tons of side-effects ....

I think it would be really great to have these mixed-geometry-type layers. Something I wanted to have for quite some time. But it needs to involve careful planning and is probably like a "Marathon". A lot of follow-up fixes will have to be done.

NathanW2 commented 4 months ago

There is also a lot of possible plugin breaks and hidden behaviour changes that would crop up with this kind of project

On Fri, 28 June 2024, 5:28 pm Andreas Neumann, @.***> wrote:

Should we perhaps consider this for QGIS 4.x - where API breaks are also allowed? Like @nyalldawson https://github.com/nyalldawson I also think that this would have tons of side-effects ....

I think it would be really great to have these mixed-geometry-type layers. Something I wanted to have for quite some time. But it needs to involve careful planning and is probably like a "Marathon". A lot of follow-up fixes will have to be done.

— Reply to this email directly, view it on GitHub https://github.com/qgis/QGIS-Enhancement-Proposals/issues/298#issuecomment-2196310240, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAC5FXCSJZGIYGFPQ7LF2GTZJUGBDAVCNFSM6AAAAABJ7YAE3GVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCOJWGMYTAMRUGA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

haubourg commented 4 months ago

Old remembrances of MapInfo untyped geometries , with oracle untyped geometries reminds me of how my information was highly chaotic and hard to use. To me this sounds like more like a regression than an enhancement, because I lived with the pain of untyped geometries years ago.

Not typing geometry types might imply very bad performances for medium to large datasets. We've had infinite fixes of the oracle provider with such lack of constraints.

I also share how profoundly this layer type is in all concepts of QGIS. I'm really afraid of the impacts on the GUI. So the balance between the current glitches and the amount of work implied makes me nervous like Nyall. Maybe I start to get old and grew up so much with QGIS that I miss the point of this change and I probably need a small proof of concept to convince me :)

02JanDal commented 4 months ago

Honestly, this makes me a little nervous. What you're describing is a very major change to qgis, and a change which will touch classes all over the codebase (data providers, gui, analysis tools, the vector rendering loop, map legends, layer tree, etc). I'm not convinced this is the kind of project which is suitable for a relative newcomer to core qgis development.

Sure, it's a pretty big task, but I'm willing to give it a go. I'm also not completely new to this scene; I have a decent amount of years of experience with Qt/C++, and have done my fair share of QGIS plugin work. It's also not my first foray into QGIS Core, though my previous work has been minor adjustments for internal use. But I'm not quite sure how this all is relevant for this discussion, though I'd be happy to discuss how you can help newcomers like myself get more active in another forum.

I'm still figuring out the details of the extent of this change (that's, as I understand, one of the reasons for QEPs) to be able to refine my estimate in the discussion with the client.

Should we perhaps consider this for QGIS 4.x - where API breaks are also allowed?

See previous response with some questions from me.

There is also a lot of possible plugin breaks and hidden behavior changes that would crop up with this kind of project

I guess it would make sense to do some research on which plugins would be affected by this change (starting by checking those that use the geometryType and wkbType methods of QgsDataProvider and QgsVectorLayer). Would that help alleviate some fears?

Old remembrances of MapInfo untyped geometries , with oracle untyped geometries reminds me of how my information was highly chaotic and hard to use. To me this sounds like more like a regression than an enhancement, because I lived with the pain of untyped geometries years ago.

Well yes, it's quite easy to come up with scenarios where mixed geometry layers are a really bad idea, yet the same is the case for any number of other things. That doesn't mean there are some other really good reasons to have something.

nyalldawson commented 4 months ago

@02JanDal

I guess it would make sense to do some research on which plugins would be affected by this change (starting by checking those that use the geometryType and wkbType methods of QgsDataProvider and QgsVectorLayer). Would that help alleviate some fears?

The problem isn't that the impact of this is unknown. It's that it's already known, and is very extensive! :smiley:

I haven't really been active enough in the Core parts of QGIS to be able to determine this, is there some sort of policy or similar on which changes require a new major version?

(Again, my suggestion would be to get started with core QGIS development on much smaller projects until you ARE familiar with all the processes and policies before undertaking a huge task like this, but let's ignore that recommendation for now).

QGIS policy is that no API breaks are permitted during any major cycle. A plugin or user script written for QGIS 3.0 MUST still run on QGIS 3.40 and later. This is a hard requirement of all QGIS development work, and it's designed to give stability for plugin authors and enterprise deployments of QGIS. The responsibility and burden of work lies with the developer modifying QGIS to ensure that they don't break API, and they have to do the (often extensive) extra work required to ensure that existing APIs do not change.

That's what makes this particular task even harder. You'll be doing open-heart surgery on QGIS, replacing very low level, vital components while still having to keep all the public facing APIs unchanged. :scream_cat:

QGIS 4.0 is not on the horizon yet. Even the transition to Qt 6 has been done carefully to avoid any API breakage with QGIS 3.0 plugins and scripts.

Why not just using Qgis::WkbType::Unknown?

Though it would, in a way, make linguistical semantically sense, that's not purely how it is used in the code today so it would change the semantics a lot of other code expects. A quick look through and you'll find places that use it to mean "mixed geometry", "not yet determined geometry", "no geometry" and "invalid geometry", likely more if one starts to look.

@rouault's suggestion is the correct approach here. We shouldn't just be adding additional WKB types to make a change like this simpler, as WKB IS a standard. (You'd actually create a lot more work for yourself by adding a new WKB type, rather then using the existing unknown type which is already handled by all the existing code paths)

A tab per geometry type, each tab containing a symbology editor widget for the geometry type

This sounds somewhat messy to me, and would result in a quite cramped UI eg when styling through the layer styling dock. The UX concerns need to be thought through carefully to ensure we're not just throwing extra widgets (and user complexity) into QGIS . Which brings me to my advice:

Having undertaken hundreds of massive changes like to QGIS over the past decade, I'd like to suggest what I consider to be the most likely path to success for this project. (I'm not trying to "big note" myself here, rather I'm hoping to share my experience so that you don't get stuck in a situation with your customer which could potentially end up financially disastrous for you!).

Hope that helps, and it's heard it in the collaborative/helpful spirit it's intended to be given in :+1:

02JanDal commented 4 months ago

Again, my suggestion would be to get started with core QGIS development on much smaller projects until you ARE familiar with all the processes and policies before undertaking a huge task like this, but let's ignore that recommendation for now

I also have some other, smaller, stuff I'll hopefully find the time to work on before I get to do any work on this. Though I also intended to break this one down into smaller pieces (Deliverables in original text) rather than tackle the full extent at once (see below).

I've for a long time wanted to get more into QGIS Core development, but through a combination of it being hard to find paid work, most things of personal interest being even larger in scope than this, and free time that is already taken up by several other projects it hasn't happened yet, though I hope to be able to break through that barrier now.

QGIS policy is that no API breaks are permitted during any major cycle.

Thank you, this clarifies it! I think it would be useful (for people like me) if this would be documented at https://docs.qgis.org/3.34/en/docs/developers_guide/index.html or in a CONTRIBUTING.md. I'd be happy to contribute it to the docs, though I think it would make more sense for someone who knows more of the details (for example if this only applies to the Python interface or also C++, if it is only API-compatibility or also ABI-compatibility, etc.) to do so, I'll see about opening an issue on the docs repo.

This sounds somewhat messy to me, and would result in a quite cramped UI eg when styling through the layer styling dock

Yeah, as I noted I don't think it's an ideal solution either, just what I could come up with within the scope of funding.

Break this project up into smaller, more manageable chunks

Based on your input, here is a revised plan (partly so that I can rephrase it in my own words to make sure I've understood your suggestions):

  1. Essentially your second bullet point. Introduce some way for the data provider and layer to tell exactly which geometry types it (might) contain. For single-geometry-type layers this will be a single-item list, for no-geometry layers it will be an empty list, and likely the same for truly unknown. Multi-geometry-type layers would return a more-than-one-item list as well as "unknown" for the existing getters. Implement this at least for the memory data provider and 2-3 others (like OAPIF, GDAL (if doable with GDALs interface, haven't looked into that), and postgres).
    • Also make a note (likely as a code comment) to revisit this interface in QGIS 4 to see if it can be better combined with the existing getters)
    • At this point nothing will have changed for the user
  2. Allow the user to specify a feature type filter in the data source URI, use it in the feature iterators
    • Still no user-visible change, though it will be possible to add filtered layers using Python
    • Having set a filter will result in the new getter from step 1 to only return that geometry type, getting access to the full unfiltered list would need another metadata getter (possibly combined with getting information about potential geometry types in multiple geometry fields, which as noted is related, but out-of-scope)
  3. Use this functionality for selecting feature types in the UI (like can be done for postgres and GeoPackage layers today); those data providers implementing similar functionality today will need to support both until QGIS 4. Implement it consistently for all data providers based on the now common API.
    • At this point the most crucial user-facing feature is in place; it is possible to work with mixed geometry layers by adding them as a single geometry layer
  4. Implement rendering and symbology editing for mixed-geometry layers
  5. Adjust code from 4. to also allow adding a mixed geometry layer
  6. Adapt processing algorithms
  7. Editing of mixed geometry layers

Some of these might result in multiple PRs.

I'll have to talk to my client to see how this would affect their ability to provide funding (steps 1-3 should be fine, as noted I think a "proper" implementation of the GUI for step 5 might be to expensive), might be that I end up doing some steps in my spare time (as I'm also interested in this).

Does this sound better?

Let each atomic change be discussed individually.

If you try for a mega "all-in-one" approach like you propose here

I think there might have been some confusion here based on my understanding of how high-level QEPs are intended to be; I assumed a broader scope which would then at implementation time be further split into separate units of work and PRs (see my initial planned PRs, now the list above, I never intended to try to do this all in one go). Do I understand you correctly here that I should open separate QEPs for each of the steps above?

And for the future; does it still make sense to open one broader QEP and then do smaller, more focused, QEPs after some initial discussion, or should I go straight for smaller focused ones (with some background etc. repeated between them)?