qgis / QGIS-Enhancement-Proposals

QEP's (QGIS Enhancement Proposals) are used in the process of creating and discussing new enhancements for QGIS
117 stars 37 forks source link

QEP 79: Document Management Systems (working title "dynamic relations") #79

Open m-kuhn opened 7 years ago

m-kuhn commented 7 years ago

QGIS Enhancement 79: Dynamic relations

Date 2016/10/28

Author Matthias Kuhn (@m-kuhn), Denis Rouzaud (@3nids)

Contact matthias@opengis.ch, denis@opengis.ch

maintainer @m-kuhn

Version QGIS 3.2

Summary

Documents like text, images, videos or any type of file can currently be linked with an external resource widget into QGIS projects. For domain specific modules, it is often required to attach multiple documents to an entity. QGIS offers 1:N relations which handles this kind of requirement.

At the moment, this requires one document table per feature table. This sometimes (almost) duplicates the number of tables inside a database. To keep the models more simple, this QEP proposes to offer the possibility to maintain all documents in a single document table (or more generally, any type of attached fields and not only documents).

The management of this will introduce a new kind of dynamic relation which is actually a group of standard relations. Although, retrieving the referenced (parent) layer from the referencing feature (document) is achieved by matching an expression against existing layers.

Proposed Solution

Dynamic relations

A dynamic relation consists of the same properties like an ordinary relation except for the referenced layer which is calculated based on one or several fields of the referencing layer. In its most simple form, the referencing layer will just insert the layer id of the referenced layer into this field.

To be more precise, a dynamic relation is a set of standard relations having the same referencing layer but having the referenced layer dynamically defined.

The dynamic setting of the layer is solved by using an expression which has to match some properties of the the referenced layer like the table name, schema, uri, layer id, ...

A possible configuration would look like this:

Layer metafeatures

The _referenced_layerexpression will be evaluated against layers being exposed as features. To do so, the QgsRelation gets a new method getLayerContextwhich will provide layer metadata as variables. Hence to retrieve the layer of a dynamic relation for a defined referencing feature (i.e. document), one will do

expression = dynamicRelation.referencedLayerExpression()
for layer_candidate in dynamicRelation.referencedLayers():
    ctx = dynamicRelation.getLayerContext(layer_candidate, referencing_feature)
    expression.setContext(ctx)
    value  = expression.evaluate(feature)

Each "layer feature" has attributes like

Embedding the document layer

A document layer will be available as relation editor (n-side) for all layers selected in the dynamic relation. When using the drag and drop designer, it is available the same way like ordinary relations are now already.

In an embedded document layer, only features where both conditions

When adding a new entry, both referencing field and referenced layer field(s) will be prefilled using default values set by the dynamic relation.

The document layer form

On the document layer, the relation reference widget does not receive any integration as the dynamic relation is a totally different beast, especially being aware that it might affect several fields of the referencing feature.

An optional part of this proposal is to add a new widget QgsDynamicRelationReferenceWidget being read-only and which will propose to either embed the referenced feature form or offer a button to open its form.

Performance Implications

None foreseen.

Backwards Compatibility

UX changes

API Changes

The QgsRelation API will gain the following:

public:
    /**
      * Returns the dynamic relation id which allow to recognize
      * and group existing standard relations as a dynamic relation
      */
    QString dynamicRelationId() const;

    /**
      * Returns the expression which allows to retrieve the parent layer
      */
    QString referencedLayerExpression() const;

    /**
      * Returns the list of fields used by the referencedLayerExpression
      * to retrieve parent layer
      */
    QgsAttributeList layerReferencingFields() const;

    /**
     * Provide layer metadata as variables in an expression context
     */
    QgsExpressionContext getLayerContext() const; 

and QgsRelationManager:

public:
    /**
      * Returns all the dynamic relations
      */
    QMap<QString, QList<QgsRelation>> dynamicRelations() const;

    /**
      * Returns the list of relations associated to a dynamic relation
      */
    QList<QgsRelation> relationsInDynamicRelation( QString dynamicRelationId ) const;

Aggregate functions

Aggregate functions might not fully support dynamic layers for now. Support might be added where trivial.

User Interface

The project properties will of offer an assistant to create and edit dynamic relations.

Dynamic relations will be grouped as such in the project properties: hence a single entry regroups all standard relations created for this dynamic relation.

The relation editor widget (listing documents for a feature) will be available as for standard relations for all referenced layers of the dynamic relations.

Sample Data

Table: qgep.od_documents

ID owner_table fk_obj_id file_name
1 qgep.od_damage DM_37 img/some_picture.jpg
2 qgep.od_wastewater_structure WS_25 img/another_image.jpg
3 qgep.od_wastewater_structure WS_25 img/yet_one_more_image.png
4 qgep.od_wastewater_structure WS_98 img/and_maybe_a_document.pdf

Table: qgep.od_wastewater_structure

obj_id description
WS_25 manhole
WS_98 special structure

Table: qgep.od_damage

obj_id description
DM_36 Pipe broken
DM_37 Old and rusty

Votes

(required)

haubourg commented 7 years ago

Hi I have no opinion on the technical implementation. I miss some mockups from a user perspective to understand clearly what you propose.

One other point is the integration with external Electronic Document Management system. Some clients will have on (Alfresco for instance) or scaling up QGIS application could lead to have hybrid systems. How could we have a solution compatible with EDM or QGIS internal document system?

wonder-sk commented 7 years ago

Maybe I am missing something, but if the requirement is to be able to use one table with related documents among multiple layers, why not just add an optional filter expression to definition of relations, so relations between one document table and multiple layers would have different filter expressions?

m-kuhn commented 7 years ago

@haubourg good question, the user interface will be fully flexible and integrated with what's currently there. The only thing that will change is the configuration interface. I've added a screenshot for that. The integration of external EDM can very well be addressed in a separate QEP (to not mix topics too much).

@wonder-sk yes, that would work for read-only files. For writing new entries to the documents table I think it's required to have more information. When embedding the document table, the expression required to generate the content to write to the _referenced_layerfield can be passed in the context.

haubourg commented 7 years ago

I have been looking a bit into standards protocols of existing DMS to have clearer ideas. It seems that most of them can expose documents as a file system either a CIFS or a WebDav web repository one, mounted on the client OS. I think this is enough to start a mini - DMS system in QGIS. If one client needs scaling up or interfacing with a DMS, this is a good and simple solution. So no need to explore deeper into advanced DMS features right now (URI, versionning, direct API's). We can see in another QEP if required.

haubourg commented 7 years ago

@m-kuhn BTW thanks for the mockups, it is clearer now. relation / references vocabulary starts to be hard to understand, dynamic relation add some complexity for new users. I would add a long tooltip to explain directly the possible use cases. Something like this for instance: " dynamic relation allows to generate relations from a centralized table contening table / fields to creates association between some ressources and all project rows. Ex: association document, images, url to many features across different layers"

m-kuhn commented 7 years ago

I would add a long tooltip to explain directly the possible use cases

Yes, documenting this in place is a good idea. Not sure where we stand w.r.t "What's this? button" vs. "Help... button" vs. "tooltip" and linking documentation. Personally I like tooltips for non-intrusive additional information. cc @rduivenvoorde

3nids commented 7 years ago

Wouldn't it make sense to define a set of layers which can be candidates?

You propbably see where I am going....can we use this mechanism for inherited layers by showing addequate sub form?

m-kuhn commented 7 years ago

It shouldn't be complicated to add another expression to filter supported layers. And even make the "Referenced Layer Field" an expression if required. I didn't want to overcomplicate things yet but if there are use-cases, that's all possible.

I am wondering a lot what people think about the "Layer metafeatures" idea. It's somehow similar to postgres system catalogs. @NathanW2 @nyalldawson @mhugo

mhugo commented 7 years ago

@m-kuhn Hi. The "layer metafeatures" idea sounds interesting and would add a bit of introspection, which may have nice use cases.

However, I am not sure about the "dynamic" relation. Just to be sure I understand: if I look at your three example tables, why would you need the "owner_table" column in oddocuments ? if we assume IDs are globally unique (as it is here since they are prefixed with WS or DM_), it can be solved with two 1:N relations isn't it ? Adding this kind of "pointers" in relations makes me think we are getting far from database normal forms.

Another option to consider to reduce the number of tables : what about storing document references in an array ?

m-kuhn commented 7 years ago

If we assume IDs are globally unique (as it is here since they are prefixed with WS or DM), it can be solved with two 1:N relations isn't it ?

It's not 2 but 30 different 1:N relations that would need to be added (and even then I think it's not normal form, for this you'd need 30 additional doc_xx tables, no?).

I'm confronted right now with two cases where it is like that and I cannot change the underlying schema for interoperability with other software that relies on this DB design. I guess there are more out there.

m-kuhn commented 7 years ago

what about storing document references in an array ?

That violates the first normal form I think?

mhugo commented 7 years ago

It's not 2 but 30 different 1:N relations that would need to be added (and even then I think it's not normal form, for this you'd need 30 additional doc_xx tables, no?).

Ok, you have to add N additional tables or, more or less equivalently N additional columns that are foreign keys to the other tables (probably with a "only one of them is not null" constraint). Once the project is correctly defined (with all the tables and relations declared), is it still a problem to have lots of table for the user of the forms ?

That violates the first normal form I think?

It does. But arrays are supported by most RDBMS and I think some work has been done recently for array support in QGIS.

It is actually not a matter of normal forms after all, but rather a matter of being able to reuse existing concepts of RDBMs. And be able to enforce type constraints in the DB if the data come from a DB. I cannot see how I can express the "foreign key" constraint of fk_obj_id in od_documents with, say PostgreSQL. Then I'm stuck with a relation that has a "weak" type, which I don't really like :)

3nids commented 7 years ago

On PostgreSQL side, I think the way to do this is to use inheritance. You need to create one inherited table per relation and it would only add a constraint for the foreign key (the column is already on the parent table). An example here

I don't know if tis is reasonale to port that mechanism in QGIS (in terms of code and in terms of consistency with other RDBMs), but everything could be read from the DB. Insert occurs on child table, reading can be made on parent table but relations are discovered by testing children tables.

m-kuhn commented 7 years ago

I cannot see how I can express the "foreign key" constraint of fk_obj_id in od_documents with, say PostgreSQL. Then I'm stuck with a relation that has a "weak" type, which I don't really like :)

Neither do I like them but... A really proper solution would probably be inheritance like as @3nids says. But this is a) not properly implemented by many RDBMs and b) even less used in production. So we are left with different approaches of which you listed several (arrays, multiple fks) each which their pros and cons. The point is, I am requested to propose a solution for the approach which is described in the QEP because it actually IS in use. Apart from this, it's opt-in, so nobody forces your database design to take another approach ;)

mhugo commented 7 years ago

I am requested to propose a solution for the approach which is described in the QEP because it actually IS in use

Well, I can easily show you bad designs that are in production :) Is it enough to support them in QGIS ?

It seems to me having intermediary tables / relations to represent a N:M relation for each "object" you have would be enough. And if this leads to "too much" tables, then why is it a problem ?

Anyway, as you said, nobody would force me to use this feature, so ... +0 for me.

ponceta commented 7 years ago

Drag and drop support? Since we are talking of working with documents?

m-kuhn commented 7 years ago

What exactly is your requirement?

This is mostly about the backend (how to integrate modeling approaches with qgis). On the frontend / forms side, there's plenty of space for improvement. I'll be happy to discuss that but I think it's better to separate the two topics clearly.

ponceta commented 7 years ago

I've actually many forms which are registering urls from files. This is not a great way of doing things since when the file is modified or deleted, the link remains unchanged. (But taking this appart), a user who wants to link 100 photos to 100 spatial objects has to go over 100 forms (or go through the attribute table). My front end requirement would be for user to have easily a drag and drop zone in the qgis form, which could enable a quick link to the file. I've already discussed that with @mhugo during one of the qgis plugin formations in Lausanne. This could be a setting in the editing tool / external ressource / Saving mode :

m-kuhn commented 7 years ago

Yes, that would be a very interesting functionality. Would you mind writing a separate QEP - or a requirements document - for this?

m-kuhn commented 7 years ago

Are there any remaining comments or blockers on this issue?

If noone objects, I would like to ask for voting.

haubourg commented 7 years ago

Hi all, could we launch the vote session here?

pcav commented 7 years ago

Looks good to me. Thanks.

rouault commented 7 years ago

To be honest, I had a hard time to understand that "referenced layer" was an expression evaluated on all layers of the project until the expression matches the value of "reference layer field", but this is an advanced feature, so...

+0 from me (supportive, but no strong technical opinion)

wonder-sk commented 7 years ago

For me in the current state it would be -0.

It feels to me that dynamic relations are quite a complicated concept to understand... and maybe difficult for users to find out what is wrong if things do not work correctly.

I think I find it a bit scary that all matching layers automagically get into relationship :-) I would prefer the relationship to be always explicitly defined between pairs of layers, and if necessary we could just have an assistant to generate many relations following a particular pattern.

May I suggest a modification where the relations would be still defined explicitly, but they would allow user to specify identification of the particular related table? To demonstrate on your example data, the relation definition for "od_damage" table could look like this:

So I only added "parent id field" and "parent id value" to the definition of an ordinary relation, which identify which rows of the "Documents" table belong to "od_damage" layer, so when selecting rows from "Documents", we know which ones to use, and when adding new rows to "Documents" we know how to fill in the particular field that identifies the relation. The relation is still just between two layers as before - no automatic matching of layers. Of course this means the relation would need to added for each pair of layers instead of having one dynamic relationship between multiple layers, but that's in my opinion desirable. It would also avoid the need for a metatable with layers (aka catalog) which I was not that convinced about (and the fact we would be extracting bits of URI for some layers to allow matching of schema/table in dynamic relations).

With the modification above it would be a clear +1 for me :-)

rduivenvoorde commented 7 years ago

I just want to add: can we keep stuff as simple and performing as possible please. To me QGIS is a general purpose GIS application which should be able to perform on large datasets. My idea is that more and more features pop up which do not perform (anymore) on big datasets, for example a huge Postgis db.

I just want to express my feeling that we should keep concentrating on our true GIS-nature, that is: do the core GIS-things performing and on all GIS-formats. To me this QEP looks more like a application-feature then a general purpose feature, though I'm not 100% sure about that.

But maybe it is just because I currently fight with huge postgres tables a lot, and have to wait for QGIS to show me the attributes or counts of these too long :-)

m-kuhn commented 7 years ago

@wonder-sk

It feels to me that dynamic relations are quite a complicated concept to understand... and maybe difficult for users to find out what is wrong if things do not work correctly.

This probably depends a lot on the user interface. In the configuration dialog, dynamic relations could be shown collapsible and all resulting relations could be shown below with their values evaluated already. I guess that would be quite intuitive and transparent.

I think I find it a bit scary that all matching layers automagically get into relationship :-) I would prefer the relationship to be always explicitly defined between pairs of layers, and if necessary we could just have an assistant to generate many relations following a particular pattern.

What could be done is this:

When new tables are added, the relation configuration dialog will need to be visited to enable a relation. But in turn the information from the relations API would be static.

It would also avoid the need for a metatable with layers (aka catalog) which I was not that convinced about (and the fact we would be extracting bits of URI for some layers to allow matching of schema/table in dynamic relations).

We don't need an extra metatable - we already have it - it was called QgsMapLayerRegistry and is now integrated in QgsProject :) The only change would be to expose its data to the expression engine. I am not especially keen to get this done for dynamic relations. I think this will offer new ways to use expressions in many other scenarios. I currently find it really annoying that to identify layers in an expression I have to either use the layer id (which is not robust for removing and re-adding layers) or the layer name (which some users intentionally or accidentally modify), using db information would help here. And I'm sure we will find other uses for it as well.

In short: I am in favor of exposing-the-map-layer-registry-to-the-expression-engine feature, regardless of the way for integrating a DMS.

@rduivenvoorde I agree with "it should be as simple and performing as possible". But given that this feature is opt-in it will not affect simplicity or performance for existing projects (or new projects which don't use this feature) in any way.

But maybe it is just because I currently fight with huge postgres tables a lot, and have to wait for QGIS to show me the attributes or counts of these too long :-)

While unrelated to this topic, I really hope we will solve this for QGIS 3.0 :+1: .

wonder-sk commented 7 years ago

What could be done is this:

  • Ordinary relations are extended the way you propose (static values)
  • For the "assistant", we could use a system like in the proposal (based on expressions)
  • These expressions could be saved in the project (for easy access if new tables are added later on)
  • Candidate relations would be listed in the assistant and receive a checkbox to enable/disable a given relation

When new tables are added, the relation configuration dialog will need to be visited to enable a relation. But in turn the information from the relations API would be static.

Sounds good to me!

We don't need an extra metatable - we already have it - it was called QgsMapLayerRegistry and is now integrated in QgsProject :) The only change would be to expose its data to the expression engine. I am not especially keen to get this done for dynamic relations. I think this will offer new ways to use expressions in many other scenarios.

Sure, but we would need support for subqueries in expressions to make it really useful, right? Because normally expressions are evaluated in context of layer's features, so access to rows of system catalogs would need to be in subqueries?

m-kuhn commented 7 years ago

Sounds good to me!

Cool, thanks for the heads up.

We don't need an extra metatable - we already have it - it was called QgsMapLayerRegistry and is now integrated in QgsProject :) The only change would be to expose its data to the expression engine. I am not especially keen to get this done for dynamic relations. I think this will offer new ways to use expressions in many other scenarios.

Sure, but we would need support for subqueries in expressions to make it really useful, right? Because normally expressions are evaluated in context of layer's features, so access to rows of system catalogs would need to be in subqueries?

There already is partial support for this via

get_feature( @layers, 'table_name', 'some_table' )

or it could also be used via aggregates.

This doesn't mean that there is no room for improvement as in

get_feature( @layers, filter:= "table_name" = 'provinces' AND "schema" = @my_table_schema )

or others...

m-kuhn commented 7 years ago

Side-tracking this discussion again :-)

I just found a post on gis.se asking for conditional visibility for raster layers in atlas generation. Having layer-level expression filtering, this would be much more flexible than with the current approach based on |-separated layer names.