wonder-sk commented 7 years ago

QGIS Enhancement 76: Project and Layer Registry Refactoring

Date 2016/10/24

Author Martin Dobias (@wonder-sk)

Contact wonder dot sk at gmail dot com

maintainer @wonder-sk

Version QGIS 3.0

Summary

This proposal aims to clean up the code responsible for handling of projects and layers associated with projects. Here is a brief list of the benefits that the refactoring should introduce:

move QgsMapLayerRegistry functionality into QgsProject
removal of QgsServerProjectParser (in future)
allow opening multiple projects at the same time
easier to use API (not using layer IDs)
easier to make unit tests
fix issues with embedded projects and conflicting layer IDs
allow lazy loading of layers (?)
Merge QgsProject and QgsMapLayerRegistry

These two classes are dependent on each other: QgsProject writes all layers from layer registry when a project file is written, and it inserts all layers to layer registry when a project file is read (after clearing the registry first). It makes sense therefore to merge these two into one class to make things simpler. Also, one singleton less to deal with.

Support for Multiple Projects

It is currently (v2.x) not possible to have multiple project opened in QGIS environment because QgsProject is a singleton. This leads to various problems: QGIS Server implements its own QgsServerProjectParser for loading of projects, which naturally leads to incompatibilities between desktop app and the server. Another case is the use of embedded projects, where parsing of project files is again done independently from QgsProject. Having a chance to have multiple projects loaded at the same time would enable further functionality in the future, such as opening a project in browser dock and loading some layers from it to the current project.

The idea is to still have a static instance of QgsProject, but also to enable further instances to be created in order to load or save other projects.

References to Layers (and Other Objects)

One of the greatest issues with having multiple projects loaded in QGIS was the fact that layers from different projects may have used the same layer IDs internally, something that the map layer registry was not designed for.

We solve this issue by introducing a concept of references to arbitrary QObject-based objects. References are simple 128-bit universally unique identifiers (UUID) generated by a pseudo-random number generator. A mapping between UUIDs and actual objects is done using a global hashtable, to allow resolution of QObject instances from UUIDs.

References therefore can substitute layer IDs and they are more generic at the same time, as they can be used for lookup of any QObject-based object within QGIS.

References can be thought of as weak pointers to QObject-based classes. Why not to use std::weak_ptr or QWeakPointer instead? We would need to start using shared pointers for QObjects and that does not feel right from various reasons: QObject classes already have their parent-child memory management, all APIs use raw pointers to QObject and Python integration would get only more difficult (with Python's own memory management).

An important feature of references is that we can find out when a QObject is deleted (because a signal gets emitted), so it is possible to invalidate references to deleted objects. We may also store associated reference UUID in QObject's object name (or in a dynamic property).

The usage is simple: whenever a piece of code needs a reference, it calls QgsReference::ref(object) which will return QgsReference object that wraps assigned UUID. Later, when the object is needed, a call to QgsReference::resolve(ref) will return pointer to the object or null if it does not exist.

There is no need to register objects somehow before they can be used as references - they are automatically registered first time the reference is requested. We do not want explicit registration like with QgsMapLayerRegistry::addMapLayer() anymore because it sucks: it always caused confusion as to when layers need to be registered (e.g. required for rendering or for usage in layer trees, but not required to fetch features or to modify layer data).

It expected that code that previously used layer IDs and QgsMapLayerRegistry to resolve IDs to map layers will switch to storing QgsReference objects and use QgsReference::resolve(ref) whenever needed. It would be good idea to update public APIs to work with actual objects even if they only store references to those. For example, to keep track of map layers in QgsMapSettings class, we should have setLayers(QList<QgsMapLayer*>) and QList<QgsMapLayer*> layers() methods and internally store the list of layers as QList<QgsReference>.

To be decided: how to handle thread safety. We could make handling of references thread-safe, but there is a problem that using QObject that belongs to main thread in a worker thread will never be safe. So maybe better to explicitly forbid adding/resolving references in worker threads. Or have a separate hashtable of references for each thread.

Here is a sketch of the QgsReference class:

class QgsReference
{
public:

  //! returns ref for the object if it has been registered already - otherwise it will first
  //! generate UUID for the object, set it to object's name, add to internal hash and start
  //! listening to destroyed() signal to be able to detect when the object is deleted
  static QgsReference ref( QObject* obj );

  //! resolve object from a reference - resolves to a valid object or null if does not exist
  //! or was deleted already
  static QObject* resolve( const QgsReference& ref );

  //! constructs invalid reference
  QgsReference();

private:
  //! constructs a valid reference
  QgsReference( const QUuid& uuid );

  QUuid mId;

  static QHash<QUuid, QObject*> mRefs;  // + mutex for static methods
};

For QgsReference::resolve we could have a special variant to return objects casted to QgsMapLayer (or a general template variant).

Loading and Saving

OK so we have references to objects, but does that fit into loading and saving of projects? References cannot be stored in project files or layer style files because they are only valid during runtime of QGIS. We need a means to read files where some components reference objects from other components and also a means to write files with references. This will be done with a two-phase reading and writing. When writing data, our references with UUIDs will get a string identifier that will be valid within the scope of the file. Afterwards when reading data, the string identifiers will be kept and once everything is read, they will be resolved into true objects. This mechanism can be used not only for reading and writing of project files, but also any other documents where references may be used.

When writing a document:

step 1: for each component, register any objects that may be referenced from other components into a context object. The context object basically keeps a mapping from QObject to QString identifiers.
step 2: for each component, write the data (e.g. project XML file) and whenever a reference needs to be written, use the context to get the string identifier (it must have been registered in the step 1)

When reading a document:

step 1: for each component, read the data (e.g. project XML file) and whenever a field that references other object is encountered, register its string identifier and get a reference. This reference object will always point to a temporary placeholder QObject, so any operations involving such object should be deferred. Components that "own" objects that can be referenced register those to the context object. The context here has a mapping of created objects (from QString to QObject) and a mapping of placeholders (from QString to QgsReference).
step 2: now that all components have been read, we take all the placeholder objects and replace them by real ones. Finally all components are notified that the references have been resolved and they can finish initialization if it involves manipulation of referenced objects.

This is how an interface could look like to support loading/saving documents with references. Please note that it is not specific to usage with project files:

class QgsSerializable
{
  // read support

  class ReadContext
  {
    //! stores real objects to be used at a later time to resolve placeholders to real objects
    void addReferenceForId( QString id, QgsReference ref );
    //! references returned by this method are refs to placeholders that will be resolved once the reading has finished
    QgsReference referenceForId( QString id );
  };

  //! called to read
  virtual bool read(xml, context) = 0;
  //! called after all serializables have been read
  virtual void referencesResolved(context) {}

  // write support

  class WriteContext
  {
    // registers a reference that can be used later for writing
    void addReferenceForId( QString id, QgsReference ref );
    //! returns real objects previously registered in addReferenceForId()
    QgsReference referenceForId( QString id );
  };

  //! called before write() of any serializable
  virtual void registerReferences(context);
  //! called to write
  virtual bool write(xml, context) = 0;
}

Improve Project API

QgsProject class is one of the oldest in the QGIS API that has survived long without any significant changes. There are some bits of API that should be updated to make it more modern, for example to make API for reading/writing of properties more like QSettings and some other tweaks to make it easier to use.

One of the changes will be in improving reading and writing of projects. Currently, other components read or write data from/to project file be connecting to its readProject() and writeProject() signals. This would be replaced by a more object oriented approach:

a new interface class QgsProjectComponent will be introduced with read/write methods (TODO: subclass of QgsSerializable ?)
components would implement QgsProjectComponent and would be registered to a particular QgsProject instance
read/write methods of project components will be called to allow reading/writing of references
Ownership of Layers

It is suggested that map layers are made children of projects they belong to (using QObject parent-child relationship). The QObject parent-child relationship makes sure that one layer belongs at maximum to one project and potentially allows access from map layer to its project. This can be extended to embedded projects, where QgsProject instances would have another QgsProject as their parent.

Backwards Compatibility

The proposal breaks API compatibility by changing QgsProject API and by changing how referencing of layers works. This is however not a problem as the proposal is aiming for QGIS 3.0.

Issue Tracking ID(s)

https://github.com/qgis/qgis3.0_api/issues/8 https://github.com/qgis/qgis3.0_api/issues/16 https://github.com/qgis/qgis3.0_api/issues/23 https://github.com/qgis/qgis3.0_api/issues/57

Votes

(required)

vmora commented 7 years ago

calls QgsReference::ref(object) which will return QgsReference object that wraps assigned UUID. Later, when the object is needed, a call to QgsReference::resolve(ref)

Is there an issue with operators "&" and "->" overloads in QgsReference base class ? That would make the same usage transparent ?

wonder-sk commented 7 years ago

Is there an issue with operators "&" and "->" overloads in QgsReference base class ? That would make the same usage transparent ?

We need Python API where these overloads are not possible. By the way, QgsReference is not meant as a base class from which referencable objects would be derived from (referencable objects need to be derived from QObject). So while "->", "*" and "bool" operators could still work with QgsReference class, I do not see how "&" operator could be used as it would need to work on QObject pointers.

elpaso commented 7 years ago

This looks like a great enhancement! Just a few questions:

this new classes will be in the core and independent from gui?
when you say "It is suggested that map layers are made children of projects they belong to" does that mean that the same layer cannot belong to more than one project? (not sure if this is a limitation though)
I hope this refactoring will make it easier to modify a project on the fly, use case: when the QGIS server loads a project, it should be possible from a plugin to add/remove/edit a layer before it enters the main server loop. Can you give some more details on the new QgsProject API?

wonder-sk commented 7 years ago

@elpaso

this new classes will be in the core and independent from gui?

Yes, all in core. Of course some project components may be implemented in gui or app code, but that's natural as they would work with objects only available in gui/app code.

when you say "It is suggested that map layers are made children of projects they belong to" does that mean that the same layer cannot belong to more than one project?

Right. I do not have a strong opinion about that (and it can be removed if there are valid use cases that would require it), but it seemed like a sensible idea. That does not however mean that a layer could not be used in other project - just the fact that there would be a clear owner of each layer.

I hope this refactoring will make it easier to modify a project on the fly. [...] Can you give some more details on the new QgsProject API?

The API would stay mostly the same, QgsProject would just get functionality currently available in QgsMapLayerRegistry (add / remove / access map layers). Adding or removing layers therefore would mean calling QgsProject::addLayer(layer) or QgsProject::removeLayer(layer)

m-kuhn commented 7 years ago

Good stuff! Thanks for tackling this.

Some things that come to my mind

+1 for more pointers and less ids in the API
+1 for not trying to solve threading problems with this
Regarding QgsProjectComponent, which advantages to you expect compared to signals?
Will it be possible to get the owner project from a layer? This would be great because I think 80% of the calls to QgsProject::instance() could be simply replaced with layer->ownerProject() in this case without passing a project reference down the call stack and adding QgsProject* parameters to virtually every method in the API.
Why using different identifiers in the file and at runtime? I assume this is done to avoid clashes when having several projects (or subprojects) open with the same layer. But since the described approach clearly assigns ownership to a project (and no longer to a global singleton) the scope of an id within a particular project should be no longer a problem. A safety check to avoid clashes could be added in the QgsProject::addLayer() method. Subproject layers could be addressed with "subProjectId:layerId". On the other hand, having different IDs is cumbersome comparing project files and layer ids resolved at runtime and in case the ids changes with every save (not sure from this QEP) this is also cumbersome for project files managed in a git repo.

wonder-sk commented 7 years ago

@m-kuhn

Regarding QgsProjectComponent, which advantages to you expect compared to signals?

I expect more order in the code :-) With two-phase read/write, we would need two more signals, so it would just get more messy. Also easier to trace what components are registered with a project. The signals here do not seem like a good API design as QgsProject is asking other classes to do something rather than reporting change of its state.

Will it be possible to get the owner project from a layer?

I have tried to address that in "ownership of layers" section: layers would have project as their parent QObject if they are assigned to a project. Convenience call like layer.ownerProject() that would cast parent to QgsProject could be added too.

Why using different identifiers in the file and at runtime?

I have contemplated about this decision a lot (my first design also assumed use of "projectId:layerId" IDs like you suggested). Comparing those two approaches, the solution with UUID seemed better in the end:

no need to register layers anywhere: with layer IDs, one would always need to register layers with projects in order to be able to resolve them later. So if you load two layers and want to make a join between those, at least the joined layer would need to belong to a project to make it work. This is quite annoying as it requires the developer to remember when does a layer need to be associated with a project (just like now with the map layer registry). Devs would need to create temporary projects just to put layers there for some simple operations.
no need for a global registry of projects: with layer IDs, we would need to keep track of existing projects and assign them IDs, so that other classes like QgsMapSettings can resolve the layers
references are a more generic concept: they can be used beyond the scope of layers to reference any other objects we may need to reference in future without adding further registries etc.
references force us to actually fix APIs where layer IDs are used
layer IDs would still require us to do two-stage reading (the second stage is currently just being hidden in QgsProject - e.g. resolution of joins after reading)
when adding a bunch of layers (that may have dependencies on each other) from some other project, there still may be clashes between layer IDs and we would need a mechanism for re-mapping layer IDs
in case of missing references when loading/saving projects, handling of these errors can be centralized through the read/write contexts (with layer IDs one would need to do the validity checking explicitly for every reference)
resolving UUIDs into objects is more efficient

nyalldawson commented 7 years ago

What about the other stuff currently available in the map layer registry - specifically:

the ones which return lists of layers
the ones which match layers by name (not id)
the layers added/about to be added/will be removed signals?

Will these be moved to QgsProject?

How would something like QgsMapLayerComboBox be updated to work with this new approach - will it just show the layers attached to the current project?

wonder-sk commented 7 years ago

@nyalldawson

What about the other stuff currently available in the map layer registry [...] Will these be moved to QgsProject?

Yes, they will be moved to QgsProject.

How would something like QgsMapLayerComboBox be updated to work with this new approach - will it just show the layers attached to the current project?

I guess it would show layers of the current project for now - later we may add API call to choose a different project for which layers will be listed. I think modifying all the classes to be aware of multiple projects is outside of the scope of this work: the idea is to make multiple projects possible, and then let the rest of QGIS start using that functionality where appropriate - assuming QGIS server will be probably the first happy user of multiple projects functionality.

nyalldawson commented 7 years ago

I guess it would show layers of the current project for now - later we may add API call to choose a different project for which layers will be listed. I think modifying all the classes to be aware of multiple projects is outside of the scope of this work: the idea is to make multiple projects possible, and then let the rest of QGIS start using that functionality where appropriate - assuming QGIS server will be probably the first happy user of multiple projects functionality.

Totally agree!

Sounds great to me, +1 on this whole approach.

m-kuhn commented 7 years ago

no need to register layers anywhere: with layer IDs, one would always need to register layers with projects in order to be able to resolve them later.

Isn't the id mainly there for serialization? Joins should IMHO use pointers to layers instead of ids. In many cases it seems cleaner that where ids are required layers have to belong to the same project (and creating a dummy project isn't that bad).

no need for a global registry of projects.

I agree: no global project registry. But I don't think this is required in any case. QgsMapSettings shouldn't be using layer ids (layer ids are no solution to concurrency problems ;) ). I know it currently does but currently broken things shouldn't affect the design decisions.

references force us to actually fix APIs where layer IDs are used layer IDs would still require us to do two-stage reading (the second stage is currently just being hidden in QgsProject - e.g. resolution of joins after reading)

I don't think having the runtime-id match the project-file id would require that, does it?

when adding a bunch of layers (that may have dependencies on each other) from some other project, there still may be clashes between layer IDs and we would need a mechanism for re-mapping layer IDs

In this case there's something bad going on anyway and dropping references to the second layer with the clashing id is mostly good. (E.g. you don't want to append the same join a second time). In other cases we can still deal with remapping that if we see need.

in case of missing references when loading/saving projects, handling of these errors can be centralized through the read/write contexts (with layer IDs one would need to do the validity checking explicitly for every reference) resolving UUIDs into objects is more efficient

No objection to read/write context and centralized error handling. And with less usage (only project load/save time) the resolving overhead is minimized.

wonder-sk commented 7 years ago

@m-kuhn

Isn't the id mainly there for serialization?

Not only serialization, layer IDs are also widely used as weak pointers. There are many cases where some objects store layer IDs instead of pointers for a reason: if they would store directly pointers, a lot of classes would need to start checking for layer removal signal from map layer registry to make sure they do not end up with dangling pointers... which would be a pain. If those classes just end up referring to an invalid layer ID, it is not a big problem.

QgsMapSettings shouldn't be using layer ids

Indeed it should not, but it also should not store pointers to QgsMapLayer instances due to possible dangling pointers...

I don't think having the runtime-id match the project-file id would require that, does it?

Yeah: even if IDs would be the same in project file and during runtime, two-stage load may be necessary. You can have cases where obj1 refers to obj2, and obj1 does some initialization based on obj2, but obj2 is read only after obj1 is read. The initialization still needs to be deferred until obj2 is read.

m-kuhn commented 7 years ago

So after all it wouldn't be a big problem to have project ids and runtime ids match? :)

wonder-sk commented 7 years ago

So after all it wouldn't be a big problem to have project ids and runtime ids match? :)

Ummm... I thought my earlier comments explained why it would be a problem to have matching project and runtime ids :-)

Why is it a big problem: clashes between layer IDs among different projects. Why not compound IDs "projectID:layerID": a different set of problems would arise, see https://github.com/qgis/QGIS-Enhancement-Proposals/issues/76#issuecomment-255909600)

m-kuhn commented 7 years ago

I thought I had addressed all those. :-)

Why not compound IDs "projectID:layerID": a different set of problems would arise, see #76 (comment))

s/projectId/pointer to project/

I think it's cleaner to do project->layerById( id ) than SomeSingleton::instance()->layerByGlobalId( synteticId ). At the moment we have a layer id and a layer pointer as reference to the same thing.

In many cases it will be possible to do:

layer->ownerProject()->layerById( joinedLayerId )

In some other cases it may be required to alter the API to let a component know the project it works on.

myComponent->setProject( project )

or just use

iface->activeProject()->layerById( layerId )

until the API is cleaned up at some point in the future.

I think this approach will actually force us to clean up the mess instead of replacing one registry with another registry just to make it a bit less messy ;)

wonder-sk commented 7 years ago

But this still has the problems that 1. it is impossible to refer to a layer that does not belong to a project, 2. it is difficult to refer to a layer from a different project, 3. other bits of code will need to be associated with projects.

For example, if I have a QgsMapSettings object, I would need to associate it with a project to be able to resolve layers it is referencing to, right? How do I serialize pointer to a project, so that after writeXml() and readXml() I end up with the same object contents?

m-kuhn commented 7 years ago

I don't think 1. and 2. are actually issues.

Number 3. will be required more and more over time when we allow to work on more than a single project.

For example, if I have a QgsMapSettings object, I would need to associate it with a project to be able to resolve layers it is referencing to, right?

It shouldn't be saving layer ids anyway so I don't like this particular example. But for now... either save it inside or alter the api to QList<QgsMapLayer*> QgsMapSettings::layers( QgsProject* project ).

How do I serialize pointer to a project, so that after writeXml() and readXml() I end up with the same object contents?

Don't they take a ReadContext/WriteContext which can contain a pointer to the QgsProject they work on?

wonder-sk commented 7 years ago

I don't think 1. and 2. are actually issues.

Well, my opinion is that 1. is really annoying and a constant source of issues of plugin developers, and 2. is imposing an artificial limitation. So yes they are issues in my eyes.

Number 3. will be required more and more over time when we allow to work on more than a single project.

That is true, but there plenty of objects that have nothing to do with projects and we would tie them to project just to be able to be able to resolve references to layers.

Also, if we would want to reference other kinds of objects than layers, they would also need to be tied with projects? That would be too much reliance on project and does not seem like a right thing to do.

Don't they take a ReadContext/WriteContext which can contain a pointer to the QgsProject they work on?

No - serializable objects and read/write contexts are independent from projects, they can be used in other cases - e.g. to load/save layer style (QML file).

m-kuhn commented 7 years ago

Well, my opinion is that 1. is really annoying and a constant source of issues of plugin developers, and 2. is imposing an artificial limitation. So yes they are issues in my eyes.

Regarding that this only triggers when using ids as weak pointers (which usage you want to reduce if I understand correctly) that should not be too invasive. I think it's the same for 3.

No - serializable objects and read/write contexts are independent from projects, they can be used in other cases - e.g. to load/save layer style (QML file).

Hmmm... I see... But I think in this context layer ids also have a very limited validity.

After all, I think the main issue I have is with using a global registry for managing weak pointers (apart from that I'm still not sure if it's possible to resolve project layer ids from layers at runtime which is useful in certain situations). But I see, that this is an issue which needs to be addressed as well. So...

... another idea to approach this:

Use QPointer<QgsMapLayer*> to store references (I assume pyqt does the same, unverified)

And to solve concurrency issues:

When using them, guard by a mutex
Guard deletion of layers in the project by the same mutex

Do you think that could work?

NathanW2 commented 7 years ago

+1 for UUID. Would have made my life making QLRs easier.

On Tue, Oct 25, 2016 at 11:14 AM, Martin Dobias notifications@github.com wrote:

@m-kuhn https://github.com/m-kuhn

Regarding QgsProjectComponent, which advantages to you expect compared to signals?

I expect more order in the code :-) With two-phase read/write, we would need two more signals, so it would just get more messy. Also easier to trace what components are registered with a project. The signals here do not seem like a good API design as QgsProject is asking other classes to do something rather than reporting change of its state.

Will it be possible to get the owner project from a layer?

I have tried to address that in "ownership of layers" section: layers would have project as their parent QObject if they are assigned to a project. Convenience call like layer.ownerProject() that would cast parent to QgsProject could be added too.

Why using different identifiers in the file and at runtime?

I have contemplated about this decision a lot (my first design also assumed use of "projectId:layerId" IDs like you suggested). Comparing those two approaches, the solution with UUID seemed better in the end:

no need to register layers anywhere: with layer IDs, one would always need to register layers with projects in order to be able to resolve them later. So if you load two layers and want to make a join between those, at least the joined layer would need to belong to a project to make it work. This is quite annoying as it requires the developer to remember when does a layer need to be associated with a project (just like now with the map layer registry). Devs would need to create temporary projects just to put layers there for some simple operations.

no need for a global registry of projects: with layer IDs, we would need to keep track of existing projects and assign them IDs, so that other classes like QgsMapSettings can resolve the layers

references are a more generic concept: they can be used beyond the scope of layers to reference any other objects we may need to reference in future without adding further registries etc.

references force us to actually fix APIs where layer IDs are used

layer IDs would still require us to do two-stage reading (the second stage is currently just being hidden in QgsProject - e.g. resolution of joins after reading)

when adding a bunch of layers (that may have dependencies on each other) from some other project, there still may be clashes between layer IDs and we would need a mechanism for re-mapping layer IDs

in case of missing references when loading/saving projects, handling of these errors can be centralized through the read/write contexts (with layer IDs one would need to do the validity checking explicitly for every reference)

resolving UUIDs into objects is more efficient

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/qgis/QGIS-Enhancement-Proposals/issues/76#issuecomment-255909600, or mute the thread https://github.com/notifications/unsubscribe-auth/AAXS3LkDjVqPj1FhIjONvX1hc2XOKiu2ks5q3Vf2gaJpZM4KeagI .

wonder-sk commented 7 years ago

Use QPointer<QgsMapLayer*> to store references

This could actually work quite well for us (I was looking at it before but not sure why I did not use it). There is an issue that QPointer is not supported by PyQt (verified), but we can sort it out by wrapping it for our needs: class QgsReference : public QPointer<QObject> { ... }; So the design would stay the same, just QgsReference would internally use more efficient QPointer implementation rather than our home made UUIDs.

I was quite curious how QPointer actually works. In Qt4, QPointer contains QObject* o and there is a global hashtable (QMultiHash<QObject *, QObject **>) with mutex that keeps track of all QPointers. When a QObject is deleted, it nullifies all QPointers pointing to it. (My proposal was to wait for destroyed() signals which would be suboptimal). In Qt5, they improved the implementation and QPointer now contains QWeakPointer<T> wp, so it reuses shared pointer support within in QObject, and the global hashtable is not used anymore. For more details see: http://www.macieira.org/blog/2012/07/continue-using-qpointer/

And to solve concurrency issues:

When using them, guard by a mutex

Guard deletion of layers in the project by the same mutex

Not sure I follow. What kind of concurrency issues are we talking about?

wonder-sk commented 7 years ago

To further clarify my thoughts now:

internally C++ classes can use QPointer<QgsMapLayer>
public APIs only use QgsMapLayer*
QgsReference is there for PyQt support
read/write contexts in C++ can use template methods, e.g. QPointer<T> referenceForId<T>(QString id) while in Python we would have QgsReference referenceForId(QString id)

One thing that gets a bit more complicated with QPointers is resolution of references after loading: the original design assumed that QgsReference would have machinery to replace placeholder QObjects by real QObjects. With QPointers we will have to find some alternative solution - maybe the components will have to deliberately update their weak pointers in the second stage rather having that happen by some magic.

m-kuhn commented 7 years ago

Thanks for the interesting read for QPointer, I haven't seen that one before.

To further clarify my thoughts now:

internally C++ classes can use QPointer<QgsMapLayer>

public APIs only use QgsMapLayer*

QgsReference is there for PyQt support

read/write contexts in C++ can use template methods, e.g. QPointer<T> referenceForId<T>(QString id) while in Python we would have QgsReference referenceForId(QString id)

That sounds quite good. I think I have seen exceptions in PyQt like "underlying C++ object has been deleted" which is a hint that it's smart enough to gracefully detect deletion of QObjects (by using a QPointer as well internally?) already now. So not sure if QgsReference is required for PyQt. But something like QgsReference could be clever for read/write.

Not sure I follow. What kind of concurrency issues are we talking about?

What I was thinking about here was e.g. multithreaded rendering. Layers can be used from a separate thread while there's a risk they get deleted on the main thread which is a potential source of problems at the moment. If guarding worker-thread-access and deletion (QgsProject::removeLayer()? layer->deleteLater()? ) with a mutex plus making some functions (mostly getFeatures()) thread-safe, that could prevent the worst.

wonder-sk commented 7 years ago

I think I have seen exceptions in PyQt like "underlying C++ object has been deleted" which is a hint that it's smart enough to gracefully detect deletion of QObjects (by using a QPointer as well internally?) already now.

That's a different mechanism (it is not limited to QObject): if a C++ class has a virtual destructor, the sip wrapper (a derived class) will get its destructor called, so it can mark that the wrapped object has been deleted.

What I was thinking about here was e.g. multithreaded rendering. Layers can be used from a separate thread while there's a risk they get deleted on the main thread which is a potential source of problems at the moment.

Most of the work on multi-threaded rendering actually was to remove dependencies on layer objects (e.g. using "feature sources" instead of vector layers to get features, cloning raster pipelines), so they can be modified or deleted without affecting the rendering. I know there are few bits left (e.g. non-cached vector joins) and some new ones have been added (e.g. expression functions "getLayer" or aggregates), but the general idea is to stay away from layers...

If guarding worker-thread-access and deletion (QgsProject::removeLayer()? layer->deleteLater()? ) with a mutex plus making some functions (mostly getFeatures()) thread-safe, that could prevent the worst.

I am not sure if that is possible or robust enough. One can still also just do "delete layer;", and deleteLater() is not a virtual method to allow overriding, and layers should not be forced to be assigned to projects. In my opinion the code that still uses map layer registry in worker threads should be updated to not do that (e.g. determine what extra layers will be needed and prepare "feature sources" for those).

m-kuhn commented 7 years ago

That's a different mechanism (it is not limited to QObject): if a C++ class has a virtual destructor, the sip wrapper (a derived class) will get its destructor called, so it can mark that the wrapped object has been deleted.

Regardless of the technical details, I think it voids the purpose of QgsReference for python because these pointers are weak already.

but the general idea is to stay away from layers...

+1 in general...

but I think there are some things which cannot be resolved this way. One trigger for this is dynamic nature of expressions. e.g. get_feature accepts a layer name as parameter which can be evaluated at runtime. Same for the eval function. So

either all data that might potentially get used during rendering (but 99% of the time is not) gets copied every time
or we lookup the information when it's required but need to make sure it correctly handles deleted resources and concurrent access. This means thread-safe functions and some rules for how to delete these resources (e.g. register them with a project [or another scoped resource management registry] which takes care of not deleting a resource while it's in use, unregistered resources are expected to be managed by the owner for quick and dirty python scripts. Or a global registry [-1 for that already]. )

Or such "dangerous" language constructs are forbidden in certain scenarios (which would be a pity).

m-kuhn commented 7 years ago

For reference: some experiments with QPointer<QgsMapLayer> in https://github.com/qgis/QGIS/pull/3683#issuecomment-256775474

wonder-sk commented 7 years ago

Regardless of the technical details, I think it voids the purpose of QgsReference for python because these pointers are weak already.

Maybe, it needs some thoughts into how the design of reading of serializable objects will change with the switch to QPointer.

but I think there are some things which cannot be resolved this way. One trigger for this is dynamic nature of expressions. e.g. get_feature accepts a layer name as parameter which can be evaluated at runtime. Same for the eval function.

There are still ways how to keep those expressions working while keeping rendering separate from map layers. The dependencies may be either declared (there is already a concept of layer dependencies), or they can discovered and the rendering may get restarted. In any case, IMHO usage of either get_feature() or eval() should be highly discouraged as they are inefficient for anything but very simple cases.

m-kuhn commented 7 years ago

We can highly discourage people from using it, accepting that it might be slow. But as long as it's available it should not crash the application.

So I've been thinking a bit more and one approach that also might work is using QObject::invokeMethod with Qt::BlockingQueuedConnection and passing in a QPointer<QgsMapLayer>.

E.g.

// This method will be executed on the main thread and the calling thread (e.g. rendering) is blocked until it finishes.
QgsFeatureIterator QgsVectorLayerThreadingTools::getFeaturesFromLayer( QPointer<QgsMapLayer> layer, const QgsFeatureRequest& request )
{
  QgsMapLayer* ml = layer.value< QPointer<QgsMapLayer> >().data()
  QgsVectorLayer* vl = qobject_cast<QgsVectorLayer*>( ml );
  if ( vl ) // we are on the main thread with a valid layer
    return vl->getFeatures( request );
  else // no valid layer: possibly deleted
    return QgsFeatureIterator(); // and log... or have a return value like QgsOptional to signal a problem...
}

// This code needs to be executed on a separate thread
// It should be placed into some helper method that will make sure that the method is called
// directly and not with a connection if it's called on the main thread.
QgsVectorLayerThreadingTools vltt;
QgsFeatureIterator iterator;
QMetaObject::invokeMethod( &vltt, "getFeaturesFromLayer", Qt::BlockingQueuedConnection,
                      Q_RETURN_ARG( QgsFeatureIterator, iterator ),
                      Q_ARG( QPointer<QgsMapLayer>, someLayer ),
                      Q_ARG( QgsFeatureReuqest, someRequest )
);

The method (getFeaturesFromLayer() or whatever) will be run on the main thread and rendering will wait for the result. If the QPointer gets invalidated at any point we are able to deal with problems in a defined and secure way.

This could also be interesting for the task manager since there are also potential references to layers that might have been removed by the time a scheduled task is finally started. https://github.com/qgis/QGIS/pull/3004

wonder-sk commented 7 years ago

That's an interesting approach... however I am not sure if that would not introduce other problems: if the rendering is cancelled, main thread waits for the rendering to stop. During that wait the event loop in main thread is not running, so invoking methods using blocking queued connections may end up with deadlocks...

m-kuhn commented 7 years ago

Yes, that's something to take care of. I have created a patch yesterday that puts cancelling of rendering into a separate thread. That was more related to responsiveness of the UI but will help in this case as well.

qgis / QGIS-Enhancement-Proposals

QEP 76: Project and Layer Registry Refactoring #76

QGIS Enhancement 76: Project and Layer Registry Refactoring

Summary

Merge QgsProject and QgsMapLayerRegistry

Support for Multiple Projects

References to Layers (and Other Objects)

Loading and Saving

Improve Project API

Ownership of Layers

Backwards Compatibility

Issue Tracking ID(s)

Votes