openPMD / openPMD-api

:floppy_disk: C++ & Python API for Scientific I/O
https://openpmd-api.readthedocs.io
GNU Lesser General Public License v3.0
142 stars 51 forks source link

Design limitations in the frontend resulting in unexpected behaviours (segfaults, premature destructor calls) #534

Open franzpoeschel opened 5 years ago

franzpoeschel commented 5 years ago

After following around a segmentation fault when using the openPMD API, I came to the conclusion that there is a relatively deeply-nested issue with the design of the frontend's classes. Most of those classes have been designed as handles to some underlying data so that users can take a lightweight copy (instead of a reference or pointer) and still interact with the same shared data. Key to this is the class Container that wraps a shared_ptr to an internal container (e.g. std::map). Most classes exposed in the public API store their payload behind a Container (e.g. in Series or Iteration) or directly behind a shared_ptr (e.g. in Attributable).

The issue arises because atm there is no sufficiently clear separation between (1) functionality of copyable handles and (2) functionality that should be restricted to the unique non-copy internal data. So far, I have found two issues falling in this category:

(1) Destructor calls This issue has limited extent, fortunately, since most classes use a default destructor. But also the root class of the openPMD tree Series has been designed as a handle, so its destructor should not really perform a flush.

(2) Going upward in the openPMD hierarchy More problematic is the fact that the openPMD tree is not a tree of unique objects but a tree of the mentioned handles. The tree is linked (in upward direction) using the classes Writable and Attributable, using raw pointers in order to break reference cycles. (More idiomatic would be weak pointers, but this is not the issue here). This linkage is later used again to go upward in the openPMD hierarchy. When doing this, in principle the handle that has been stored as a parent can already be invalidated while the underlying data is still present.

Fortunately enough, it is currently difficult to trigger this behavior. The following rather constructed examples show the issue:

    {
        std::unique_ptr< openPMD::Series > series_ptr;
        {
            openPMD::Series series( "sample%T.json", openPMD::AccessType::CREATE );
            series_ptr = std::unique_ptr< openPMD::Series >(
                new openPMD::Series( series ) );
            // i am the marked line
        }
        series_ptr->iterations[0].meshes["E"]["x"].makeEmpty< int >( 1 );
    }

Destructor runs at the marked line. No iteration is present yet, so flushing in file-based mode throws an exception. It seems surprising that the series is flushed.

    {
        std::unique_ptr< openPMD::Series > series_ptr;
        {
            openPMD::Series series( "sample%T.json", openPMD::AccessType::CREATE );
            series_ptr = std::unique_ptr< openPMD::Series >(
                new openPMD::Series( series ) );
            series_ptr->iterations[0].meshes["E"]["x"].makeEmpty< int >( 1 );
        }
        // i am the marked line
    }

We define the data earlier, so the first flush passes. Since the original handle has been destroyed, the second flush at the marked line fails with a segmentation fault.

The issue is not currently critical since triggering it requires writing rather esoteric programs, but for my current implementation of streaming this is becoming increasingly difficult to handle since re-parsing the openPMD hierarchy is necessary and I need to pay attention to not link to any copied handles but the original one instead.

I can currently work around the problem, but these issues should be considered in an upcoming refactoring of openPMD's frontend.

ax3l commented 5 years ago

Thanks for the assessment and I agree with the design issues.

To add on your analysis, I think the use of (user-facing) handles is fine and the underlying issues reside in the business logic (internally linked handles/parents that you mention) at the following points.

First of all, the logic of openPMD logic (defaults, etc.) is not separated from data "container" objects. This is problematic as we have already seen and makes upgrades way more complicated than they should be. We should refactor all defaults, including sub-object creation out of the data storage. (If you want, e.g. MVC without the V ;) is a pattern to follow, or related approaches.)

Furthermore, the logic between handles from parents to childs is a directed tree, which is very cumbersome when dealing with construction, destruction and defaults. This can be solved with refactoring the first part, I think.

ax3l commented 5 years ago

cc-ing @C0nsultant for ideas/feedback :)

franzpoeschel commented 5 years ago

Furthermore, the logic between handles from parents to childs is a directed tree, which is very cumbersome when dealing with construction, destruction and defaults

On the one hand, since openPMD is a hierarchical format, we should retain this logic from the user side, but I think we have seen that building our memory layout off of this is not the best way to go. There is no clear “ownership” of data since some operations will require walking up and some walking down the tree.

I would suggest splitting the Series object into a unique (per Series) internal object and a handle class to that object that is exposed to the user. The internal object can then become the “sentinel” object of the Series, something that we currently do not really have. Instead of handling memory in the current ad-hoc way, this object can then be used for (1) administering the objects of the openPMD hierarchy and also for (2) non-trivial destructors, default handling etc.

Within the hierarchy itself, we should then use std::weak_ptrs (and maybe even retain std::shared_ptrs in downward direction), but drop usage of raw pointers.

ax3l commented 5 years ago

On the one hand, since openPMD is a hierarchical format ...

Yes exactly, also I am not sure our (physical) hierarchy will always be a simple DAG tree in openPMD. We might relax that in newer versions of the standard.

There is no clear “ownership” of data since some operations will require walking up and some walking down the tree.

Fully agreed, the one-direction linking makes it even harder to keep track. Not speeking of the map< shared_ptr< string > things for handles o.0

I would suggest splitting the Series object into a unique (per Series) internal object and a handle class to that object that is exposed to the user.

I think that's a good idea and would allow us furthermore to expose this handle object in two public headers: one with MPI includes pulled in and one without. Potentially create it via two free standing functions?

ax3l commented 4 years ago

flush() is painfully implemented, we have to urgently change its meaning. The only thing flush() shall do is:

For flush() is shall not be relevant:

Currently Series::flush() tries to achieve the latter on top of its actual task - and it's a mess.

Consequently, in MPI-context the Series::flush() call must be non-collective (can be blocking, as long as a progress guarantee is ensured). The newly added ::finalize() members can (must) be collective and blocking.

ax3l commented 3 years ago

Addressed for Series via #886 #936 #955 (user-facing).

Further classes (Iteration, Record, RecordComponent, etc.) are to do.

eschnett commented 3 years ago

This issue (or a closely related one) is causing problems for me trying to wrap openPMD from Julia.

The problem I face is that many of the public openPMD types do not expose a default constructor. This makes it impossible (or at least very difficult, and I don't know how to do this for a Julia interface) to put objects of such types into STL containers. In practice, the STL really expects types to have a default constructor, and to have an "invalid" state if necessary.

Another issue I'm facing is that Julia is garbage collected, and it's thus difficult to ensure that objects are destructed and closed in any particular order. I can call close or flush on some objects, but that doesn't guarantee that other, related objects have been deallocated. Explicitly tracking an "invalid" state (instead of deallocating objects) would greatly help here.

One simple way to make all this happen is to leave the openPMD types as they are, and to essentially use std::shared_ptr<X> as user facing handles. I wouldn't mind if this type was explicitly exposed in the API – people know what shared_ptr is (copying is legal, copying is relatively cheap, no need to explicitly call a destructor, ownership is handled automatically, etc.) Of course, creating new wrapper types (as you suggest above) is also possible, assuming that the ownership/copying/cost are documented so that people know to treat these wrapper objects.

I would then also store these wrapper objects (and not actual objects) into the various std::map objects there are. This way, direct references are completely avoided. (There are also nice tricks via std::enable_shared_from_this that one can play.)

It would be important (at least for Julia wrappers) that the implementation objects are completely hidden, and do not e.g. appear via public inheritance. For example, there is currently a Series class which publicly inherits from SeriesImpl. In this case, both these objects should be wrapper objects. (This might already be true.)

franzpoeschel commented 3 years ago

Before answering to your post specifically, be aware that we already redesigned the Series class to address the issues described above and are planning to extend the redesign to our other classes.

This issue (or a closely related one) is causing problems for me trying to wrap openPMD from Julia.

The problem I face is that many of the public openPMD types do not expose a default constructor. This makes it impossible (or at least very difficult, and I don't know how to do this for a Julia interface) to put objects of such types into STL containers. In practice, the STL really expects types to have a default constructor, and to have an "invalid" state if necessary.

We have already moved Series to a design that allows this. Since that change has not produced any issues so far (I think?), we can move on and apply this change to our other classes as well.

Another issue I'm facing is that Julia is garbage collected, and it's thus difficult to ensure that objects are destructed and closed in any particular order. I can call close or flush on some objects, but that doesn't guarantee that other, related objects have been deallocated. Explicitly tracking an "invalid" state (instead of deallocating objects) would greatly help here.

Again, Series now has operator bool() for this. On the other hand, Python is garbage-collected, too, and we do not expose that information there at all. I don't really understand destroying things in a particular order, doesn't garbage collection mean that the order is explicitly not specified?

One simple way to make all this happen is to leave the openPMD types as they are, and to essentially use std::shared_ptr<X> as user facing handles. I wouldn't mind if this type was explicitly exposed in the API – people know what shared_ptr is (copying is legal, copying is relatively cheap, no need to explicitly call a destructor, ownership is handled automatically, etc.) Of course, creating new wrapper types (as you suggest above) is also possible, assuming that the ownership/copying/cost are documented so that people know to treat these wrapper objects.

I would then also store these wrapper objects (and not actual objects) into the various std::map objects there are. This way, direct references are completely avoided.

This is exactly the plan, yes (There are also nice tricks via std::enable_shared_from_this that one can play.)

It would be important (at least for Julia wrappers) that the implementation objects are completely hidden, and do not e.g. appear via public inheritance. For example, there is currently a Series class which publicly inherits from SeriesImpl. In this case, both these objects should be wrapper objects. (This might already be true.)

SeriesImpl is the class that defines and implements the interface. It must stay public, otherwise Series would have no member functions. Apart from this, Series basically just wraps a shared pointer to a shared resource. So, Series is the wrapper, SeriesImpl is the interface.