sofa-framework / sofa

Real-time multi-physics simulation with an emphasis on medical simulation.
https://www.sofa-framework.org
GNU Lesser General Public License v2.1
927 stars 312 forks source link

initData must go! #44

Open maxime-tournier opened 8 years ago

maxime-tournier commented 8 years ago

Dear all,

I used a deliberately provocative title to start a friendly technical discussion about the way Data are initialized.

As of now, Data constructors potentially incur a non-negligible overhead per component created, and it is unclear why users should have to pay for (at least some of) it. The overhead can get pretty large as many components are created (think of contacts, for instance), so I hope this discussion will clarify what is really needed and what can be optimized away.

The current way to initialize member Data in a component is to do something like:

MyComponent() 
   : my_data( initData(&my_data, init_value, "friendly name", "help message") ) { 
 // ...
}

At this point one can remark that most information provided to initData depends on the component class, and not on the component instance being constructed, save for the initial value and the pointer to the current instance (used by the data as its 'owner').

Now what happens when the data is constructed using initData?

  1. some BaseInitData objet is created, mostly holding pointers to character strings and owner,
  2. BaseData is constructed from BaseInitData (BaseData.cpp:70) where the following happens:
    1. data links for inputs and outputs are setup, 2x std::vector::push_back
    2. data are added back to their owner (from which they come anyway) 1x std::vector::push_back + 1x std::map::insert for aliases

There is also the DDGNode base class initialization, which also incurs at least initLink for members inputs and outputs (cf DDGNode.cpp:43), each again causing addLink back to the owner and an extra std::vector::push_back. Phheew.

Now you probably see where I am heading at: do we really need to pay a worst case 6x heap allocation for every single data in every single component created? Or at least, can this cost be alleviated somehow?

Are there component examples that manage Data which are not known at compile-time ?

If most/all the use-cases are in fact per-class, it is relatively easy to imagine a system that maps names to instance members constructed statically, so that component instances don't have to hold and allocate a vector of pointers to their own datas (!).

Data links are more tricky since they depend on the graph, but my opinion is that there should be at least a way to disable automatic link management when they are not needed.

Now thanks for reading this far, I look forward to hearing the community's opinion on the subject :-)

damienmarchal commented 8 years ago

I Maxime,

Thanks to rise up this kind of debate.

I agree with you that the cost for data init may be large.

I see two problems:

Do you have a code suggestion for refactoring so that we can have an idea on:

I do add Data at run time from time to time but it is not crucial for me and I could be handle by something called a DynamicData :)

DM.

maxime-tournier commented 8 years ago

Hello Damien, and thanks for your feedback.

Let met address your points one by one:

Invasiveness

That depends on the use cases for Data embedded in a component. From the code in Base.h, it seems that components essentially should be able to:

Assuming it's all there is to Data management in a component, then the following changes could enable opt-in, class-based data management when needed:

  1. Make all the associated methods virtual: Base::findData, Base::addData, Base::removeData, Base::getDataFields, possibly changing return type in the case of Base::getDataFields from const VecData& to VecData. In the latter case it would probably be even better to change the iteration strategy altogether, for instance by using:

    virtual void Base::getDataFields(VecData& out) const = 0;

    ...to leave allocation to the caller. Grepping getDataFields on the whole SOFA codebase yields around 10 hits (python bindings, generateDoc, GUI and modeler, solver merger) so it should be manageable.

  2. Provide a protected BaseData constructor that does not initialize links/owner datas. Obviously we do not want to silently break existing code, so this behavior should be opt-in, for instance using a tag class:

    class BaseData {
    public:
     struct no_init {};
    
    protected:
     BaseData(no_init) : // ...
    };

    This one should be easy.

  3. Add a public Data constructor that only initializes value, something like:

    template<class T>
    class Data {
    public: 
     Data(BaseData::no_init, const T& value = T() ) : // ...
    };

    Again, easy.

  4. Provide component constructors that do not initialize their Data so that optimized components can be derived:

    Base::Base(BaseData::no_init) : name(BaseData::no_init), //...
    BaseObject::BaseObject(BaseData::no_init) : f_listening(BaseData::no_init), //... 
    
    // ... BaseMapping, Mapping, etc.

    this one is tedious but not that hard.

Now if all of this can be done, a derived component class will be able to manage its data on a per-class basis, and only initialize data lazily when they are actually needed (i.e. when findData or getDataFields are called). Adding/removing data would be no-ops or errors in these derived classes.

Performance

Changing the iteration strategy incurs an extra copy of the existing DataVec when accessing data, but since we are likely to iterate the vector anyways there should be no visible change in complexity. Careful allocation on the caller side should also be able to avoid most allocations during vector copy. Finally, data iteration does not seems to happen during computationally intensive sections.

Should this remain an issue, it is always possible to iterate using a callback instead, which is acceptable using c++11 lambdas/std::function, but probably too cumbersome using c++98.

To summarize I think the performance hit for current code path is minimal.

On the other hand, a component using BaseData::no_init with per-class data managemenent potentially saves up to 6 heap allocations per data per constructor call, and a fraction of this during destruction.

FYI, I counted 10 Data members in base class Mapping<In, Out> alone, collision::Contact has 5, ForceField has 7,MechanicalObject has around 30. This means that for every collision between two different collision models, there are at the very least 50+ Data created, each of which can cause up to 6 heap allocations.

That is 300+ heap allocations potentially saved for a single collision, before even doing anything.

Now one can always argue that by grouping objects in mechanical objects one can minimize the number of contact classes, but firstly for complex scenes this is a very tedious/complex thing to do, and secondly there should be a way of not paying for features you don't use (this is c++ after all).

So I think it is safe to say that the potential benefits significantly outweigh the cost :-)

I hope the above addressed your questions, please let me know if this sounds reasonable.

damienmarchal commented 8 years ago

Hello Maxime,

Thanks for the precise analysis that is really helpful as well as on tracking the 'hidden overheads' in sofa.

All that sound reasonable to me and I share your view that the cost hit of creating object in case of collision is a strong point for doing something.

Maybe people interested in this topic could experiment in a dedicated branch so that we have an alternative implementation as well as a real test benchmark illustrating the speedup (eg: an hardcoded component creating a huge amount of collisions).

Regards, DM.

matthieu-nesme commented 8 years ago

Max, if I follow correctly, the way you propose would not break existing code? And we could have both regular Data and Class-handled Data at the same time? So we could update components little by little? It definitively worth trying it.

Something tricky with actual Data is the multithreading management, maybe someone at InSimo could tell us a word about that?

maxime-tournier commented 8 years ago

Matt',

Yes the breakage would be minimal.

It would be possible to mix instance-based and class-based Data management as long as the consistency in maintained through the data API in Base. Static data would incur no extra penalty at construction time, but later add/remove would remain possible.

I am unsure this is desirable, however: a full-blown mix of class-based and instance-based management would require more efforts, mainly to chain class-based datas up the class hierarchy, and I am not sure it is worth it. It could be the case though, but this requires significant changes and testing, like converting all the existing initData to the new system for example.

The use case I had in mind was more like the 5-10% cases where it really matters to be fast, and we don't really care whether data's are setup properly, and we sure do not want to pay for it.

If data inspection turns out to be required (e.g. for debug), then we can come up with an ad-hoc solution when/if it's needed, but at this point it is not clear whether this will be of sufficient interest.

But maybe more people could step in and provide feedback?

JeremieA commented 8 years ago

Hello,

Interesting discussion. I think having a separate mechanism for static versus dynamic data instances would be adding quite a lot of complexity. However the current system could be improved based on your observations :

  1. VecLink m_vecLink in BaseData will only ever contain { &inputs, &outputs }. VecLink could be changed to helper::fixed_array<BaseLink*,2> to remove those allocations (BaseData::addLink() should then be removed or a NOP like in DDGNode)
  2. Reducing dynamic allocations is why help/group/widget were stored as const char* instead of std::string (as in most case they are compile time constants). However this did create some bugs in components, so I would be in favor of always storing strings. But one new issue with c++11 is that std::string is no longer allowed to use copy-on-write to share a single buffer between instances, therefore this introduces new allocations with gcc 5 for instance (and name is currently copied multiple times). One option would be to use a different string implementation (such as fbstring as used by facebook and detailed in the CppCon 2016 talk “The strange details of std::string at Facebook"). An alternative that would help is to use the new move semantics to minimize the number of copies involved.

Regarding multithreading, it is indeed an issue. Currently constructing an object and all its Data is a local operation that can happen in parallel in any thread, but adding/removing them within a given scene graph is supposed to be done sequentially. There is the "aspect" mechanism that allow to provide a "frozen" version of the graph to another thread while it is being changed through adding and removing node and objects. But adding (and removing) Data in objects is not covered by this mechanism, and is not supported if multiple threads are accessing it concurrently. Which basically limit the creation of dynamic Data to the sequential initialization phase.

maxime-tournier commented 8 years ago

Thank you Jeremie for this valuable feedback.

I started working towards a faster initialization path for Data and Link in a separate branch, I will post some results here if I get anything meaningful. The main use-case is a scene with many contacts, thus many components/data are added and removed at each time step due to collision handling. We'll see whether this is worth the effort.

A small thing: I've seen that the DataTypeInfo for fixed array-like types (RigidCoord, Vec, and the like) all enable the CopyOnWrite flag, which makes any Data holding such type allocate on the heap on construction. Is this needed?

Thanks again for your help.

JeremieA commented 8 years ago

I don't remember a specific reason why fixed arrays were flagged as CopyOnWrite, it may be just because the type description was largely copied from resizable vectors. It would indeed make sense to disable this flag, or better to look at the memory size and only enable the flag if it is more than a reasonable threshold.

damienmarchal commented 7 years ago

I'm very interested in the issue...is there any progress here ?

damienmarchal commented 7 years ago

Update ?

maxime-tournier commented 7 years ago

Hi Damien,

I haven't had much time to devote to this as of late, but here is some quick feedback so far:

The takeaway: this is probably premature optimization at this point. I might give it another shot in the next few months as my work may depend on it but until then, don't expect major changes on this front.

(edit: typos)

damienmarchal commented 7 years ago

Hi Maxime,

Thanks for the feedback.

damienmarchal commented 7 years ago

Hi all,

In some of my work I tend to add new Data to Base object and as rightly pointed by @matthieu-nesme this may have a significant cost.

To have more insight I just made a quick test with the following result: The time to create 200 000 (empty) component (without calling their init() function):

My question now is should we stop adding Data to Base or do we find a solution to decrease the cost of all the initData ?

matthieu-nesme commented 7 years ago

Both ;)

damienmarchal commented 7 years ago

@matthieu-nesme I like so much your concision I should learn from that when writing ;)

maxime-tournier commented 2 years ago

Hi all,

Just a quick update asked by @hugtalbot on how we handled this issue at Anatoscope. It's been a while so our fork drifted quite a bit since the opening of this issue:

As you can see, things don't quite look the same, but I think old-timers would still recognize what's going on ;) By the same token I think this issue can be closed, but please don't hesitate if you have any question.