idaholab / moose

Multiphysics Object Oriented Simulation Environment
https://www.mooseframework.org
GNU Lesser General Public License v2.1
1.62k stars 1.01k forks source link

Smart Pointer Usage in MOOSE (Discussion) #7062

Closed permcody closed 8 years ago

permcody commented 8 years ago

We need to chat about what kinds of smart pointers we should be using and how we should be using them in the framework. According to the new Scott Meyer's book, he recommends std::unique_ptr() as the go to pointer to replace raw pointers in a lot of contexts. It's very fast since there is simply a pointer to the destructor and there's no other complications to deal with. Shared pointers on the other hand actually require two words (128 bytes) to store and there's a "control block" for each object stored by a shared pointer. As those pointers are copied around there are multiple hits (at least one increment and decrement) to update the information in the control block which is protected with atomics which can severely impact performance especially in threaded regions.

I think the fear of using unique_ptrs is that as you copy them, the owner's pointer becomes invalid and that has left a nasty taste in some people's mouths. This is fine for some things and not fine for others but it goes hand in hand with using move semantics in C++11 of which we can start taking advantage of throughout. Scott points out that Factory patterns should always use unique pointers to construct objects. The Factory doesn't need a pointer to the object after it's constructed so it's really fast an efficient to build a "unique_ptr" object and pass it out of the factory. In contexts where those objects might need to be shared you can simply make them shared on demand. He points out that if you understand the lifetimes of your objects you can simply create raw pointers from your unique pointers for general access. We absolutely do know who owns all objects in MOOSE, and we know when they are destructed so again, it seems like a smart idea.

I realize that some of this is moot because hopefully we don't need to worry about the costs of copying around factory constructed objects. That happens once per simulation. However, we've already seen people passing smart pointers by reference for no reason other than C++98 best practices of passing non-POD objects by reference. This needs to go away with C++11. It's no longer the best or only way to get the best performance (as half of Scott's book talks about).

The only huge downside to using unique ptrs right now, is that "make_unique" is MISSING in C++11! This however can be rectified with a simple wrapper that we can add to MOOSE. I'd like to start this discussion with @idaholab/moose-developers and see what people think? Should we be using a few more unique_ptrs instead of turning to shared_ptrs throughout the framework?

roystgnr commented 8 years ago

A simple wrapper will do it? I thought the problem was that you can't create a simple make_unique without variadic templates, which are also missing in C++11.

If there's a solution I'm missing, let's get it in libMesh? https://github.com/libMesh/libmesh/issues/655

permcody commented 8 years ago

My understanding is that it can be done. It's not a trivial implementation, but it can be done: http://stackoverflow.com/questions/7038357/make-unique-and-perfect-forwarding

roystgnr commented 8 years ago

Ah - variadic templates did make it into C++11! I wonder why I remembered otherwise.

permcody commented 8 years ago

So we just need to add this to the libMesh namespace with the proper ifdefs and associated compiler test. Then we can start using it in MOOSE.

friedmud commented 8 years ago

Ok: here's my thing... I think we should ONLY use std::shared_ptr.

Here's my argument. The only reason why you're suggesting unqiue_ptr is for optimization. I claim that this is a premature optimization that causes a schism in the code.

With unique_ptr you always have to remember that it's a unique_ptr and treat it with kid gloves. You end up with TONS of lines of code where you're calling get() and getting back raw pointers.

You say that copying a unique_ptr invalidates the original... that's actually NOT true. You can't copy a unique_ptr... you can only std::move() it. That means that you also end up with tons of lines of code where you're explicitly calling std::move() in order to assign a unique_ptr.

Not only that... but with unique_ptr you now have two different ways of dealing with pointers. You have to remember that some things are unique_ptrs while other things are raw pointers. You still have many raw pointers running around the code... and you have to remember which ones of those came from unique_ptrs so you don't need to destroy them and which ones came from raw new's so you do need to destroy them.

Overall: unique_ptr is messy. It creates tons of crap through the code that is simply there to deal with the unique_ptr.

shared_ptr is clean. We can use it everywhere. You can use it exactly as you would a regular pointer... passing it around without a care in the world. With shared_ptr you never need to remember anything... you simply use it and it will clean itself up.

In this mode there are NO possibilities of memory leaks. All news will get eradicated and turned into make_shared and all raw pointers will go away.

I've actually worked this way in my extremely time-sensitive new application... and I can tell you that shared_ptrs are small and the reference counting is not a thing to worry about. I have timing data to show it.

Let me ask you this: if there was no memory penalty and no time penalty for using shared_ptr would you still be championing the use of unique_ptr combined with raw pointers?

permcody commented 8 years ago

I think you make some good points and I'm all for avoiding premature optimization but we really should consider each situation and apply the right pointer where it's needed instead of declaring that we should only use one everywhere. What about a class that simply allocates memory internally and never shares it? What about the factory case (build and move)?

Just because we are supporting C++11 doesn't mean that engineers will know what to do with these pointers if they start appearing in user facing code. I think at one point you were an advocate for hiding a lot of the smart pointer semantics and having users continue to use raw pointers in their user objects. Maybe that has changed but I really don't want to see a lot of people passing around references to smart pointers which we are already seeing throughout user contributed code. There is still some utility to getting a raw pointer when you just need to use an object and you don't really need to manage it in any way.

As far as the performance goes I'm not sure we have enough evidence that there aren't performance implications. You might be using shared pointers "smartly" in your code. If you aren't copying them around, there probably isn't much of a hit at all. However if you were to copy them in a performance sensitive area, I'd bet we'd see something on a benchmark. We could gen up a quick little code to see. We do know that floating point trapping can be orders of magnitude more expensive on some hardware than others. We've talked to the PETSc guys about this before. We might see the same thing with atomics on some architectures versus others. We'll just have to see.

I'm just trying to think ahead since we know that some developers are playing on creating 100K objects in the near future and I'm not sure if I want to incur the additional memory and control blocks for all of those objects if it's really not necessary.

I'm only advocating about using the right tools in the right locations. We may very well decide that we want to use shared pointers in many places, I just don't think we should completely shunt the use unique ptr.

dschwen commented 8 years ago

What about circular use of shared pointers? Wasn't that an issue in the XFEM code and an example of why we can not just use them everywhere without thinking?

friedmud commented 8 years ago

@dschwen The answer to that is to use weak_ptr: http://en.cppreference.com/w/cpp/memory/weak_ptr . But I can agree with you that that is a tricky case. I don't think that unique_ptr would help here either though... you still have to make a decision about who is going to hold the unique_ptr vs who holds a raw pointer.

I'm with you @permcody : right tool for the right job. A unique_ptr should be used when there is a shared resource that should only be accessed by one object at a time (i.e. only one thing should have a reference to it at all... i.e. it's unique). shared_ptr should be used all other times.

The only time I've seen the reference counting of shared_ptr show up is when I was copying one on the inside of a very tight loop. I don't know where that would happen in MOOSE. Where is it that we currently have a pointer that's being copied (or passed by copy) in the middle of a tight loop in MOOSE?

As for the memory... we're talking 128 bytes here. 100k objects is still only 13MB. (I also don't think we really need 100k MOOSE objects... that's a symptom of bad design on our part... that needs to be solved with a VectorKernel... but that's a discussion for another thread). We have much larger memory issues with 100k objects than the 13MB coming from reference counting...

You asked about what to do in a factory. I would agree with you that a unique_ptr does make some sense in a factory... you want to "hand-off" that memory allocation. However... I still think that our factories should return shared_ptrs. Why? Because the first thing we're going to do with that unique_ptr is convert it to a shared_ptr so it can be passed around... so why not just create a shared_ptr in the first place?

permcody commented 8 years ago

As for the memory... we're talking 128 bytes here. 100k objects is still only 13MB.

Not quite - Each shared_ptr takes up 128 bytes (verse 64) but there's also a control block for each object being shared so if each shared pointer is pointing to a unique object that's 100K control blocks too. That control block can vary in size but generally holds a reference count atomic, weak count atomic, and the deleter itself (since that's part of the instantiation). Worse yet is that the control block might be independently allocated away from everything else if make_shared isn't used. Yes all optimizations, but it's important to understand the possible implications. Once the shared pointer is setup and sitting statically, the cost of using it should be roughly the same as unique ptr or even a raw pointer.

... Because the first thing we're going to do with that unique_ptr is convert it to a shared_ptr so it can be passed around...

In truth we only do this with a few objects in MOOSE. The Mesh is probably the most obvious case of being passed around because it is created before it's final home exists. Most other objects are created and shoved into their final container in FEProblem or one of the two systems.

It's worth looking at what libMesh has been doing for years. They used AutoPtr and ReferenceCountedObject classes for years. Both abstractions have their place in a well-oiled library.

jwpeterson commented 8 years ago

Tagging @idaholab/moose-team

friedmud commented 8 years ago

@permcody Yes... I've always HATED the way libMesh does smart pointers... it's one of my biggest gripes with the library and one of the biggest reasons why I want shared_ptr instead of unique_ptr. One of the biggest issues is consistency... sometimes things are unique pointers and sometimes they're raw pointers. You have to remember which one is which all the time and calling get() everywhere is just obnoxious.

friedmud commented 8 years ago

For me: I don't see the purpose in unique_ptr. If you think that only one place is ever going to own a piece of memory and everyone else should just access it using raw pointers... then why not just use raw pointers everywhere? All you're doing by adding unique_ptr is removing the need to say delete in that one object that's going to hold the "master" copy. That's just being lazy.

shared_ptr changes the paradigm by not needing to think about "who" owns anything...

jwpeterson commented 8 years ago

I've always HATED the way libMesh does smart pointers

You have to remember that neither shared_ptr nor unique_ptr was even a blip on the radar at the time we wrote those interfaces... all we had was the terrible std::auto_ptr. In our opinion then, even that was better than handing raw pointers back to users and hoping they would clean up after themselves and/or read the docs to even know when they had to do cleanup.

I still think we made the right design decision for things being allocated and returned to users, but don't think that public members like System::solution and System::current_local_solution should have been AutoPtrs at the time, especially considering how easy it made them for users to steal. At the time we did it because it made managing their lifetimes (on the library side) trivial, in hindsight deleting them in constructors would have been better.

jwpeterson commented 8 years ago

All you're doing by adding unique_ptr is removing the need to say delete in that one object that's going to hold the "master" copy. That's just being lazy.

Actually, that's preventing memory leaks and/or double frees (when users incorrectly think they should delete a pointer you've handed them) before they happen. This is a worthwhile reason to use them in my opinion.

roystgnr commented 8 years ago

Agreed with @jwpeterson, on both the returning-uniqueish-ptr APIs and on the exposed public member variable API failures.

Is it hard to create a shared_ptr from a factory returning unique_ptr? shared_ptr has a move constructor from unique_ptr, doesn't it? You should literally be able to write shared_ptr<Foo> fp = FooFactory::build(); even though the latter returns unique_ptr. And if it returns unique_ptr then the object lifetime is self-documenting; if shared_ptr then you can't be sure what's going on under the hood. It doesn't free you from the need to worry about who owns things, just from the ability.

friedmud commented 8 years ago

@jwpeterson I understand about those not being available back then. You guys did your best... but that doesn't mean it's not frustrating sometimes!

Actually, that's preventing memory leaks and/or double frees (when users incorrectly think they should delete a pointer you've handed them) before they happen. This is a worthwhile reason to use them in my opinion.

No... unique_ptr + raw pointers does not help with the "double free" issue.... and is actually part of my point against using unique_ptr. Anytime an interface hands you back a raw pointer you have to wonder whether or not you need to delete it... and many people will make the mistake of doing just that even though the pointer is being held by a unique_ptr somewhere.

This is exactly why I'm arguing against this idea of storing the objects with unique_ptr and then passing raw pointers around to them. It's simply confusing.

Also: when untrained users come along and try to get a unique_ptr from us and get a compile error because you can't copy it... what's the first thing they're going to do? They're going to Google "how to copy unique_ptr" and it's going to tell them to use std::move()... which is going to lead to them stealing the pointer and creating a huge mess.

jwpeterson commented 8 years ago

Anytime an interface hands you back a raw pointer you have to wonder whether or not you need to delete it... and many people will make the mistake of doing just that even though the pointer is being held by a unique_ptr somewhere.

I would never advocate handing back a raw pointer to something held as a unique_ptr by the library! I'm sorry if that's how my point came across.

From the library perspective, handing back a unique_ptr is appropriate if that will be the library's last interaction with it. If, on the other hand, the library is to maintain ownership, and control when destruction happens (as is more typically the case in MOOSE), then handing back a shared_ptr is more appropriate.

jwpeterson commented 8 years ago

shared_ptr has a move constructor from unique_ptr, doesn't it?

Indeed it does, see constructor 13.

friedmud commented 8 years ago

Ah - then @jwpeterson and I are on the same page.

Correct me if I'm wrong... but here are the two proposals:

  1. Use std::shared_ptr for most things. If it is clear that memory is being "handed off" and the first object is done with that memory then using a unique_ptr might be the thing to do (like in a Factory).
  2. Use std::unique_ptr for one reference to an object (like in a Warehouse) and dole out raw pointers pulled off of those throughout the rest of the framework.

I think myself and @jwpeterson are in camp 1 while @permcody is advocating 2.

permcody commented 8 years ago

Well not quite. I think there's a difference between "doling out" a pointer and just having one available as a member of a class somewhere which is the typical case in MOOSE. Let's look at Kernels for instance. We create them inside of NonlinearSystem.C and then immediately store them in the warehouse. We don't actually ever move them after that or even refer to them in user code at all. They are just stored in the warehouse and lists of them are updated as we move through space and time. We've recently converted all of these lists to complex data structures (vectors of maps of vectors of shared pointers or similar structures). The "compute" loop gets references to parts of those data structure where we loop over and use those objects. Users never see that part of the code and there's no change of ownership at any time. Where things get expensive is that we rebuild these list as we move through the mesh and possibly as we move through time. These are updates, so we are copying shared pointers around within the data structures. Even a single copy requires two updates to the control block and we are doing this with some frequency.

This is precisely the kind of thing that could easily be optimized if unique_ptrs where used instead. We already have an "all_objects" location which is where the ownership is handled. Everything else could just use raw pointers, it makes no difference. In fact when we first started with shared pointers that's exactly what we did. We had a single copy of the shared pointer and we just saved of raw pointers in the structures.

The current design is fine, but it also wouldn't impact anyone to change it either.

When constructing objects to hand them back to users, it might make sense to hand back a smart pointer either transferring ownership or sharing depending on the context. I'm not advocating handing back raw pointers when we have the smart pointer ourselves. That probably IS a bad idea.

friedmud commented 8 years ago

We are using shared pointers for this right now... and we've never seen anything related to shared pointers showing up in any timing run.

Using unique_ptr might "optimize" things... but it doesn't matter if it's in the noise.

How would you structure the Warehouse in your case? You would have one "master" vector that stored unique_ptrs (which is a pain in and of itself BTW... you cannot just push_back() a unique_ptr you have to std::move() it into a vector) and then all of the other lists/maps would just be of raw pointers? That's the kind of "schism" I was talking about where there is inconsistency in the framework with unique_ptr... because it forces a mix of smart and raw pointers.

Didn't we just move all of the Warehouses to using MooseSharedPointer? Now you effectively want to undo that? This is exactly what I'm talking about when I say that unique_ptr is no better than just storing raw pointers and having a delete. The old factory pattern had one vector of "all" objects and then ran through them and deleted them in the destructor. The only thing unique_ptr is giving you is not having to do that delete.

friedmud commented 8 years ago

Also: what would functions like this return?

https://github.com/idaholab/moose/blob/devel/framework/include/base/MooseObjectWarehouseBase.h#L284

Would it return a vector of raw pointers? If so... that's exactly what I'm talking about when I say that raw pointers are going to be "doled out" based off of unique_ptr instances.

Someone just using the API has no idea what those raw pointers came from... and doesn't know if they need to be deleted or not.

permcody commented 8 years ago

You would never delete raw pointers. That's never been a requirement of the framework. When it was, people screwed it up.

permcody commented 8 years ago

How would you structure the Warehouse in your case? You would have one "master" vector that stored > unique_ptrs (which is a pain in and of itself BTW... you cannot just push_back() a unique_ptr you have > to std::move() it into a vector) and then all of the other lists/maps would just be of raw pointers? That's > the kind of "schism" I was talking about where there is inconsistency in the framework with unique_ptr... > because it forces a mix of smart and raw pointers.

It was this way for months. This is how we introduced shared_pointers. There's not really any problem here. Nobody messes with the vector of unique_ptrs, it can be private in fact.

friedmud commented 8 years ago

You would never delete raw pointers. That's never been a requirement of the framework. When it was, people screwed it up.

My point is that the interface doesn't tell you anything about what's happening. You get a raw pointer back and are left wondering. Yes, it's a screwup to do a delete on one of those pointers... but nothing about the interface tells you that.

If it's returning a std::vector<std::shared_ptr<Kernel> > (like it is now) then you know exactly what's going on. You're getting access to something that has shared ownership and will be deleted automatically.

friedmud commented 8 years ago

Also: what happened to this?:

I'm not advocating handing back raw pointers when we have the smart pointer ourselves. That probably IS a bad idea.

And @jwpeterson said much the same thing:

I would never advocate handing back a raw pointer to something held as a unique_ptr by the library!

And now you're advocating for doing just that!

friedmud commented 8 years ago

Ok - I got a little carried away yesterday.

Let me start by apologizing to @roystgnr and @jwpeterson for bagging on libMesh. You guys know that I love libMesh (based my whole career around it!)... I didn't mean to be so harsh. Every library has a few rough edges... but it's not worth "HATING". Sorry about that.

Now... let's get back to the topic at hand.

@permcody you make a lot of good points. I hope that I've at least shown everyone that the choice is not straightforward.

I can see doing unique_ptr in the warehouses and passing around raw pointers based on that. Here's my stipulation: that the Warehouse doesn't ever expose any interface that returns a unique_ptr.

My biggest issue is the interface schism caused by unique_ptr: that some functions deal with unique_ptr and others deal with raw pointers. To "fix" that... we can just make the Warehouse take in unique_ptrs but NEVER give them back. From the outside everyone will always just use the raw pointers and be none-the-wiser that the objects are actually held in unique_ptrs.

I still kind of don't understand the difference between this and just holding raw pointers in the Warehouse and giving out raw pointers... but if it's the way you want to go I can't really see how it's going to be a problem... plus, we can always change it later if we don't like it.

jwpeterson commented 8 years ago

Let me start by apologizing to @roystgnr and @jwpeterson for bagging on libMesh.

No worries, this is what discussions are for.

aeslaughter commented 8 years ago

I would like to go on recored that I only "like libMesh as a friend, it not libMesh it is me."

On Wed, May 25, 2016 at 9:19 AM, John W. Peterson notifications@github.com wrote:

Let me start by apologizing to @roystgnr https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_roystgnr&d=BQMCaQ&c=54IZrppPQZKX9mLzcGdPfFD1hxrcB__aEkJFOKJFd00&r=h7heP8xwI1i_HikChvhFbEBurKirgfOCdwgBxB9lM8c&m=GZrqZkSo81euUoVybaGzJ1gua79Go36GbB7foDJHnWw&s=43HMiN8cAxb6txBunJyWwjgv9KMFijqyyisKUcdIDYI&e= and @jwpeterson https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_jwpeterson&d=BQMCaQ&c=54IZrppPQZKX9mLzcGdPfFD1hxrcB__aEkJFOKJFd00&r=h7heP8xwI1i_HikChvhFbEBurKirgfOCdwgBxB9lM8c&m=GZrqZkSo81euUoVybaGzJ1gua79Go36GbB7foDJHnWw&s=BIVY2a61a3yo5-IxZ5_f92eQlw9WiasCHoHX28kdzfw&e= for bagging on libMesh.

No worries, this is what discussions are for.

— You are receiving this because you are on a team that was mentioned. Reply to this email directly or view it on GitHub https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_idaholab_moose_issues_7062-23issuecomment-2D221608725&d=BQMCaQ&c=54IZrppPQZKX9mLzcGdPfFD1hxrcB__aEkJFOKJFd00&r=h7heP8xwI1i_HikChvhFbEBurKirgfOCdwgBxB9lM8c&m=GZrqZkSo81euUoVybaGzJ1gua79Go36GbB7foDJHnWw&s=GXvOm2eK_XMYZ171Rf_Nz8WfX0wtutx7LGQGEj6VLH4&e=

permcody commented 8 years ago

I have a difficult time arguing my position since I know that "premature optimization is bad" but there will come a time when it's worth looking at. I just wrote up this quick example to illustrate the cost of shared pointer copy (and thus, atomics) on my Mac (note that there is no threading or anything else to make this worse). Also note that our Intel processors kick ass at this kind of thing. It would be interesting to run this benchmark on BlueGene or some other system where atomics might be different.

#include <iostream>
#include <chrono>
#include <thread>

int f(std::shared_ptr<int> foo)
{
  return *foo;
}

int g(int * foo)
{
  return *foo;
}

int main()
{
  {
    // Shared Pointer case
    auto sp = std::make_shared<int>(1);
    auto t1 = std::chrono::high_resolution_clock::now();

    auto foo = 0u;
    for (auto i = 0u; i < 10000000; ++i)
      foo += f(sp);
    std::cout << foo << '\n';

    auto t2 = std::chrono::high_resolution_clock::now();
    std::cout << "f() took "
              << std::chrono::duration_cast<std::chrono::milliseconds>(t2-t1).count()
              << " milliseconds\n";
  }

  {
    // Unique Pointer with raw case
    auto up = std::unique_ptr<int>(new int(1));
    auto t1 = std::chrono::high_resolution_clock::now();

    auto foo = 0u;
    for (auto i = 0u; i < 10000000; ++i)
      foo += g(up.get());
    std::cout << foo << '\n';

    auto t2 = std::chrono::high_resolution_clock::now();
    std::cout << "f() took "
              << std::chrono::duration_cast<std::chrono::milliseconds>(t2-t1).count()
              << " milliseconds\n";
  }
}

Results:

10000000
shared_pointer_copy() took 231 milliseconds
10000000
raw_pointer_copy() took 44 milliseconds

This is a consistent benchmark on my system. It varies by one or two ms each time I run it.

roystgnr commented 8 years ago

Harsh is good, not bad. Identifying flaws is the first step toward fixing them. Alternatively, misidentifying flaws is the first step toward being corrected. ;-)

Back to the topic, though - unique_ptr is not an optimization. If your compiler is good and you don't use a custom deleter then it's exactly as efficient as a raw pointer; if not then it's worse. Using unique_ptr internally is just a question of whether you find unique_ptr semantics more or less confusing than trying to get the deletes right by hand.

However, if you hold unique_ptr or raw pointers, and you really want to hold the pointer, then you have to hand back raw pointers or raw references. Either one has the semantics of "I'm holding this, but you can use it until I go away", although IMHO that's much clearer in the reference case. Handing back any kind of smart pointer which got created from a raw or unique pointer is going to both semantically and functionally mean "this gets deleted when you're done with it; I'm done with it already". That's unacceptable in the warehouse context, right? If neither of those two is acceptable, the third alternative is "I'm holding this for now, but you can use it even after I go away" (i.e. hold shared_ptr and return shared_ptr).

roystgnr commented 8 years ago

Oh, and to justify my "corrected" wisecrack: "This gets deleted when you're done with it; I'm done with it already" is usually exactly what you want in the factory context, which is why libMesh is correct to use UniquePtr there. ;-)

The only caveat is that that design can backfire by requiring an API change if you later decide that your factory should be returning pointers-to-const so it can cache the stuff it builds; you can't exactly cache something if you've already made the user destroy it.

permcody commented 8 years ago

So basically C++ continues to give us more and better tools for blowing your own feet off and your users' feet as well. I love it!

friedmud commented 8 years ago

@permcody of course you can make it look slow... my point is... where is this code in MOOSE? Where do we do that sort of thing?

Even this extremely targeted test code is only 8x slower. To me, this is a win for shared_ptr.

BTW: On my laptop I get:

10000000
f() took 193 milliseconds
10000000
f() took 32 milliseconds

So it's closer to 6x.

And if you change the shared_ptr to being pass by reference (which is a valid thing to do if you have a single function like this that knows it's not going to be holding that pointer) I get this:

10000000
f() took 29 milliseconds
10000000
f() took 33 milliseconds

Now: I'm not advocating passing these by reference... the style guide should say "prefer to pass shared_ptr by copy"... but if there really is a tight loop like this where optimization is needed then you can break the style just slightly to achieve your goals.

Honestly, @permcody , this test code makes me even more sure that shared_ptr is the right idea...

friedmud commented 8 years ago

Ok - threading is definitely a thing to worry about here. I modified your program to use OpenMP (and made it sum 100,000,000 numbers for better timing). Here is the modified program:

#include <iostream>
#include <chrono>
#include <thread>

int f(std::shared_ptr<int> foo)
{
  return *foo;
}

int g(int * foo)
{
  return *foo;
}

int main()
{
  {
    // Shared Pointer case
    auto sp = std::make_shared<int>(1);
    auto t1 = std::chrono::high_resolution_clock::now();

    auto foo = 0u;

    #pragma omp parallel for reduction(+:foo)
    for (auto i = 0u; i < 100000000; ++i)
      foo += f(sp);
    std::cout << foo << '\n';

    auto t2 = std::chrono::high_resolution_clock::now();
    std::cout << "f() took "
              << std::chrono::duration_cast<std::chrono::milliseconds>(t2-t1).count()
              << " milliseconds\n";
  }

  {
    // Unique Pointer with raw case
    auto up = std::unique_ptr<int>(new int(1));
    auto t1 = std::chrono::high_resolution_clock::now();

    auto foo = 0u;

    #pragma omp parallel for reduction(+:foo)
    for (auto i = 0u; i < 100000000; ++i)
      foo += g(up.get());
    std::cout << foo << '\n';

    auto t2 = std::chrono::high_resolution_clock::now();
    std::cout << "f() took "
              << std::chrono::duration_cast<std::chrono::milliseconds>(t2-t1).count()
              << " milliseconds\n";
  }
}

Compile it with:

clang++ -std=c++11 -fopenmp shared_unique.C

In serial we see about 5x difference:

[gastdr][~/projects]> OMP_NUM_THREADS=1 ./a.out 
100000000
f() took 1986 milliseconds
100000000
f() took 390 milliseconds

With 4 processors though:

[gastdr][~/projects]> OMP_NUM_THREADS=4 ./a.out 
100000000
f() took 6838 milliseconds
100000000
f() took 110 milliseconds

70x difference. That hurts ;-)

However... changing to pass by reference:

[gastdr][~/projects]> OMP_NUM_THREADS=4 ./a.out 
100000000
f() took 116 milliseconds
100000000
f() took 120 milliseconds

So, things are definitely worse in threaded mode... but, we would notice the issue with copying the shared_ptr in this tight loop and we would be able to fix it... i.e. we would optimize when we need to...

dschwen commented 8 years ago

TBH I absolutely do not understand why we should discourage anyone from passing a shared pointer by reference. If a function makes a copy of the shared pointer the reference counting will be triggered, if it doesn't the parameter goes out of scope immediately, making the reference counting pointless. To me passing by reference looks like the right thing.

friedmud commented 8 years ago

@dschwen I suppose I agree. We could just make it policy that you pass shared_ptr by reference... but hold it "by copy". Any function that is just receiving a pointer to use for a moment should be able to get it by reference.

BUT: my big thing is: WHERE do we do any of this in MOOSE? We're almost never passing around pointers like this... certainly not in anything doing any computation (i.e., tight loops). I think we need to identify some places in MOOSE where this behavior might impact us before we can make a decision...

friedmud commented 8 years ago

BTW: Optimization makes things even more tilted:

[gastdr][~/projects]> clang++ -std=c++11 -fopenmp -O3 shared_unique.C
[gastdr][~/projects]> OMP_NUM_THREADS=4 ./a.out 
100000000
f() took 6351 milliseconds
100000000
f() took 13 milliseconds

600x difference!

With pass by reference:

[gastdr][~/projects]> OMP_NUM_THREADS=4 ./a.out 
100000000
f() took 11 milliseconds
100000000
f() took 11 milliseconds
friedmud commented 8 years ago

Forgot to include the serial case:

[gastdr][~/projects]> clang++ -std=c++11 -fopenmp -O3 shared_unique.C
[gastdr][~/projects]> OMP_NUM_THREADS=1 ./a.out 
100000000
f() took 1445 milliseconds
100000000
f() took 26 milliseconds
permcody commented 8 years ago

Funny. If you are going to pass a smart pointer by reference. You can do the same thing with a unique pointer.... Just sayin' :) On Wed, May 25, 2016 at 11:19 AM Derek Gaston notifications@github.com wrote:

Forgot to include the serial case:

[gastdr][~/projects]> clang++ -std=c++11 -fopenmp -O3 shared_unique.C [gastdr][~/projects]> OMP_NUM_THREADS=1 ./a.out 100000000 f() took 1445 milliseconds 100000000 f() took 26 milliseconds

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/idaholab/moose/issues/7062#issuecomment-221642785

friedmud commented 8 years ago

Funny. If you are going to pass a smart pointer by reference. You can do the same thing with a unique pointer.... Just sayin' :)

No... because if that function needs to create an object or call a function that does need to copy the pointer then you've just invalidated the unique_ptr held in the warehouse.

Passing unique_ptrs by reference is VERY dangerous... you're likely to create dangling pointers that way.

friedmud commented 8 years ago

I agree with @roystgnr you should only pass a unique_ptr to something if you're wanting to hand off responsibility for it.

permcody commented 8 years ago

How are you more likely to create a dangling pointer with a reference to a unique pointer over a shared_pointer? If the real copy is destroyed in either case you are in trouble.

permcody commented 8 years ago

BTW - I realize that my test case is not representative of what happens in our library. It wasn't meant to. I was actually trying to quantify the cost of messing with the atomics. Yes I also know that passing by reference fixes the problem but I really hope that we aren't building library code where that becomes the norm. I think it would be really interesting to benchmark atomics on other processors.

We are copying shared pointers in all of our warehouses today, but that's on subdomain setup and other non-critical regions.

All I've been saying from the start is that we should consider all options and use the right tool for the job at hand. For users, shared pointers might make the most sense since they are hopefully the least error prone. However I also still think that raw pointers have valid uses. Take a look at our raw pointers in our MOOSE classes like _current_elem, and _current_node. I really don't see any reason to change those. As one would expect, those pointers don't own anything. They are just "indexes" into other data structures. What do you gain from making those kinds of pointers shared?

dschwen commented 8 years ago

On a Raspberry Pi (Model 2):

serial (no optimization):

10000000
f() took 5856 milliseconds
10000000
f() took 1818 milliseconds

with OpenMP (no optimization):

100000000
f() took 22025 milliseconds
100000000
f() took 4564 milliseconds

serial (with -O2, which is faster than -O3 here)

10000000
f() took 237 milliseconds
10000000
f() took 0 milliseconds

(dead code elimination - I'm using gcc-4.6)

OpenMP (with -O2):

100000000
f() took 25046 milliseconds
100000000
f() took 0 milliseconds
friedmud commented 8 years ago

How are you more likely to create a dangling pointer with a reference to a unique pointer over a shared_pointer? If the real copy is destroyed in either case you are in trouble.

The problem is that the callee making a copy invalidates the original...

For instance... this will segfault:

#include <iostream>

void afunc(std::unique_ptr<int> & ptr)
{
  auto a = std::move(ptr);
  std::cout<<*a<<std::endl;
}

int main()
{
  auto a  = std::unique_ptr<int>(new int(1));

  afunc(a);

  std::cout<<*a<<std::endl;

  return 0;
}

You might say: "That's stupid code, just don't do that!"... but in a large library it might not be so obvious. For instance, what if afunc() creates an object that takes in the unique_ptr and stores it internally? An unwitting developer might write that code (not knowing what's happening inside the object) and completely hose up the framework.

This is NOT academic... if you remember I actually did this to myself just a couple of months ago in libMesh: https://sourceforge.net/p/libmesh/mailman/libmesh-users/thread/CAFfxPjq62TEwzSZ_F8Pq1fSt2T%2B_8VC8mAMdDkOfwNkiHNytXQ%40mail.gmail.com/

Yes... it was slightly easier there because AutoPtr has operator= and unique_ptr doesn't... but an unwitting developer could still screw this up pretty easily.

friedmud commented 8 years ago

Another case where this could happen is that a developer does std::move with the intention of std::moveing it back before the function ends... but because of an early return or an exception being thrown it never happens...

friedmud commented 8 years ago

Yes I also know that passing by reference fixes the problem but I really hope that we aren't building library code where that becomes the norm.

Why? We always pass objects by reference... what's the difference here? I actually went and looked at my MOC code and it turns out that in all of my functions I pass shared_ptr by reference... but in all of my objects I store shared_ptr by copy. Works perfectly.

We are copying shared pointers in all of our warehouses today, but that's on subdomain setup and other non-critical regions.

Yes... there's no way that's going to show up in timing

Take a look at our raw pointers in our MOOSE classes like _current_elem, and _current_node. I really don't see any reason to change those. As one would expect, those pointers don't own anything. They are just "indexes" into other data structures. What do you gain from making those kinds of pointers shared?

Those pointers come from libMesh... we wouldn't change them (why would we?). No one is advocating that.

I'm talking about places in MOOSE where we do new... those should be make_shared()

permcody commented 8 years ago

😆 That actually is stupid code. Yeah the '=' overload is less obvious but if you actually do a standard move and then wonder why the original isn't there anymore... Well...

Since we are getting into ridiculous code mode about about this case?

{
  ...
  std::shared_pointer<Foo> sp = obj.getFoo();

  Junk junk(sp);

  _vec.push_back(junk);
}

// Class Junk
class Junk
{
  Junk(std::shared_pointer<Foo> & foo) :
    _foo(foo)
  {}

  std::shared_pointer<Foo> & _foo;
};

Before you say this is stupid code, just don't do that.... Well this is stupid code, nevermind.

Nazis suck

permcody commented 8 years ago

@jwpeterson pointed out that my first implementation was fine. I've adjusted it so that "junk" lives on!