recursive variant move construction/assignment complexity

lightmare commented 7 years ago

out/recursive_wrapper_test 321000000
TYPE OF RESULT-> TYPE_ID=N4test9binary_opINS_3subEEE
1542000us
total=321000000
2+3-4=1
----- sum of 1000 ones -----
construction 57397us
calculation 7377us
total=1000000
----- sum of 5000 ones -----
construction 1485368us
calculation 256442us
total=25000000

out/unique_ptr_test 321000000
TYPE OF RESULT-> TYPE_ID=St10unique_ptrIN4test9binary_opINS0_3subEEESt14default_deleteIS3_EE
840123us
total=321000000
2+3-4=1
----- sum of 1000 ones -----
construction 41us
calculation 5986us
total=1000000
----- sum of 5000 ones -----
construction 171us
calculation 182640us
total=25000000

Look at the times for "construction" of large trees. With recursive_wrapper a 5x increase in tree size results in 1485368 / 57397 > 25x increase in running time. With unique_ptr the increase in running time is only 171 / 41 > 4x.

There's one strange surprise, though. The first timing below TYPE is from an existing calculation test, and unique_ptr is almost twice as fast after the change (which doesn't touch any of that code), whereas before the change, there was no difference:

out/recursive_wrapper_test 321000000
TYPE OF RESULT-> TYPE_ID=N4test9binary_opINS_3subEEE
844526us
total=321000000
2+3-4=1

out/unique_ptr_test 321000000
TYPE OF RESULT-> TYPE_ID=St10unique_ptrIN4test9binary_opINS0_3subEEESt14default_deleteIS3_EE
844189us
total=321000000
2+3-4=1

That's with gcc-6.2. I previously tried with 4.8 and 5.4 and there were differences but not really convincing; recursive_wrapper was faster, then I slightly changed the code to only construct one test::calculator instead of constructing temporaries for every operation, and suddenly unique_ptr was faster; then I reverted that change, added the bench_large_tree function (without calling it) and the times were almost identical.

lightmare commented 7 years ago

I decided to turn this PR into an actual fix for the issue demonstrated by the added benchmark.

Rather than going the "make recursive_wrapper a unique_ptr with copy" route, I only did a small change -- make recursive_wrapper's move constructor act like unique_ptr's, and in variant_helper::move destroy the moved-from value afterwards (regardless of whether it's a recursive_wrapper or not).

This would be a set-back for other variant implementations that optimize same-type assignment, but not for this one. mapbox::variant's assignment unconditionally destroys the active alternative first, so the worst-case slowdown this change incurs is equivalent to helper_type::destroy(detail::invalid_index, nullptr);.

cbeck88 commented 7 years ago

Hmm, just a thought: you might want to rename the "variant_helper::move" function to variant_helper::destructive_move, since that's what it does now? Similar to this proposal: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4158.pdf

Or variant_helper::move_and_destroy?

cbeck88 commented 7 years ago

Another thought: Your patch is a lot smaller and simpler than I expected :D But one of the reasons for this is that you change the behavior slightly in all cases, not only when recursive_wrapper occurs.

For instance, suppose I have some code

variant<int, std::string> a, b, c;
c = std::string{"foo"};
b = std::move(c);
a = std::move(b);

c = std::string{"bar"};
b = std::move(c);
a = std::move(b);

In current code, the last three lines of code will all end up invoking move assignment operator of std::string. After this patch, it's sometimes the move constructor instead, because after c and b are moved from their things get destroyed.

The reason C++ has copy assignment operators and move assignment operators is that it may in some cases be more efficient than simply destroying and using a constructor. The object may need to acquire resources, or populate a small lookup table or something.

In most cases though it's not a big difference. Also, your way of doing it is certainly valid under the usual definition of move semantics, the programmer can only assume a "valid state" after the move.

I wonder if there's any non-artificial examples that would be slowed down by this.

Potentially, you could try to make it so that it only does the "move and destroy" logic when the type is a recursive_wrapper. That means that if move assignment of the user type is faster than destroy and move construct, they get to retain that optimization opportunity in more cases. But it would also be a little more complicated to program.

lightmare commented 7 years ago

In current code, the last three lines of code ...

... destroy the target first, then move-construct a new string. https://github.com/mapbox/variant/blob/d2588a8f1d6b5d480d228e6d8a906ce634bdea9a/include/mapbox/variant.hpp#L629

After this PR, they receive an already-destroyed target, so the call to helper_type::destroy (that just recurses through all alternatives doing nothing) is newly introduced overhead. Which also kinda answers your question about non-artificial examples. This affects every assignment to a moved-from variant, e.g. std::swap does that twice, so it's not uncommon.

Potentially, you could try to make it so that it only does the "move and destroy" logic when the type is a recursive_wrapper. That means that if move assignment of the user type is faster than destroy and move construct, they get to retain that optimization opportunity in more cases. But it would also be a little more complicated to program.

Yes, that's what I was aiming for, but then noticed that assignment unconditionally destroys the target first, so until that is changed (and I didn't want to extend the scope of this PR), it wouldn't buy much.

cbeck88 commented 7 years ago

Hmm right you are, that is a quirk of mapbox variant I guess

mapbox / variant

recursive variant move construction/assignment complexity #139