roystgnr / MetaPhysicL

Metaprogramming and operator-overloaded classes for numerical simulations
Other
22 stars 12 forks source link

Don't default initialize NumberArray data #34

Closed lindsayad closed 5 years ago

lindsayad commented 5 years ago

I know we have #13, and I was the one to make the change for NumberArray in #11, but I do think we should allow intentional uninitialized data. I see a lot of bzero show up in profiling. I don't think that a user who calls NumberArray a; and NumberArray a = NumberArray() should expect different behavior. I re-read this discussion quite a few times this morning and had @jwpeterson look at it as well; I guess I'm just not familiar with the second idiom. We both look at the first and second expressions in that discussion and expect to observe the exact same behavior between the two.

roystgnr commented 5 years ago

The difference in idioms is part of the standard: https://en.cppreference.com/w/cpp/language/zero_initialization https://en.cppreference.com/w/cpp/language/value_initialization and makes into a lot of tutorials:

The way to value-initialize a named variable before C++11 was T object = T();, which value-initializes a temporary and then copy-initializes the object: most compilers optimize out the copy in this case.

int a; gives uninitialized data, but b = int() sets or initializes b to 0.

How much of a performance boost do you get by getting rid of the redundant zeroing?

Can you see any way to get the same behavior out of a user-defined type? I haven't yet (although Initialization in C++ is bonkers so there may just be something I haven't considered yet). I'd love to let users get rid of redundant initialization, I just don't see how to do so without creating nasty situations for generic algorithm writers where a standard T x = T(); idiom works fine for int and double but then breaks horribly for ClassDuckTypingInt and ClassDuckTypingDouble.

lindsayad commented 5 years ago

int a; gives uninitialized data, but b = int() sets or initializes b to 0.

Ok yea sure for built-ins I understand this. I guess my statement was that I don't assume T x = T() will value-initialize the members of T for user defined types (like NumberArray).

How much of a performance boost do you get by getting rid of the redundant zeroing?

I haven't quantified it at this point. I'll try to get back to you later today. It may not be a lot.

Can you see any way to get the same behavior out of a user-defined type?

No I don't 😞

lindsayad commented 5 years ago

So the difference seems to be non-trivial but perhaps not back-breaking. On a representative problem, I get a run-time of 71 seconds without zeroing of the NumberArray and 80 seconds with the zeroing. Top functions with zeroing:

screen shot 2018-12-12 at 5 21 13 pm

Top functions without zeroing:

screen shot 2018-12-12 at 5 22 06 pm

And an excerpt of the diff graph:

screen shot 2018-12-12 at 5 26 09 pm

roystgnr commented 5 years ago

Ok, I may still be confused, but: Can we just declare the constructor "= default" (or let it be implicitly defined as default)? From what I'm reading on StackOverflow and in explanations of the standard regarding default versus zero initialization, it looks like that will do what we want: zero initialize in idioms where a builtin would zero initialize, but leave uninitialized in idioms where a builtin would be left uninitialized.

lindsayad commented 5 years ago

= default does indeed appear to do what we want! Valgrind complains about an uninitialized value in the first case, but not in the second 😄

#include "metaphysicl/numberarray.h"                                                                                                                                                                              
#include <iostream>                                                                                                                                                                                               

using namespace MetaPhysicL;                                                                                                                                                                                      

int main()                                                                                                                                                                                                        
{                                                                                                                                                                                                                 
  NumberArray<2, double> n1;                                                                                                                                                                                      
  NumberArray<2, double> n2 = NumberArray<2, double>();                                                                                                                                                           
  std::cout << n1[0] << std::endl;                                                                                                                                                                                
  std::cout << n2[0] << std::endl;                                                                                                                                                                                
}                              

So we would want to give the default constructor for DualNumber the same treatment...?

lindsayad commented 5 years ago

In the ALE problem, the profiling looks great with the defaulted constructors

roystgnr commented 5 years ago

Fantastic! I know that was based on a close reading of the standard, but part of me still can't believe that worked.

So we would want to give the default constructor for DualNumber the same treatment...?

Definitely. And NumberVector... and I think that's it, and #13 doesn't suggest anything I forgot... I guess the compile-time sparse classes might have the same concern and same resolution, but don't worry about those for now.

How do we properly test this, though? It feels like the sort of thing that would be really easy to regress down the line. We can have a unit test that uses the T() syntax and verifies that the result is 0. That can give false test successes because sometimes uninitialized data really does get zeroed, but it's be quick and easy and better than nothing. But we can't test uninitialized data to verify that the result is not zero, because that's never guaranteed not to give false test failures! Maybe a script that tests if valgrind exists, then tries putting a test code through an uninitialized value through valgrind if so, then screams and dies if the uninitialized value doesn't get flagged by valgrind?

lindsayad commented 5 years ago

Maybe a script that tests if valgrind exists, then tries putting a test code through an uninitialized value through valgrind if so, then screams and dies if the uninitialized value doesn't get flagged by valgrind?

That seems like it would be the most robust test

roystgnr commented 5 years ago

Thanks!

lindsayad commented 5 years ago

@roystgnr can you bootstrap this? Or if you want to give me push rights, I could also do it.

roystgnr commented 5 years ago

Thanks; you're long overdue for push rights at this point.

roystgnr commented 5 years ago

I'd say make a "pr34_bootstrapped" branch this time, but if you think this merits a release (or if you just want to benefit from the release-oriented bootstrap+tag+etc script we have) then we could do that instead.