Open pbauman opened 10 years ago
How far down in the compute loops are such casts used? Needing to macro this up for performance has a design smell to it.
No idea, but wanted to get the thought down before I forgot about it. "Design smell" means "too much design" or "needs better design than macro" or other?
Feels odd to need to perform a dynamic cast in a critical compute loop. I'd expect the expensive cast to always be done safely outside the loop.
Admittedly they can be relatively expensive (just tested locally on GCC 4.4.6 using source from http://tinodidriksen.com/2010/04/14/cpp-dynamic-cast-performance/):
OPERATION | MINTICK AVGTICK STDTICK | RELATIVE
----------------------------------------- --------------------------------------------
reinterpret_cast known-type: | 7017200 7022472 7468 | 1.00
virtual function + reinterpret_cast: | 11023060 11077007 129030 | 1.57
member variable + reinterpret_cast: | 8021136 8045386 47212 | 1.14
dynamic_cast same-type-base success | 7008016 7046708 49169 | 1.00
dynamic_cast same-type-level1 success | 36055128 36457611 732101 | 5.19
dynamic_cast same-type-level2 success | 38296548 38307029 10689 | 5.45
dynamic_cast same-type-level3 success | 36052380 36100992 113992 | 5.14
dynamic_cast level1-to-base success | 6015188 6018479 3274 | 0.85
dynamic_cast level2-to-base success | 6008468 6014180 4758 | 0.85
dynamic_cast level3-to-base success | 6014668 6039942 56148 | 0.86
dynamic_cast level2-to-level1 success | 76261584 76997832 1035831 | 10.96
dynamic_cast level3-to-level1 success | 113149636 113461864 488122 | 16.15
dynamic_cast level3-to-level2 success | 76140680 77068236 1543087 | 10.97
dynamic_cast onebase-to-twobase fail | 28047192 28108865 142782 | 4.00
dynamic_cast onelevel1-to-twobase fail | 73099688 73127042 47443 | 10.41
dynamic_cast onelevel2-to-twobase fail | 179229428 179483756 596535 | 25.55
dynamic_cast onelevel3-to-twobase fail | 282686416 283363397 870670 | 40.35
Aight, grain of salt on that benchmark from having read some more critique of it. Comments about why the dynamic cast in the critical section still hold.
http://tinodidriksen.com/2010/04/14/cpp-dynamic-cast-performance/):
That source stuffs everything into main(); thanks to the lack of exported functions, the lack of function calls with possible side effects in those loops, and the lack of multiple inheritance, I wouldn't be too surprised if it was legal for the C++ compiler to optimize away much of the work in the heavyweight casts there. 10x and 20x slowdown may be a lower limit.
Agreed @roystgnr. Sorry for the noise. Still, rather than macroing around things, I'll soapbox one last time that it feels weird if that sort of macro produces an appreciable change. If you need zero-cost polymorphism down in the guts, then CRTP. But stomaching what that means to a design is unpleasant and I'd still just recommend casting higher up in the loops.
Agreed on the design red flag. For the record, the motivation behind libmesh_cast was not "we have a bunch of dynamic_casts in inner loops", it was "we want to turn every dynamic_cast failure into a libmesh_error() exception so it's easier to debug". And then once we had to wrap our casts in inline functions anyway, and we had a static_cast fallback path for !defined(LIBMESH_HAVE_RTTI) anyway, it seemed sensible enough to use that same path in opt mode.
Ah. That no-RTTI mode makes sense. Soapboxing withdrawn.
After seeing #212 reported, we should remove all direct calls to static_cast and dynamic_cast and wrap them up in a macro so that dynamic_cast is done in debug mode (to catch bad casts) and static_cast otherwise (for performance). This is what libMesh and GRINS does and has worked very well.