google / mathfu

C++ math library developed primarily for games focused on simplicity and efficiency.
http://google.github.io/mathfu
Apache License 2.0

How about letting the `xy()` and `xyz()` functions return references instead of values? #11

Closed · FrankStain closed this 8 years ago

FrankStain commented 8 years ago

I noticed that mathfu::Vector<T, 3> has a memory representation compatible with mathfu::Vector<T, 2>. It means that a mathfu::vec3 can be safely cast to mathfu::vec2 via reinterpret_cast<mathfu::vec2&>( vec3_inst ). And I see the same situation for the mathfu::vec4 and mathfu::vec3 classes.

So how about making mathfu::vec3::xy(), mathfu::vec4::xy() and mathfu::vec4::xyz() return references instead of values?

Something like:

inline Vector<T, 2>& xy() { return *reinterpret_cast<Vector<T, 2>*>( this ); }
inline const Vector<T, 2>& xy() const { return *reinterpret_cast<const Vector<T, 2>*>( this ); }

Such a conversion would be very helpful!
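For illustration, here's how that could be used (a rough sketch; the write-through behavior is the whole point of returning a reference):

mathfu::vec3 position(1.0f, 2.0f, 3.0f);
mathfu::vec2& screen = position.xy();  // no temporary vec2 is constructed
screen.x = 5.0f;                       // writes through to position.x

A function taking const mathfu::vec2& could then accept position.xy() without any copy at the call site.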

VladSerhiienko commented 8 years ago

Hi, FrankStain! Did you try to run the code with the suggested changes in the Release configuration? I think you might face memory-access-related crashes on some platforms (most likely on Windows x64 too). Also, if you need that sort of functionality, you could try the Eigen library; as I remember, it has a head() template function that returns a reference.

FrankStain commented 8 years ago

@VladSerhiienko , hey :)

It's been a while, so yep, I've made some tests around this issue. So... running on an x64 platform with SIMD processing of vectors enabled, we face one major restriction: the memory of any vector must be 16-byte aligned. Actually, this restriction is common to all platforms where SIMD processing is enabled. Casting a vec3 to a vec2 doesn't deform the pointer; it only declares that the pointer refers to a smaller region of memory. Also, all representations of mathfu::Vector<T, N> consist of the same __m128 member and cover the same amount of memory.
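As a quick sanity check of that layout claim, one can print the sizes and alignments on the target platform (a minimal sketch; I assume the mathfu/glsl_mappings.h header for the vec2/vec3/vec4 typedefs, and with SIMD enabled I'd expect all three to report the same 16-byte figures):

#include <cstdio>
#include "mathfu/glsl_mappings.h"  // assumed to provide the mathfu::vec2/vec3/vec4 typedefs

int main() {
  std::printf("vec2: size=%zu align=%zu\n", sizeof(mathfu::vec2), alignof(mathfu::vec2));
  std::printf("vec3: size=%zu align=%zu\n", sizeof(mathfu::vec3), alignof(mathfu::vec3));
  std::printf("vec4: size=%zu align=%zu\n", sizeof(mathfu::vec4), alignof(mathfu::vec4));
  return 0;
}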

So, just requesting the xy component from an xyz or xyzw vector causes no problems. But the situation changes if we need to access the yz components by reference. Yep, there we will have problems.
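To spell out the yz case (a rough sketch of my understanding):

mathfu::vec4 v(1.0f, 2.0f, 3.0f, 4.0f);
// &v.y is only 4 bytes past the 16-byte-aligned start of v, so a reference
// formed there would be misaligned for a SIMD-backed vec2 (undefined behavior):
// mathfu::vec2& yz = reinterpret_cast<mathfu::vec2&>(v.y);
mathfu::vec2 yz(v.y, v.z);  // a copy remains the only safe option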

Did I miss something important?

In accessing the xy component by reference I see a performance optimization: it excludes the allocation and construction of another vector. When a read-only vector of smaller dimension must be passed as an argument or used in a calculation, such an optimization can save some ticks to spend on more useful operations. :)

stewartmiles commented 8 years ago

Hi, regarding returning references rather than values for subcomponents, check out https://github.com/google/mathfu/blob/master/include/mathfu/vector_3.h#L323

Then take a look at the simd4f data type on each platform here: https://github.com/scoopr/vectorial/tree/ae7dc88215e14c812786dc7a1f214610ae8540f5/include/vectorial

It's pretty different: in some cases (e.g. NEON) this uses a type like float32x4_t, which requires the user to load and store the type using intrinsics (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0491f/BABGABJH.html) to avoid undefined behavior. Now, I know that in practice, when the NEON value is then stored in memory, we're probably going to be OK. But it's very likely, if you're using a chain of operations, that the compiler will try to keep the vector in a register, in which case you're going to be out of luck with the proposed optimization.
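To make that concrete, "load and store using intrinsics" on NEON looks roughly like this (a minimal sketch, not mathfu code):

#include <arm_neon.h>

void ScaleFour(float* data, float s) {  // data points at four floats
  float32x4_t v = vld1q_f32(data);  // load through the intrinsic...
  v = vmulq_n_f32(v, s);            // ...operate while in a register...
  vst1q_f32(data, v);               // ...and store back explicitly
}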

In general, I really don't recommend using reinterpret_cast with mathfu data types, for this type of reason: you may end up with code that works OK on your beefy x86_64 CPU but results in some really strange behavior elsewhere.

Cheers, Stewart


FrankStain commented 8 years ago

@stewartmiles , hi!

Sadly, I can't quite understand some of your points.

Did you mean that float32x4_t can't be safely reinterpreted as float32x2_t? Also, I see the same union inside vec2 and vec4: https://github.com/google/mathfu/blob/master/include/mathfu/vector_2.h#L225 https://github.com/google/mathfu/blob/master/include/mathfu/vector_4.h#L37 It means the memory mapping is the same for vec2, vec3, and vec4 on each supported platform.

> Now, I know that in practice, when the NEON value is then stored in memory, we're probably going to be OK. But it's very likely, if you're using a chain of operations, that the compiler will try to keep the vector in a register, in which case you're going to be out of luck with the proposed optimization.

Can you clarify this point a bit more?

stewartmiles commented 8 years ago

That's correct, float32x4_t can't safely be reinterpreted as float32x2_t. vectorial does have a union that maps the types to floats, but this isn't actually used, since the compiler is likely to keep the value in a NEON register and not write the data back out to memory where it could be reinterpreted as floats. All operations on the simd* types need to be performed using compiler intrinsics.
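For the float32x4_t to float32x2_t case specifically, NEON provides an intrinsic for extracting the low half; that is the sanctioned alternative to a pointer cast (a sketch):

float32x4_t v4 = vld1q_f32(data);   // data: four floats in memory
float32x2_t v2 = vget_low_f32(v4);  // lanes 0 and 1, no reinterpret_cast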

The compiler intrinsics for NEON's simd2f are located in https://github.com/scoopr/vectorial/blob/ae7dc88215e14c812786dc7a1f214610ae8540f5/include/vectorial/simd2f_neon.h#L17 . See http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0205g/BABGABJH.html for more information.
