Closed FrankStain closed 8 years ago
Hi, FrankStain! Did you try running the code with the suggested changes in the Release configuration? I think you might face memory-access-related crashes on some platforms (most likely on Windows x64 too). Also, if you need that sort of functionality, you could try the Eigen library. As I remember, it has a `head()` template function that returns a reference.
@VladSerhiienko , hey :)
It's been a while, but yes, I ran some tests around this issue.
So... running on an x64 platform with SIMD processing of vectors enabled, we face one major restriction: the memory of any vector must be aligned to 16 bytes. Actually, this restriction is common to all platforms where SIMD processing is enabled. Casting the memory from `vec3` to `vec2` doesn't deform the pointer; it only declares that the pointer refers to memory of a smaller size. Also, all representations of `mathfu::Vector<T, N>` consist of the same `__m128` member and cover memory of the same size.
So just requesting the `xy` component from an `xyz` or `xyzw` vector causes no problems. But the situation changes if we need to access the `yz` component by reference; yes, there we would have problems.
Did I miss anything more important?
By accessing the `xy` component by reference, I see a performance optimization: excluding the allocation and construction of another vector. When a read-only vector of a lower dimension must be passed as an argument or used in a calculation, such an optimization can save some cycles and let you spend them on more useful operations. :)
Hi, regarding returning references rather than values for subcomponents, check out https://github.com/google/mathfu/blob/master/include/mathfu/vector_3.h#L323
Then take a look at the simd4f data type on each platform here: https://github.com/scoopr/vectorial/tree/ae7dc88215e14c812786dc7a1f214610ae8540f5/include/vectorial
It's pretty different: in some cases (e.g. NEON) it uses a type like float32x4_t, which requires the user to load and store the value using intrinsics ( http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0491f/BABGABJH.html ) to avoid undefined behavior. Now I know that in practice, once the NEON value has been stored to memory, we're probably going to be OK. But it's very likely - if you're using a chain of operations - that the compiler will try to keep the vector in a register, in which case you're going to be out of luck with the proposed optimization.
In general, I really don't recommend using reinterpret_cast with mathfu data types for this type of reason, you may end up with code that works ok on your beefy x86_64 CPU but results in some really strange behavior elsewhere.
Cheers, Stewart
@stewartmiles , hi!
Sadly, I can't fully understand some of your points.
Did you mean that `float32x4_t` can't be safely reinterpreted as `float32x2_t`? Also, I see the same `union` inside vec3 and vec4:
https://github.com/google/mathfu/blob/master/include/mathfu/vector_2.h#L225
https://github.com/google/mathfu/blob/master/include/mathfu/vector_4.h#L37
It means the memory mapping is the same for `vec3` and `vec4` on each of the supported platforms.

> Now I know in practice when the NEON value is then stored in memory we're probably going to be ok but it's very likely - if you're using a chain of operations - the compiler will try to keep the vector in a register in which case you're going to be out of luck with the proposed optimization.

Can you clarify this point in more detail?
That's correct, float32x4_t can't safely be reinterpreted as float32x2_t. vectorial does have a union that maps the types to float, but this isn't actually used, since the compiler is likely to keep the value in a NEON register and not write the data back out to memory where it could be reinterpreted as floats. All operations on the simd* types need to be performed using compiler intrinsics.
The compiler intrinsics for NEON's simd2f are located in https://github.com/scoopr/vectorial/blob/ae7dc88215e14c812786dc7a1f214610ae8540f5/include/vectorial/simd2f_neon.h#L17 . See http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0205g/BABGABJH.html for more information.
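For reference, the load/store discipline described above looks roughly like this on SSE (a sketch using the SSE analogues so it runs on x86; on NEON the corresponding intrinsics would be `vld1q_f32`/`vst1q_f32`):

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Load four floats into a SIMD register, double every lane, store back.
// While the value lives in an __m128 (or NEON's float32x4_t), there is no
// guaranteed float array in memory to reinterpret_cast -- all access must
// go through the load/store intrinsics.
void double_lanes(const float* in, float* out) {
  __m128 v = _mm_loadu_ps(in);  // explicit (unaligned) load from memory
  v = _mm_add_ps(v, v);         // operate entirely in the register
  _mm_storeu_ps(out, v);        // explicit store back to memory
}
```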
I had noticed that `mathfu::Vector<T, 3>` has a memory representation compatible with `mathfu::Vector<T, 2>`. It means that `mathfu::vec3` can be safely cast to `mathfu::vec2` via `reinterpret_cast<mathfu::vec2&>( vec3_inst )`. And I see the same situation for the `mathfu::vec4` and `mathfu::vec3` classes.

So how about making `mathfu::vec3::xy()`, `mathfu::vec4::xy()` and `mathfu::vec4::xyz()` return references instead of values? Something like:
Such a conversion would be very helpful!