Closed mbrucher closed 7 years ago
Regarding your first observation, I'm not fully sure I understood it correctly. If I misunderstood and my answers don't actually explain the situation, could you clarify your question?
The reason why each supported architecture is supplied as a separate flag is that platforms may be combined one with another. For example, -DSIMDPP_DISPATCH_ARCH1=SIMDPP_ARCH_X86_AVX2,SIMDPP_ARCH_X86_FMA3
is a valid platform which is distinct from both AVX2 and FMA3. I haven't figured out a maintainable way to supply this information as a single flag. I'm open to suggestions of how this could be improved.
Unfortunately, there's no way to avoid this information being supplied. At least one compilation unit needs to be aware of all architectures that will be dispatched, because there's no cross platform way to statically register them. If we try to rely on constructors of static objects doing this work, then there will be no hard link-time dependency on the compile units the constructors sit in and the linker will strip them out on lots of platforms.
Regarding the template functions, are the required template instantiations known beforehand? That is, if you wrote your template code without using libsimdpp, could you put the definition of the templates into separate .cpp files and force the instantiation explicitly? If yes, then it's possible in theory to support your use case, I just need some time to think about how it could work. If not, unfortunately it's not possible to use libsimdpp dispatcher for this task, because libsimdpp relies on code for different architectures to be put in separate compile units. The compiler needs to know what code to generate for the templates and the only way to do so in such situation is instantiating explicitly in one way or another. All SIMD libraries will have this problem, because only some compilers support generation of code for different instruction sets in the same compile unit.
I do instantiate everything I need beforehand (https://github.com/mbrucher/AudioTK/blob/feature/SIMDBasicFilters/ATK/Core/SIMD/QuaternionConvertFilter.cpp#L92). I already have the functions creating the objects, but how to do the rest...
I wonder if with Boost tuples, we could manage to generate this properly... Does the compilable architecture function retrieves all the combinations, or is it something else that has them?
I do agree that there is an issue for runtime discovery with static libraries, but they should get stripped on shared libraries, shouldn't they? I'm wondering if 2 options, static and runtime dispatches, could solve the problem for complex classes like the one I have. It's also problematic that the CMake files don't seem to support the dispatcher (yet). But once we have a final solution, I can help with the creation of the required functions, no problem. I could write the dispatcher as std::unique_ptr<> something_float_simd, but it seems a little bit off, as the rest of the library uses template arguments. I could do with if there is really no other way though.
Dispatching via shared libraries is actually quite easy -- compile several libraries with different SIMDPPARCH* flags and then just select correct library according to the cpuid results and dlsym symbols from it.
I think I've figured a way to support template functions in the dispatcher. I'll have some code in a custom branch after several days.
Regarding
CMake files don't seem to support the dispatcher
Did you look into simdpp_multiarch? Is it adequate for your use case?
Obviously I haven't looked enough! So there is one variable DISPATCHER_FILE for indicating the dispatcher file. What does this mean? Do I still need the full DISPATCH macro in all files, and I set DISPATCHER_FILER to one of them? How do O chose it as I don't know there full names? Should probably be a list of files actually, instead of just one, as each argument file could require a dispatch file. Glad to here about your potential solution, that would be great!
OK, the DISPATCHER part is done automatically, I should read the cmake file better! I think I start understanding this :)
Did you start working on the implementation? Or if you have an idea, I can participate in the implementation. I will definitely try it as soon as you have a first version ;)
Sorry for a long delay in responding. Yes, I did start working on the implementation and it's almost done. It turned out to be more complex than I anticipated, so I decided not to push anything unfinished, as it would very likely waste your time. It's hard to predict when I have something for you to use, but it's likely that during the weekend I'll have more time to wrap things up.
@mbrucher I've pushed an experimental implementation on the dispatch-templates branch. Some docs are in the source an several examples can be seen in tests. It would be great to hear any issues, suggestions or ideas that you might have once you start using the new functionality.
I've only tested the implementation on GCC, so the implementation could still contain bugs. More testing results will be available soon.
Excellent, thanks a lot! I'll try it with clang and VS 2017 as soon as I have time playing with it (might take some days, I'm stuck with another project to finish first!).
Currently trying to implement this for the following function:
template std::unique_ptr
Thanks for feedback. It looks like I made an error when writing documentation and the implementation indeed does not work with non-deducible arguments. I've merged your PR with one modification - the new macro parameter was moved to the second position in the argument list (see 7ea917d67448fcf6937f05a1ad210f6000c5a9da). Having template-related arguments together reduces the chance of user errors which in this case result in quite nasty compiler output.
Excellent, thanks a lot! And really great work with the library. It's really great to have a way of not writing intrinsics and still support all these different platforms ;) I like portable code.
FYI, I saw errors when compiling dispatch tests on MSVC on the dispatch-templates branch, so I think we can keep the issue open for the time being :-)
OK, I'll have a look as well.
Seems to be building on VS2017 for the template dispatchers I use.
I noticed something strange. On AVX, if I do a vector<>.assign() with the zero value in some conditions, it triggers a segmentation fault. At least, that's what happens on Linux, not on macOS or VS (that I noticed). Is there something special about AVX registers compared to the others?
Could you give more information about the circumstances of the crash? Does this happen in dispatched code? What instruction is being executed when the crash happens?
Seems like it might be an alignment issue in my code (error signal: SIGSEGV, si_code: 0), so I'm investigating this lead more.
OK, was my problem with a bad understanding of std::align...
I think I managed to reproduce the compilation issue on Windows, I'll have a look at it.
Seems like all the sequence stuff doesn't work with VS2017, but I don't know much about preprocessor :/ It's just as if it would process the macro SIMDPP_DETAIL_EXTRACT_PARENS_IGNORE_REST keeps the rest of the call instead of ditching it. So even the simple instantiation fails :/
Seems like all the sequence stuff doesn't work with VS2017, but I don't know much about preprocessor :/ It's just as if it would process the macro SIMDPP_DETAIL_EXTRACT_PARENS_IGNORE_REST keeps the rest of the call instead of ditching it. So even the simple instantiation fails :/
I think I've fixed the issues on MSVC in 2822508bd1fb6459d and merged the branch to master. Could you please check whether the code on master branch fixes the problem in your code base?
Compiles properly, thx a lot!
I'm trying to figure out how to generate the macros when using the dispatcher. First, one need to add -DEMIT_DISPATCHER and the list of supported platforms. But it's not really great for cross platform automated build . Then, my functions are templated factory builders, and this doesn't work...
I was wondering if using the get compilable macro with cmake could generate a comma separated list (easy to do with CMake) and then use it with Boost preprocessor macros to generate the proper code in a template acceptable way. Any thoughts on this? Without this 2 features, the SIMD filters I'm trying to build for my library are just unusable :/