scalar implementation without code duplication

xtensor-stack / xsimd

C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)

https://xsimd.readthedocs.io/

BSD 3-Clause "New" or "Revised" License


scalar implementation without code duplication #998

Closed (actual-daniel closed this issue 7 months ago)

actual-daniel commented 9 months ago

I'm currently a bit confused about the purpose of xsimd::generic. My expectation was that it's possible to use this architecture as a fallback or for development purposes. For this I would assume that xsimd::batch<T, xsimd::generic> just contains a single scalar value. But this is not the case: instead, the build fails because xsimd::has_simd_register is false.

I could specialize the load_* and broadcast calls and try to separate them from the rest of the implementation, which would then hopefully work with both scalar and SIMD types. But it feels like this could be abstracted by implementing the generic architecture.

Is there a reason why this is not done? Are there examples of how to write SIMD and scalar implementations without duplicating code?

serge-sans-paille commented 9 months ago

Yeah, generic means "generic vector architecture". It would work with a hypothetical vector of one register, but that architecture fallback doesn't exist.

We do provide overloads for scalar values though, e.g. xsimd::cos falls back to std::cos where it makes sense.
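To illustrate how scalar overloads let a kernel be written once, here is a minimal sketch that does not depend on xsimd. The namespace `toy`, the type `toy::batch4`, and the function `cos_squared` are hypothetical stand-ins introduced only for this example; with xsimd, the kernel would presumably call `xsimd::cos` and be instantiated with either a scalar type or an `xsimd::batch`.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

namespace toy {  // hypothetical stand-in for a SIMD library namespace

// Stand-in for a SIMD batch: four floats processed together.
struct batch4 {
    std::array<float, 4> v;
};

// Lane-wise multiplication of two batches.
inline batch4 operator*(const batch4& a, const batch4& b) {
    batch4 r{};
    for (std::size_t i = 0; i < 4; ++i) r.v[i] = a.v[i] * b.v[i];
    return r;
}

// Batch overload: applies the scalar function lane by lane.
inline batch4 cos(const batch4& x) {
    batch4 r{};
    for (std::size_t i = 0; i < 4; ++i) r.v[i] = std::cos(x.v[i]);
    return r;
}

// Scalar overload: forwards to the standard library.
inline float cos(float x) { return std::cos(x); }

}  // namespace toy

// Kernel written once; overload resolution picks the scalar or batch path.
template <class T>
T cos_squared(const T& x) {
    T c = toy::cos(x);
    return c * c;
}

// Usage: the same kernel instantiated for a scalar and for a batch.
inline void cos_squared_usage() {
    assert(cos_squared(0.0f) == 1.0f);
    toy::batch4 b{{0.0f, 0.0f, 0.0f, 0.0f}};
    toy::batch4 r = cos_squared(b);
    assert(r.v[0] == 1.0f && r.v[3] == 1.0f);
}
```

The point of the pattern is that `cos_squared` never mentions whether it operates on one value or several. Only loads, stores, and broadcasts still differ between the scalar and batch cases, which is the part the original question asks about.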

actual-daniel commented 9 months ago

@serge-sans-paille Thanks for answering! Is there a reason why it is not implemented? I guess the implementation would not be that complicated. Is there a chance that this functionality would be merged if contributed?

If merging upstream is not feasible, I guess the best way to resolve this is to maintain a custom "scalar" backend derived from generic?

serge-sans-paille commented 9 months ago

Instead of the "scalar" backend, which would in essence conflict with the scalar overloads we provide for every operation, we could have an emulated<n> backend that would emulate a batch of n elements using scalar operations, relying on the generic implementation, etc.

Is that what you have in mind? I'd happily implement that.

actual-daniel commented 9 months ago

@serge-sans-paille Yes, this is exactly what I was looking for! Using a generic n as you suggested would actually be even better than my initial request.
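The emulated<n> idea discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the xsimd implementation: `emulated_batch`, its `broadcast`/`load`/`store` members, and the `axpy` kernel are hypothetical names chosen for this example. The batch stores N lanes in a plain array and implements every operation with scalar loops, so N = 1 degenerates to the single-scalar case from the original request.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical emulated batch: N lanes held in a plain array and
// processed with scalar loops. Not the xsimd API.
template <class T, std::size_t N>
struct emulated_batch {
    std::array<T, N> lanes;
    static constexpr std::size_t size = N;

    // Fill every lane with the same value.
    static emulated_batch broadcast(T value) {
        emulated_batch r{};
        r.lanes.fill(value);
        return r;
    }
    // Read N contiguous elements from memory.
    static emulated_batch load(const T* src) {
        emulated_batch r{};
        for (std::size_t i = 0; i < N; ++i) r.lanes[i] = src[i];
        return r;
    }
    // Write the N lanes back to memory.
    void store(T* dst) const {
        for (std::size_t i = 0; i < N; ++i) dst[i] = lanes[i];
    }
};

// Lane-wise addition.
template <class T, std::size_t N>
emulated_batch<T, N> operator+(const emulated_batch<T, N>& a,
                               const emulated_batch<T, N>& b) {
    emulated_batch<T, N> r{};
    for (std::size_t i = 0; i < N; ++i) r.lanes[i] = a.lanes[i] + b.lanes[i];
    return r;
}

// Lane-wise multiplication.
template <class T, std::size_t N>
emulated_batch<T, N> operator*(const emulated_batch<T, N>& a,
                               const emulated_batch<T, N>& b) {
    emulated_batch<T, N> r{};
    for (std::size_t i = 0; i < N; ++i) r.lanes[i] = a.lanes[i] * b.lanes[i];
    return r;
}

// Kernel written against the batch interface only: y[i] = a * x[i] + y[i].
template <class Batch, class T>
void axpy(T a, const T* x, T* y, std::size_t n) {
    const Batch va = Batch::broadcast(a);
    std::size_t i = 0;
    for (; i + Batch::size <= n; i += Batch::size) {
        Batch r = va * Batch::load(x + i) + Batch::load(y + i);
        r.store(y + i);
    }
    for (; i < n; ++i) y[i] = a * x[i] + y[i];  // scalar tail for leftovers
}

// Usage: the same kernel with a 4-lane and a 1-lane emulated batch.
inline void axpy_usage() {
    float x[3] = {1.0f, 2.0f, 3.0f};
    float y4[3] = {0.0f, 0.0f, 0.0f};
    float y1[3] = {0.0f, 0.0f, 0.0f};
    axpy<emulated_batch<float, 4>>(2.0f, x, y4, 3);
    axpy<emulated_batch<float, 1>>(2.0f, x, y1, 3);
    assert(y4[2] == 6.0f && y1[2] == 6.0f);
}
```

Because `axpy` takes the batch type as a template parameter, a real SIMD batch type exposing the same three operations could be substituted without touching the kernel, which is what makes an emulated backend useful for development and as a fallback.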