Closed AlexGuteniev closed 1 month ago
Thanks, this is great! :heart_eyes_cat: I pushed further changes to centralize the logic, please meow if you have concerns.
This centralization is already done in #4659 , there would just be more conflicts, after doing again here
Oh, you also did the implementation of _Get_any
via _Get_v_pos
. Didn't think about it, bu seems it will be no worse, as 0 is expected to constant propagate.
I'm drowning in PRs right now, so I think I'm going to have to clear out the backlog in multiple batches (in between investigating a non-STL bug that I can't weasel out of investigating forever). I'd like to land this PR first, then resolve conflicts in #4659.
I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.
Thanks for noticing how to make this code way more elegant! :magic_wand: :heart_eyes_cat: :green_heart:
This: piece https://github.com/microsoft/STL/blob/8dc4faadafb52e3e0a627e046b41258032d9bc6a/stl/src/vector_algorithms.cpp#L1116-L1123 works around the oddity of not having
_mm_cvtsi128_si64
on 32-bit x86It has been problematic:
I have discovered a nicer workaround!
If we spill the reg into the stack, the spill will optimize away. On 32-bit with at least
/arch:SSE2
it even produces better code than the existing workaround. Demo: https://godbolt.org/z/ErGWz8GYTIt still does the actual spill on
/arch:IA32
. But given that this path is executed only once per function call (there are no intermediate reductions for 64-bit elements), and there's a plan to lift to/arch:SSE2
, I think that's fine.